Closed: cheald closed this issue 1 year ago
Sorry, I failed to autocomplete @gante's handle on the initial ticket. Adding a comment for the tag.
Hey @cheald 👋
For context, position_ids is required for correct behavior with left-padding, which in turn is needed for batched generation. Having a look at the issue!
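For readers landing on this thread, here is a minimal sketch of why left-padding needs explicit position ids: they have to be derived from the attention mask so that real tokens are numbered from zero no matter how much padding precedes them. This roughly mirrors what the generation utilities do; the tensors below are made up for illustration.

```python
import torch

# Toy left-padded batch (1 = real token, 0 = padding); shapes are illustrative.
attention_mask = torch.tensor([
    [0, 0, 1, 1, 1],   # length-3 sequence, left-padded with two pad tokens
    [1, 1, 1, 1, 1],   # full-length sequence
])

# Number the real tokens 0, 1, 2, ... regardless of the amount of left-padding;
# padded slots just get a harmless dummy value.
position_ids = attention_mask.long().cumsum(-1) - 1
position_ids.masked_fill_(attention_mask == 0, 1)

print(position_ids)
# tensor([[1, 1, 0, 1, 2],
#         [0, 1, 2, 3, 4]])
```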
Yup. I don't have the context to grok the proper place to be creating and passing them, but it seems like an interface error, at the minimum, to make a parameter optional and then use it non-optionally.
@cheald The issue stems from the GPTQ-for-Llama package, which should catch all intermediary inputs for proper quantization. I've opened an issue there. You can follow it and make the corresponding local changes, which should work 🤗
However, the ball is in their court -- the changes we made are retrocompatible with our public API and, while we try to avoid creating these sorts of issues, we have no bandwidth to fix problems arising from the use of internal variables/methods.
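Roughly speaking, the local change needed in a quantization script of that kind is to record every keyword argument a decoder layer receives during calibration, not just the hidden states. The Catcher name and structure below are purely illustrative and are not the package's actual code:

```python
import torch.nn as nn

class Catcher(nn.Module):
    """Illustrative wrapper that records a decoder layer's calibration inputs."""

    def __init__(self, module, cache):
        super().__init__()
        self.module = module
        self.cache = cache

    def forward(self, hidden_states, **kwargs):
        # Keep the hidden states *and* every extra keyword argument
        # (attention_mask, position_ids, ...) so they can be replayed
        # unchanged when the layer is re-run during quantization.
        self.cache["inputs"].append(hidden_states)
        self.cache["kwargs"].append(kwargs)
        # Abort the forward pass once the first layer's inputs are captured.
        raise ValueError("inputs captured")
```

The recorded kwargs can then be passed back verbatim when each layer is re-run, so new required arguments such as position_ids are forwarded automatically.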
Is there anything else I can help you with? :)
All good. I'd suggest that changing the LlamaAttention interface to remove the None default value for position_ids would be appropriate, making the parameter required (or, at minimum, adding an explicit check that raises a clear exception when it's missing); it seems like a bit of a landmine to have a nominally optional argument that causes an exception if it's not provided.
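Something along these lines, purely as a sketch of what I mean (hypothetical signature, not the actual transformers code):

```python
# Illustrative guard only; not a patch against the real LlamaAttention.
def forward(self, hidden_states, attention_mask=None, position_ids=None, **kwargs):
    if position_ids is None:
        raise ValueError(
            "position_ids must be provided to LlamaAttention.forward; "
            "LlamaModel normally constructs them from the input length."
        )
    # ... rest of the attention computation ...
```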
If the answer is "no, for the purposes of API compatibility", then that's fine, but at least then this ticket might help the next person to run into it!
Thanks so much - I realize this is cut-myself-on-the-bleeding-edge stuff, but I appreciate the swift help!
@cheald Due to Llama's popularity, I've made an exception -- this PR should make it retrocompatible. Would you be able to test it on your end? 🤗
I'll test it in a bit. Thank you so much (for this, and for all the amazing work you do on the transformers project!)
My quantization pass is still running (it takes quite some time), but it appears this is working as intended. Thank you! 🎉
@cheald hehe it turns out it is no longer needed, as the maintainers of GPTQ-for-Llama have pushed a fix on their end!
System Info
transformers version: 4.28.0.dev0

Who can help?
@gante

Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
When trying to convert llama weights with https://github.com/qwopqwop200/GPTQ-for-LLaMa I encountered the following:
This appears to be due to a recent change in 7dcd8703ef904adc3ac19b47f769879221c33849: LlamaAttention passes position_ids to apply_rotary_pos_emb, but defaults them to None and does not generate them if missing (unlike LlamaModel, which appears to generate them).

Expected behavior
position_ids should not be passed to apply_rotary_pos_emb as None. I'm not quite sure what the right fix here is, but at a minimum, I suspect that if the caller is expected to provide them, defaulting to None is incorrect.
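For illustration only, a minimal sketch of the kind of fallback one could expect when position_ids is missing, numbering the current tokens from the length of any cached prefix; the helper name and shapes here are assumptions, not transformers code:

```python
import torch

def default_position_ids(q_len, past_len=0, device=None):
    # Hypothetical helper: number the current tokens past_len .. past_len + q_len - 1,
    # which is roughly what LlamaModel provides when the caller passes nothing.
    return torch.arange(past_len, past_len + q_len, device=device).unsqueeze(0)

# e.g. for a fresh 5-token prompt: tensor([[0, 1, 2, 3, 4]])
print(default_position_ids(5))
```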