Open huchanwei123 opened 4 weeks ago
Cc @gante
You can directly modify how position_ids are computed within your code before passing them to the model. Ensure that your custom position_ids are aligned with the expected shape and values.
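For example, something along these lines should work for a single forward pass (the checkpoint and prompt below are placeholders, not from this issue):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
seq_len = inputs["input_ids"].shape[1]

# position_ids must be a LongTensor of shape (batch_size, seq_len).
position_ids = torch.arange(seq_len).unsqueeze(0)
position_ids[0, 1] = 0  # e.g. let the first two tokens share position 0

outputs = model(**inputs, position_ids=position_ids)
```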
Thanks for the reply. Yes, I am aware that I can pass custom position_ids to the model, and I believe their shape and values are correct. However, since the model generates token by token, feeding custom position_ids causes a size mismatch error after the first token is generated.

After a bit of digging, I found that the function prepare_inputs_for_generation has no handling for the case where position_ids is not None. As a result, when the second token is generated, the shape of position_ids stays the same, even though it is expected to match the shape of attention_mask.

I am not sure if I missed or misunderstood anything. Thanks!
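For now, the best workaround I can think of is to slice the custom position_ids per generation step myself, e.g. by overriding prepare_inputs_for_generation in a subclass. A rough, untested sketch (the class name is made up, and the exact signature and return value of prepare_inputs_for_generation may differ across transformers versions):

```python
from transformers import LlamaForCausalLM

class LlamaWithCustomPositions(LlamaForCausalLM):
    def prepare_inputs_for_generation(self, input_ids, **kwargs):
        # Full-length custom position_ids passed via generate(); they should
        # cover the prompt plus all positions up to max_new_tokens.
        full_position_ids = kwargs.pop("position_ids", None)
        model_inputs = super().prepare_inputs_for_generation(input_ids, **kwargs)
        cache_position = model_inputs.get("cache_position", kwargs.get("cache_position"))
        if full_position_ids is not None and cache_position is not None:
            # Narrow the custom positions to the tokens processed in this step,
            # so their shape follows cache_position instead of staying fixed.
            model_inputs["position_ids"] = full_position_ids[:, cache_position]
        return model_inputs
```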
This is tracked in https://github.com/huggingface/transformers/issues/29149, and we had a PR a few months ago. Unfortunately, the PR was too big and needed to be split into parts, after which it dropped in priority :(
System Info
Hello,

I am trying to feed customized position IDs to a Llama model. If I feed a custom position_ids vector, for example [[0, 0, 1, 2, 2, 2]] (batch size = 1, the 1st and 2nd tokens share position 0, and the last three tokens share position 2), this causes an error.

The error seems to be located in the function prepare_inputs_for_generation in src/transformers/models/llama/modeling_llama.py, where position_ids does not change as cache_position increases, so the shape inconsistency occurs.

Is there any way to successfully feed customized position IDs to the model? Thanks!
Who can help?
No response
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
Pass position_ids as one of the inputs to model.generate().
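A minimal sketch of the reproduction (the checkpoint and prompt are placeholders):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
seq_len = inputs["input_ids"].shape[1]

# Custom position_ids with repeated positions, shape (batch_size, seq_len).
position_ids = torch.arange(seq_len).unsqueeze(0)
position_ids[0, 1] = 0

# The first token is generated fine; a size mismatch is then raised because
# position_ids is never re-sliced as cache_position advances.
outputs = model.generate(**inputs, position_ids=position_ids, max_new_tokens=5)
```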
Expected behavior
Size mismatch