huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Customized position_ids not working #33938

Open huchanwei123 opened 4 weeks ago

huchanwei123 commented 4 weeks ago

System Info

Hello,

I am trying to feed customized position IDs to a Llama model. If I pass a custom position_ids vector, for example [[0, 0, 1, 2, 2, 2]] (batch size = 1, where the first two tokens share position 0 and the last three tokens share position 2), this causes an error.

The error seems to originate in the function prepare_inputs_for_generation in src/transformers/models/llama/modeling_llama.py: the position_ids are not updated as cache_position increases, so a shape inconsistency occurs.

Is there any way to successfully feed a customized position ids to the model? Thanks!

Who can help?

No response

Information

Tasks

Reproduction

  1. Pass a customized position_ids as one of the inputs to model.generate() (see the sketch below)
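
A minimal reproduction sketch (not from the original report): it assumes an arbitrary LlamaForCausalLM checkpoint and a prompt that happens to tokenize to six tokens, so the custom position_ids line up with the input.

```python
# Minimal reproduction sketch. Assumptions: any Llama causal-LM checkpoint works;
# the prompt is assumed to tokenize to six tokens so it matches the custom position_ids.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "meta-llama/Llama-2-7b-hf"  # placeholder; any Llama checkpoint should do
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

inputs = tokenizer("Hello, how are you?", return_tensors="pt")
# Custom positions: the first two tokens share position 0, the last three share position 2.
position_ids = torch.tensor([[0, 0, 1, 2, 2, 2]])
assert position_ids.shape == inputs["input_ids"].shape  # adjust the prompt if this fails

# The first forward pass works, but a size mismatch is raised once the second token
# is generated, because position_ids is not advanced along with cache_position.
outputs = model.generate(**inputs, position_ids=position_ids, max_new_tokens=5)
```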

Expected behavior

Generation should run with the custom position_ids; instead, a size mismatch error is raised.

ArthurZucker commented 4 weeks ago

Cc @gante

codeslayed commented 3 weeks ago

You can directly modify how position_ids are computed within your code before passing them to the model. Ensure that your custom position_ids are aligned with the expected shape and values.
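
For instance, a quick sanity check before calling the model (hypothetical variable names):

```python
# Hypothetical sanity check: custom position_ids should mirror the input_ids shape,
# i.e. (batch_size, sequence_length), and contain non-negative integer positions.
assert position_ids.shape == input_ids.shape
assert position_ids.dtype == torch.long and (position_ids >= 0).all()
```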

huchanwei123 commented 3 weeks ago

> You can directly modify how position_ids are computed within your code before passing them to the model. Ensure that your custom position_ids are aligned with the expected shape and values.

Thanks for the reply. Yes, I am aware that I can pass custom position_ids to the model, and I believe their shape and values are correct.

Since the model generates token by token, feeding custom position_ids causes a size mismatch error after the first token is generated. After a bit of digging, I found that prepare_inputs_for_generation has no handling for the case where position_ids is not None. As a result, when the second token is generated, position_ids still has its original shape, while it is expected to match the shape of attention_mask.

I am not sure if I missed or misunderstood anything. Thanks!
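
For reference, a rough workaround sketch I am experimenting with: subclass the model and slice the user-supplied position_ids by cache_position inside prepare_inputs_for_generation. This assumes the dict returned by the parent method exposes "position_ids" and "cache_position" keys, which matches recent Llama code but may differ between transformers versions.

```python
# Workaround sketch, not an official API: slice user-supplied position_ids to the current
# decoding step, the same way input_ids are sliced. Assumes the parent method returns a
# dict containing "position_ids" and "cache_position" (true for recent Llama versions).
import torch
from transformers import LlamaForCausalLM


class LlamaWithCustomPositions(LlamaForCausalLM):
    def set_custom_position_ids(self, position_ids):
        # Full-length positions for the prompt, e.g. torch.tensor([[0, 0, 1, 2, 2, 2]]).
        self._custom_position_ids = position_ids

    def prepare_inputs_for_generation(self, *args, **kwargs):
        model_inputs = super().prepare_inputs_for_generation(*args, **kwargs)
        custom = getattr(self, "_custom_position_ids", None)
        cache_position = model_inputs.get("cache_position")
        if custom is not None and cache_position is not None:
            cache_position = cache_position.to(custom.device)
            step = int(cache_position[-1])
            if step < custom.shape[1]:
                # Prefill (or any step still covered by the custom positions): use them as-is.
                model_inputs["position_ids"] = custom[:, cache_position]
            else:
                # Newly generated tokens: keep counting from the last custom position.
                model_inputs["position_ids"] = custom[:, -1:] + (step - custom.shape[1] + 1)
        return model_inputs
```

With this sketch, the custom positions would be set via model.set_custom_position_ids(position_ids) and generate() would be called without passing position_ids, so the built-in handling does not interfere.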

zucchini-nlp commented 3 weeks ago

This is tracked in https://github.com/huggingface/transformers/issues/29149 and we had a PR a few months ago. Unfortunately the PR was too big and needed to be split into parts, after which it dropped in priority :(