Open ummagumm-a opened 3 weeks ago
+1 same problem, need solution
Hey! Feel free to open a PR; this should only be activated when training. I am not certain, but we try to avoid handling too many things, as it bloats the codebase. Also, in general, when you train you have a lot more control over what you want to achieve with the position ids.
It's the first time I have seen a report about that TBH 🤗
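For example, a training loop can compute mask-aware position ids itself and pass them in explicitly. A minimal sketch (the checkpoint name is only a placeholder; any causal LM with left padding works the same way):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint, used only for illustration.
model_name = "meta-llama/Meta-Llama-3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"
model = AutoModelForCausalLM.from_pretrained(model_name)

batch = tokenizer(
    ["short prompt", "a considerably longer prompt that forces padding"],
    return_tensors="pt",
    padding=True,
)

# Derive position ids from the attention mask so that left padding
# does not shift the positions of the real tokens.
position_ids = batch["attention_mask"].long().cumsum(-1) - 1
position_ids.masked_fill_(batch["attention_mask"] == 0, 1)

outputs = model(
    input_ids=batch["input_ids"],
    attention_mask=batch["attention_mask"],
    position_ids=position_ids,
    labels=batch["input_ids"],
)
outputs.loss.backward()
```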
Feature request
The LLaMA 3 implementation should generate default `position_ids` that take the `attention_mask` into account. @ArthurZucker @younesbelkada
Motivation
Is there a specific reason why the default `position_ids` generation doesn't consider the `attention_mask`? A friend mentioned that this issue has persisted for almost half a year now.
https://github.com/huggingface/transformers/blob/adb91179b9e867b7278e0130c87558974056c7b4/src/transformers/models/llama/modeling_llama.py#L962
The problem arises when using left padding: the default `position_ids` start from the first index of the sequence rather than from the first non-zero index of the `attention_mask`. As far as I know, this is handled correctly in the `generate` function, but I would expect consistent behaviour during training as well. https://discuss.huggingface.co/t/llama-position-ids/75870
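To make the mismatch concrete, here is a small sketch, assuming the default effectively reduces to a plain arange over the sequence length (as at the linked line):

```python
import torch

# Left-padded example: two padding tokens followed by three real tokens.
attention_mask = torch.tensor([[0, 0, 1, 1, 1]])
seq_len = attention_mask.shape[1]

# Default behaviour when position_ids is None: effectively an arange,
# so the real tokens end up at positions 2, 3, 4.
default_position_ids = torch.arange(seq_len).unsqueeze(0)
print(default_position_ids)  # tensor([[0, 1, 2, 3, 4]])

# Mask-aware positions (what generate effectively uses): the first real
# token sits at position 0; padded slots get a dummy value of 1.
mask_aware = attention_mask.long().cumsum(-1) - 1
mask_aware.masked_fill_(attention_mask == 0, 1)
print(mask_aware)  # tensor([[1, 1, 0, 1, 2]])
```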
Your contribution
I can submit a PR changing
https://github.com/huggingface/transformers/blob/adb91179b9e867b7278e0130c87558974056c7b4/src/transformers/models/llama/modeling_llama.py#L963
to a mask-aware default, roughly along the lines sketched below, if no one has any objections regarding the correctness of that approach. Otherwise, let's discuss.
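A rough sketch of the kind of change I have in mind, assuming the current fallback at the linked line is roughly `position_ids = cache_position.unsqueeze(0)`; names follow the surrounding forward, and the exact diff is up for discussion:

```python
if position_ids is None:
    if attention_mask is not None:
        # Derive positions from the attention mask so that left padding
        # does not shift the positions of the real tokens.
        # (A real patch would also need to account for cached/past tokens.)
        position_ids = attention_mask.long().cumsum(-1) - 1
        position_ids.masked_fill_(attention_mask == 0, 1)
    else:
        # Keep the existing behaviour when no mask is provided.
        position_ids = cache_position.unsqueeze(0)
```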