Open · keunwoochoi opened 3 months ago
Hi,
Sure, feel free to open a PR. Usually, users are expected to make `labels` a copy of the `input_ids`, with padding tokens (or other tokens which the model can ignore) replaced by -100. See the example notebook or script here for that.
Feel free to open a PR to clarify this in the docs.
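For concreteness, a minimal sketch of that recipe (the checkpoint name here is just an example, not something from this thread):

```python
from transformers import AutoTokenizer

# Minimal sketch; "gpt2" is only an example checkpoint.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

batch = tokenizer(
    ["hello world", "a longer example sentence"],
    padding=True,
    return_tensors="pt",
)

# labels start as a copy of input_ids; padding positions are set to -100
# so the cross-entropy loss ignores them. No manual shifting is done here.
labels = batch["input_ids"].clone()
labels[batch["attention_mask"] == 0] = -100
batch["labels"] = labels
```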
Feature request
I believe `labels` in the training of causal LMs means the value to predict at time n, i.e., the next token. In other words, I'd assume that if `labels` is given, it should already be shifted by one in the data loader w.r.t. the `input_ids`. However, in `LlamaForCausalLM.forward()`, I found the labels are always shifted, silently:
https://github.com/huggingface/transformers/blob/f1d822ba337499d429f832855622b97d90ac1406/src/transformers/models/llama/modeling_llama.py#L1205-L1210
...
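For reference, the shifting at the linked lines amounts to roughly the following (a paraphrase with dummy tensors, not a verbatim copy of the library code):

```python
import torch

# Dummy shapes: batch of 2, sequence length 5, vocabulary of 11 (illustrative only).
logits = torch.randn(2, 5, 11)
labels = torch.randint(0, 11, (2, 5))

# When `labels` is passed to forward(), logits at position t are scored
# against labels at position t + 1, i.e. the labels are shifted internally.
shift_logits = logits[..., :-1, :].contiguous()
shift_labels = labels[..., 1:].contiguous()
loss = torch.nn.CrossEntropyLoss()(
    shift_logits.view(-1, shift_logits.size(-1)),
    shift_labels.view(-1),
)
```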
I found it quite unexpected, hence the word "silently". Since this is a causal LM, shouldn't the default be to not shift the labels? In the GPT-2 modeling code, this is at least documented explicitly:
https://github.com/huggingface/transformers/blob/f1d822ba337499d429f832855622b97d90ac1406/src/transformers/models/gpt2/modeling_gpt2.py#L1309-1314
In Gemma 2, the behavior is the same, with no explicit mention in the docstring:
https://github.com/huggingface/transformers/blob/f1d822ba337499d429f832855622b97d90ac1406/src/transformers/models/gemma2/modeling_gemma2.py#L978-L982
I think we should at least require the docstring to mention this, if changing the behavior itself is too risky at this point.
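To make the consequence concrete, here is a small illustration (my own sketch, not library code) of what happens when the data loader pre-shifts the labels and the model then shifts them again:

```python
import torch

# Token ids 0..9 stand in for one tokenized sequence (illustrative only).
input_ids = torch.arange(10)

# A data loader that "pre-shifts" pairs position t with token t + 1:
pre_shifted_labels = input_ids[1:]          # intended targets for positions 0..8

# forward() then shifts again (labels[..., 1:] vs. logits[..., :-1, :]),
# so position t is effectively trained to predict token t + 2.
effective_targets = pre_shifted_labels[1:]  # tensor([2, 3, 4, 5, 6, 7, 8, 9])
print(effective_targets)
```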
Motivation
I didn't expect this behavior and used my own data loader, which already does the shifting, as I believe that is what `labels` should mean. As a result, I ended up fine-tuning a model to predict the next-next token, which produced gibberish.
Your contribution