Closed seungeunrho closed 2 years ago
I don't think a new TrainingArgument is the right answer here. Some models shift the labels internally (I believe it's all the causal LM models, not just GPT-2), so instead of a flag, there should be a check when the loss is computed by the Trainer for label smoothing: if the model class name is inside MODEL_FOR_CAUSAL_LM_MAPPING_NAMES (imported from the auto module), the labels should be shifted there.
Let me know if you'd like to proceed with a PR for this fix!
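A minimal sketch of the suggested check (the set below is an illustrative stand-in; the real Trainer imports `MODEL_FOR_CAUSAL_LM_MAPPING_NAMES` from `transformers.models.auto.modeling_auto` and compares it against the model's class name):

```python
# Illustrative stand-in for MODEL_FOR_CAUSAL_LM_MAPPING_NAMES.values();
# the real mapping lives in transformers.models.auto.modeling_auto.
CAUSAL_LM_CLASS_NAMES = {"GPT2LMHeadModel", "GPTNeoForCausalLM"}

def needs_label_shift(model_class_name: str) -> bool:
    """Trainer-side check: shift labels before label smoothing only for
    models that would otherwise shift them internally in forward()."""
    return model_class_name in CAUSAL_LM_CLASS_NAMES
```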
Thanks for the quick reply. Your approach seems plausible and I'd like to proceed with it. I've read the contribution guide thoroughly. Can I just start now, or is there anything I should know before beginning?
You can start, good luck! :-)
Feature request
Add an argument to determine whether or not to shift the labels. In the `TrainingArguments` class, an argument named `shift_labels` should be added. During training, at here and here, the model must check both `labels is not None` and `self.shift_labels is True`, e.g.

```python
if labels is not None and self.shift_labels:  # changed
    # Shift so that tokens < n predict n
    shift_logits = lm_logits[..., :-1, :].contiguous()
    shift_labels = labels[..., 1:].contiguous()
```

The default value for `shift_labels` is `False`, except for causal language models such as `GPT2PreTrainedModel`.
Related to gpt2: @patil-suraj, and trainer: @sgugger
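The shift in the snippet above can be illustrated with plain lists (a dependency-free sketch; the real code operates on tensors):

```python
# Unshifted labels are a copy of input_ids: the logits at position i
# are scored against token i+1, so we drop the last logit position
# and the first label before computing the loss.
input_ids = [10, 11, 12, 13]
labels = list(input_ids)          # aligned with input_ids, not shifted

shift_positions = input_ids[:-1]  # positions whose logits are kept
shift_labels = labels[1:]         # target for position i is token i+1
```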
Motivation
In the current state of the code, whether the `labels` are shifted when training `GPT2LMHeadModel` changes depending on the use of `label_smoothing`, which I assume is unintended.

Specifically, when training a `GPT2LMHeadModel` with `args.label_smoothing_factor == 0` (the default), the code shifts the `labels` and computes the loss inside `model.forward()`. This assumes that the `labels` have not been shifted and are aligned with the corresponding `input_ids`. However, if I train `GPT2LMHeadModel` with `args.label_smoothing_factor > 0`, the loss is computed here, inside the `compute_loss()` function of the `Trainer`. This part assumes the `labels` are already shifted, and does not shift them.

I believe whether to shift the `labels` should be determined explicitly by its own argument, not implicitly by another argument like `label_smoothing_factor`. In our case, our team was very frustrated that our training results were totally different after changing only `label_smoothing`, with the same `labels` and `input_ids`. The cause was the misalignment of `labels` and `input_ids` when `label_smoothing` was turned on.
Your contribution
I'm willing to make a PR after your confirmation.
I want to know more about what the predicted text looks like in the label-smoothing case before the bug fix. Does the model learn an identity transformation and always predict the last input token repeatedly? I am curious about this.
@sgugger May I ask how to shift labels for a custom causal LM model? Let's assume I made a CustomCausalLM model, which is not mapped in MODEL_FOR_CAUSAL_LM_MAPPING_NAMES. Like other models, my causal LM model (which is based on modeling_llama.py) shifts labels in its forward function. When I use the HF Trainer, my CustomCausalLM, and the label smoother, how do I shift labels in this case?
The current Trainer code is

```python
if labels is not None:
    if unwrap_model(model)._get_name() in MODEL_FOR_CAUSAL_LM_MAPPING_NAMES.values():
        loss = self.label_smoother(outputs, labels, shift_labels=True)
    else:
        loss = self.label_smoother(outputs, labels)
```

I think the workaround is to add `shift_labels=True` in the `else` branch as well. Is there a correct, or better, way to shift labels for my custom causal LM model when using the label smoother?
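To see what `shift_labels=True` changes, here is a dependency-free sketch of just the shift step of the smoother (illustrative only; the real `LabelSmoother` also applies the smoothing epsilon and ignores padded positions):

```python
def targets_for_loss(logit_positions, labels, shift_labels=False):
    """Return the (positions, labels) pairs the smoothed loss is computed over.

    With shift_labels=True, the logits at position i are scored against
    token i+1, matching what causal LMs do internally in forward().
    """
    if shift_labels:
        logit_positions = logit_positions[:-1]
        labels = labels[1:]
    return logit_positions, labels
```

If the model's `forward()` already shifts labels but the smoother does not (the `else` branch above), the two paths end up misaligned by one token, which is exactly the bug this issue describes.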
By default, if you copy the end of the forward pass of `LlamaForCausalLM`, you will see that there is a part where the labels are shifted!