Open elichen3051 opened 1 week ago
Hello @elichen3051, the task is the same whether or not one uses packing (i.e. next-token prediction). The `DataCollatorForCompletionOnlyLM` is for the special case where you want to mask the inputs/prompts so that the loss is computed only on the completions, which in some cases gives a small performance boost.
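
To make "masking the inputs / prompts" concrete, here is a minimal sketch in plain Python. It is not the TRL implementation; the helper `mask_prompt_labels` and the token ids are hypothetical. The one fixed fact it relies on is that `-100` is the default `ignore_index` of PyTorch's `CrossEntropyLoss`, so positions labelled `-100` do not contribute to the loss.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def mask_prompt_labels(input_ids, prompt_len):
    """Copy input_ids into labels, ignoring the first prompt_len positions.

    Hypothetical helper for illustration only: the model still sees the
    prompt tokens as input, but only completion tokens produce a loss.
    """
    labels = list(input_ids)
    for i in range(min(prompt_len, len(labels))):
        labels[i] = IGNORE_INDEX
    return labels


# Made-up token ids: three prompt tokens followed by two completion tokens.
example_ids = [11, 12, 13, 21, 22]
example_labels = mask_prompt_labels(example_ids, prompt_len=3)
print(example_labels)
```

Without this masking, the labels are simply a copy of the input ids, which is plain next-token prediction over the whole sequence.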
Dear HuggingFace,

I've noticed that `run_cpt.py` and `run_sft.py` set `packing=True`. However, we don't pass `DataCollatorForCompletionOnlyLM` to `SFTTrainer`; would this introduce cross-contamination during training?

Reference article: Improving Hugging Face Training Efficiency Through Packing with Flash Attention
TRL issue on GitHub: https://github.com/huggingface/trl/issues/805
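
For readers unfamiliar with the term, the sketch below illustrates what "cross-contamination" means when examples are packed into one sequence. It is plain Python with hypothetical helpers (`pack`, `causal_mask`, `block_causal_mask`) and made-up token ids; it does not show what `SFTTrainer` actually does, which depends on the TRL version and configuration.

```python
def pack(sequences):
    """Concatenate examples into one sequence and record each token's example index."""
    packed, seq_ids = [], []
    for idx, seq in enumerate(sequences):
        packed.extend(seq)
        seq_ids.extend([idx] * len(seq))
    return packed, seq_ids


def causal_mask(n):
    """Plain causal mask: position i may attend to every position j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def block_causal_mask(seq_ids):
    """Causal mask restricted to tokens that come from the same original example."""
    n = len(seq_ids)
    return [[j <= i and seq_ids[i] == seq_ids[j] for j in range(n)]
            for i in range(n)]


# Two made-up examples packed together: [1, 2] and [3, 4, 5].
packed, seq_ids = pack([[1, 2], [3, 4, 5]])
plain = causal_mask(len(packed))
blocked = block_causal_mask(seq_ids)

# Position 2 is the first token of the second example.
print(plain[2][1])    # can it attend to the last token of the first example?
print(blocked[2][1])  # same question under the block-diagonal mask
```

With the plain causal mask, tokens of the second example can attend to the first example, which is the contamination being asked about; the block-diagonal variant removes those cross-example entries.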