huggingface / trl

Train transformer language models with reinforcement learning.
http://hf.co/docs/trl
Apache License 2.0
8.61k stars 1.06k forks

DataCollatorForCompletionOnlyLM does not work with FSDP #1756

Open aabhasgupta opened 1 week ago

aabhasgupta commented 1 week ago

The loss is returned as NaN when using DataCollatorForCompletionOnlyLM with the FSDP pipeline (fsdp_qlora.txt, attached for reference), along with this message:

"could not find instruction key [882] in the following instance: <|start_header_id|>user<|end_header_id|> "

When I run the collator function on its own in a separate Jupyter notebook, I don't see this error. Does this have something to do with the distributed training approach?

I followed the FSDP approach described here: https://www.philschmid.de/fsdp-qlora-llama3

Can anyone suggest what I am missing here?
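One possible way the two symptoms could be connected, sketched in plain Python: a completion-only collator typically masks prompt tokens with an ignore label, and if the template key is not found in an example, every label in that example may end up masked. An average over zero unmasked tokens is undefined, which would surface as NaN. This is a simplified illustration, not TRL's implementation; the function names, the single-key masking rule, and the token IDs (other than the reported key 882) are assumptions made up for the example.

```python
import math

IGNORE_INDEX = -100  # label value conventionally skipped by the loss


def find_subsequence(sequence, pattern):
    """Return the start index of the first occurrence of pattern, or -1."""
    if not pattern:
        return -1
    for start in range(len(sequence) - len(pattern) + 1):
        if sequence[start:start + len(pattern)] == pattern:
            return start
    return -1


def mask_prompt_labels(input_ids, key_ids):
    """Hypothetical masking rule: hide everything up to and including the key.

    If the key is missing, the whole example is masked.
    """
    start = find_subsequence(input_ids, key_ids)
    if start == -1:
        return [IGNORE_INDEX] * len(input_ids)
    end = start + len(key_ids)
    return [IGNORE_INDEX] * end + list(input_ids[end:])


def mean_loss(per_token_losses, labels):
    """Average the losses of unmasked tokens; NaN when nothing is unmasked."""
    kept = [loss for loss, label in zip(per_token_losses, labels)
            if label != IGNORE_INDEX]
    if not kept:
        return math.nan
    return sum(kept) / len(kept)


# Toy token IDs: the key [882] is present in the first example only.
found = mask_prompt_labels([1, 882, 5, 6], [882])
missing = mask_prompt_labels([1, 5, 6], [882])

print(found)                                 # labels after the key are kept
print(missing)                               # every label is masked
print(mean_loss([0.5, 0.5, 0.5], missing))   # nan
```

If something like this is happening, comparing the tokenized IDs of a training example inside the FSDP run against the IDs seen in the notebook (same tokenizer, same chat template, same truncation length) may show why the key is found in one setting and not the other.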