fsdp_qlora.txt
The loss is returned as NaN when using DataCollatorForCompletionOnlyLM with the FSDP pipeline (attached for reference), and I get the following message:
"could not find instruction key [882] in the following instance: <|start_header_id|>user<|end_headerid|> "
When I check the collator function in a separate Jupyter notebook, I don't see this error.
Is this something to do with the distributed learning approach?
I followed the FSDP approach as mentioned here:
https://www.philschmid.de/fsdp-qlora-llama3
Can anyone suggest what I am missing here?
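
To make the notebook comparison concrete, here is a minimal sketch of the kind of token-level check that could be run on the tokenized samples in both environments. It assumes the collator looks for the instruction template as a contiguous run of token IDs inside each sample, which is my reading of the message above rather than something confirmed from the library source. The helper name find_subsequence and all token IDs except 882 (taken from the message) are hypothetical and for illustration only.

```python
from typing import List, Sequence


def find_subsequence(haystack: Sequence[int], needle: Sequence[int]) -> List[int]:
    """Return every start index at which `needle` occurs contiguously in `haystack`."""
    n = len(needle)
    if n == 0:
        return []
    return [
        i
        for i in range(len(haystack) - n + 1)
        if list(haystack[i:i + n]) == list(needle)
    ]


# 882 comes from the message above; every other ID is made up.
instruction_ids = [882]

# A sample in which the instruction template survives tokenization.
input_ids_ok = [1, 5, 882, 6, 42, 43]

# A sample in which it does not (for example, cut off by truncation,
# or tokenized differently in context). This is an assumed scenario.
input_ids_missing = [1, 5]

print(find_subsequence(input_ids_ok, instruction_ids))       # [2]
print(find_subsequence(input_ids_missing, instruction_ids))  # []
```

If the second case shows up only under the FSDP run, comparing the tokenizer settings, the chat template, and the maximum sequence length between the notebook and the training script would be the next thing to look at.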