Open Srini-98 opened 1 month ago

Hi,

I am fine-tuning TinyLlama on a T4 with fp16. When I use packing, the loss looks okay, but when I set packing to false, grad_norm goes to NaN and the model doesn't learn anything. This started after I updated to the latest version of Unsloth.

Any pointers to fix this would be helpful.

Thanks.

Oh, it's possible it's the new TinyLlama update I did - I'll check this.

Thanks. More info on this from my other run: the same script runs perfectly fine on an RTX 4090 where I set bf16 training to true. So my guess is it's something to do with the fp16 changes?

(An update here for more info.) I tried another model (Qwen 1.5B) with fp16 training and it works fine. The problem seems specific to TinyLlama.
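For anyone reading along: a NaN/inf grad_norm under pure fp16 (but not bf16) is consistent with the gradient-norm computation overflowing fp16's range, since fp16 tops out at 65504 while bf16 keeps float32's exponent range. A minimal numpy sketch of that general mechanism (illustrative only, not Unsloth's actual internals):

```python
import numpy as np

# fp16's largest finite value is 65504. Squaring even moderate gradient
# values while computing a norm overflows to inf; downstream arithmetic
# (inf - inf, 0 * inf) then yields NaN.
grads = np.array([300.0, 400.0], dtype=np.float16)

# Naive norm computed entirely in fp16: 300**2 = 90000 > 65504 -> inf.
with np.errstate(over="ignore"):
    naive_norm = np.sqrt(np.sum(grads * grads))  # inf

# Mixed-precision practice: upcast and accumulate in float32, which is
# what bf16/fp32 master-weight setups effectively do.
safe_norm = np.sqrt(np.sum(grads.astype(np.float32) ** 2))  # 500.0

print(naive_norm, safe_norm)
```

This is why the same script behaves on the 4090 with bf16: the norm never overflows there. It doesn't explain why only TinyLlama triggers it, but it narrows the suspect to something in the fp16 path producing unusually large intermediate values for that model.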