unslothai / unsloth

Finetune Llama 3.1, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0

TinyLlama issues #911

Open Srini-98 opened 1 month ago

Srini-98 commented 1 month ago

Hi,

I am finetuning TinyLlama on a T4 with FP16. When I use packing, the loss looks fine, but when I set packing to False, the grad_norm goes to NaN and the model doesn't learn anything. This started after I updated to the latest version of Unsloth.
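
For context, the run is roughly the standard Unsloth + TRL recipe below (the model id, dataset file, and hyperparameters are placeholders to illustrate the setup, not my exact script):

```python
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

# Load TinyLlama through Unsloth (placeholder model id).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/tinyllama-bnb-4bit",
    max_seq_length = 2048,
    dtype = None,          # auto-detect: falls back to fp16 on a T4
    load_in_4bit = True,
)

# Attach LoRA adapters.
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
)

# Placeholder dataset; assumes a pre-formatted "text" column.
dataset = load_dataset("json", data_files = "train.jsonl", split = "train")

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = 2048,
    packing = False,   # loss is fine with packing = True; grad_norm -> nan with False
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        max_steps = 60,
        learning_rate = 2e-4,
        fp16 = True,   # T4 has no bf16 support
        bf16 = False,
        logging_steps = 1,
        output_dir = "outputs",
    ),
)
trainer.train()
```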

Any pointers to fix this would be helpful.

Thanks.

danielhanchen commented 1 month ago

Oh it's possible it's the new TinyLlama update I did - I'll check this

Srini-98 commented 1 month ago

Thanks. More info from my other run: the same script runs perfectly fine on an RTX 4090 where I set bf16 training to True. So my guess is it's something to do with the fp16 changes?
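
The only intended difference between the two runs is the precision flags, roughly like below (`is_bfloat16_supported` is Unsloth's helper; the other arguments are trimmed for brevity):

```python
from unsloth import is_bfloat16_supported
from transformers import TrainingArguments

# Precision selection: the T4 run ends up on fp16, the RTX 4090 run on bf16.
args = TrainingArguments(
    output_dir = "outputs",
    fp16 = not is_bfloat16_supported(),  # True on the T4 -> grad_norm goes to nan
    bf16 = is_bfloat16_supported(),      # True on the RTX 4090 -> trains fine
    logging_steps = 1,
)
```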

Srini-98 commented 1 month ago

(An update here for more info)

I tried another model (Qwen 1.5B) with fp16 training and it works fine. The problem is specific to TinyLlama, I think.
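
In case it helps narrow things down, here is a small debugging sketch (not part of my training script) that stops the run at the first non-finite loss or grad_norm so the offending step can be inspected; it assumes a transformers version that includes grad_norm in the training logs:

```python
import math
from transformers import TrainerCallback

class StopOnNaN(TrainerCallback):
    """Halt training as soon as a logged loss or grad_norm is non-finite."""
    def on_log(self, args, state, control, logs=None, **kwargs):
        if logs is None:
            return
        for key in ("loss", "grad_norm"):
            value = logs.get(key)
            if value is not None and not math.isfinite(value):
                print(f"Non-finite {key} at step {state.global_step}: {value}")
                control.should_training_stop = True

# trainer.add_callback(StopOnNaN())  # attach before calling trainer.train()
```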