unslothai / unsloth

Finetune Llama 3.2, Mistral, Phi, Qwen 2.5 & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0

Gradient norm is zero when training Qwen2.5-0.5B-Instruct in unsloth=="2024.11.6" #1282

Open joe32140 opened 5 days ago

joe32140 commented 5 days ago

Hi,

I encountered an issue after updating to unsloth=="2024.11.6". When training the Qwen2.5-0.5B-Instruct model without PEFT, I observed that the model's gradient norm is 0, resulting in no weight updates.
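For context, a minimal sketch of the kind of setup being described. This is not the reporter's exact script: the toy dataset and training arguments are placeholders, it assumes "without PEFT" means simply not calling `FastLanguageModel.get_peft_model`, and the exact `trl` trainer arguments depend on the installed `trl` version.

```python
# Hypothetical repro sketch, not the reporter's exact script.
from unsloth import FastLanguageModel
from trl import SFTTrainer, SFTConfig
from datasets import Dataset

# Load the base model WITHOUT wrapping it in a LoRA/PEFT adapter,
# i.e. FastLanguageModel.get_peft_model is intentionally not called.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Qwen/Qwen2.5-0.5B-Instruct",
    max_seq_length = 2048,
    load_in_4bit = False,   # full finetuning, so no 4-bit quantization
)

# Placeholder toy data; the original report used a real dataset.
dataset = Dataset.from_dict({"text": ["Hello, this is a training example."] * 64})

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    args = SFTConfig(
        dataset_text_field = "text",
        max_seq_length = 2048,
        per_device_train_batch_size = 2,
        max_steps = 30,
        logging_steps = 1,          # grad_norm is printed once per logging step
        output_dir = "outputs",
    ),
)

# Reported symptom: grad_norm stays at 0.0, so the weights never update.
trainer.train()
```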

I also noticed a discrepancy in the number of trainable parameters.

This difference in trainable parameters might be related to the training issue.
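One quick way to compare the trainable-parameter count across unsloth versions is plain PyTorch, nothing unsloth-specific (assuming `model` is the loaded model from the sketch above):

```python
# Count trainable vs. total parameters; run under both unsloth versions and compare.
def count_params(model):
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    print(f"trainable: {trainable:,} / total: {total:,}")

count_params(model)
```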

danielhanchen commented 4 days ago

Oh wait, without PEFT? Hmm, would it be possible for you to run training with `with torch.autograd.set_detect_anomaly(True): trainer.train()`?
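For reference, a minimal sketch of that suggestion, assuming `trainer` is the trainer instance from the setup above:

```python
import torch

# Enable autograd anomaly detection so any backward pass that produces NaN or
# otherwise invalid gradients raises an error pointing at the offending op.
# This slows training down noticeably, so use it only while debugging.
with torch.autograd.set_detect_anomaly(True):
    trainer.train()
```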