Hi,

I encountered an issue after updating to unsloth=="2024.11.6". When training the Qwen2.5-0.5B-Instruct model without PEFT, I observed that the model's gradient norm is 0, resulting in no weight updates.

I noticed a discrepancy in the number of trainable parameters:
This difference in trainable parameters might be related to the training issue.
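
For anyone trying to reproduce this, here is a minimal sketch of how the trainable-parameter count can be checked independently of the trainer's log. It is not taken from unsloth; the helper names `summarize_trainable` and `params_without_grad` are hypothetical, and the only assumption is that the model exposes `named_parameters()` the way a `torch.nn.Module` does.

```python
def summarize_trainable(model):
    """Count total and trainable parameters and list the frozen ones.

    Assumes `model.named_parameters()` yields (name, param) pairs where
    each param has `.numel()` and `.requires_grad`, as in PyTorch.
    """
    total = 0
    trainable = 0
    frozen = []
    for name, param in model.named_parameters():
        n = param.numel()
        total += n
        if param.requires_grad:
            trainable += n
        else:
            frozen.append(name)
    return {"total": total, "trainable": trainable, "frozen": frozen}


def params_without_grad(model):
    """List trainable parameters whose .grad is still None.

    Intended to be called after loss.backward(); a non-empty result means
    those parameters received no gradient at all.
    """
    return [
        name
        for name, param in model.named_parameters()
        if param.requires_grad and param.grad is None
    ]
```

Calling `summarize_trainable(model)` right after loading and again just before training should show whether parameters were frozen somewhere in between. If most names end up in `frozen`, that would be consistent with a gradient norm of 0, though it does not by itself prove the cause.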