edwko opened this issue 1 month ago
Possibly related to #522
@edwko Could you try to decrease ngroups to 1 to see if the issue is related to the one I'm having?
@DanFosing Yes, with ngroups set to 1 (the default), training is stable. I've processed over 30 billion tokens and everything appears stable.
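For reference, `ngroups` is a constructor argument of the `Mamba2` block; roughly the two configurations being compared look like this (the dimensions here are just illustrative, not my actual config):

```python
from mamba_ssm import Mamba2

d_model = 768  # illustrative model width

stable_block = Mamba2(d_model=d_model, ngroups=1)    # default grouping -- stable in my runs
unstable_block = Mamba2(d_model=d_model, ngroups=8)  # ngroups > 1 -- where the blow-up appears
```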
Hi, I'm experiencing an issue with `clip_grad_norm_` and loss values while training Mamba2. After training for some time, the gradient norm starts to increase rapidly toward infinity, and if training continues the loss eventually becomes NaN.

With gradient accumulation:
Training with no gradient accumulation:
Below is a simple training script that reproduces this issue. I'm wondering if I'm doing something incorrect:
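The script itself isn't reproduced in this excerpt; the sketch below shows the kind of loop being described, assuming the `mamba_ssm` `Mamba2` module and a standard PyTorch setup with AdamW, `clip_grad_norm_`, and optional gradient accumulation. The model definition, dimensions, and hyperparameters are illustrative placeholders, not the reporter's actual values.

```python
import torch
import torch.nn as nn
from mamba_ssm import Mamba2

# Illustrative hyperparameters -- not the reporter's actual configuration.
vocab_size, d_model, n_layers = 32_000, 768, 12
seq_len, batch_size, accum_steps = 1024, 8, 4
device = "cuda"  # Mamba2's fused kernels require a GPU

class TinyMamba2LM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # ngroups > 1 is the setting under which the blow-up was observed.
        self.blocks = nn.ModuleList(
            [Mamba2(d_model=d_model, ngroups=8) for _ in range(n_layers)]
        )
        self.norm = nn.LayerNorm(d_model)
        self.head = nn.Linear(d_model, vocab_size, bias=False)

    def forward(self, tokens):
        x = self.embed(tokens)
        for block in self.blocks:
            x = x + block(x)  # simple residual wiring, for illustration only
        return self.head(self.norm(x))

model = TinyMamba2LM().to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

for step in range(1, 10_001):
    optimizer.zero_grad(set_to_none=True)
    # Set accum_steps = 1 to mimic the "no gradient accumulation" run.
    for _ in range(accum_steps):
        tokens = torch.randint(vocab_size, (batch_size, seq_len), device=device)
        logits = model(tokens[:, :-1])
        loss = nn.functional.cross_entropy(
            logits.reshape(-1, vocab_size), tokens[:, 1:].reshape(-1)
        ) / accum_steps
        loss.backward()
    # The norm returned here is what grows toward inf before the loss turns NaN.
    grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    if step % 100 == 0:
        print(f"step {step}  loss {loss.item() * accum_steps:.4f}  "
              f"grad_norm {grad_norm.item():.2f}")
```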