yzhang123 opened this issue 11 months ago
@yzhang123 I met the same problem. Do you have any recommendation?
@DachengLi1, it might be a seed issue; I reran from scratch and it was OK. Another fix is to switch from bf16 to fp16, or to use a full attention bias + bf16, which seems more stable.
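One plausible reason the fp16 switch helps (a sketch, not confirmed by the thread): bf16 keeps only 7 mantissa bits versus fp16's 10, so tiny additive bias terms like ALiBi offsets can round away entirely in bf16. A minimal demonstration using only the standard library:

```python
import struct

def round_bf16(x: float) -> float:
    """Round a float to bfloat16 precision by keeping the top 16 bits
    of its float32 encoding (round-to-nearest-even)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def round_fp16(x: float) -> float:
    """Round a float to IEEE half precision via struct's 'e' format."""
    return struct.unpack(">e", struct.pack(">e", x))[0]

x = 1.0 + 2 ** -10        # a small offset, similar in scale to an ALiBi bias term
print(round_bf16(x))      # 1.0            -- the offset is lost in bf16
print(round_fp16(x))      # 1.0009765625   -- preserved in fp16
```

This only illustrates the precision gap; whether it is the actual cause of the divergence here would need to be verified against the kernel's internal accumulation dtype.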
@yzhang123 Thanks a lot! Got it!
@yzhang123 Have you solved the problem? Could you share the Triton code, please?
When pretraining GPT with the Triton flash attention kernel, the loss blows up (from ~2 to 7) halfway into training and doesn't go back down. If I resume from a healthy checkpoint without flash attention, the loss is stable. I can reproduce this error.
I was using ALiBi.
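For reference, the "full attention bias" workaround mentioned above amounts to materializing the ALiBi penalties as an explicit additive bias tensor instead of computing them inside the fused kernel. A minimal sketch (the function name and shapes are my own, not from any specific codebase):

```python
import numpy as np

def alibi_bias(num_heads: int, seq_len: int) -> np.ndarray:
    """Build the full additive ALiBi attention bias, shape
    (num_heads, seq_len, seq_len). Head h uses slope 2**(-8*(h+1)/num_heads),
    the standard geometric schedule from the ALiBi paper."""
    slopes = np.array([2.0 ** (-8.0 * (h + 1) / num_heads)
                       for h in range(num_heads)])
    pos = np.arange(seq_len)
    dist = pos[None, :] - pos[:, None]          # key pos - query pos, (seq, seq)
    bias = slopes[:, None, None] * dist[None]   # linear distance penalty per head
    # causal mask: block attention to future positions
    bias = np.where(dist[None] > 0, -np.inf, bias)
    return bias

b = alibi_bias(num_heads=8, seq_len=4)
print(b.shape)  # (8, 4, 4)
```

The resulting tensor is added to the attention logits before the softmax; since the bias is computed once in float32, it sidesteps any low-precision arithmetic inside the kernel.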