Open lambda-lee opened 1 week ago
Try a different model?
I'm seeing the same thing. Did you manage to solve it?
Reminder
System Info
llamafactory version: 0.9.1.dev0
Reproduction
After training for a while, the log first shows:
{'loss': 1.4596, 'grad_norm': nan, 'learning_rate': 4.552255167404752e-06, 'epoch': 0.66}
After that, every step reports:
{'loss': 0.0, 'grad_norm': nan, 'learning_rate': 4.549912997495027e-06, 'epoch': 0.66}
{'loss': 0.0, 'grad_norm': nan, 'learning_rate': 4.549912997495027e-06, 'epoch': 0.66}
{'loss': 0.0, 'grad_norm': nan, 'learning_rate': 4.549912997495027e-06, 'epoch': 0.66}
Expected behavior
How can training skip the data that causes this error and keep running? Thanks.
Others
No response
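One general pattern for the "skip the bad data" request is to check the loss and the gradient norm before the weight update and drop the step when either is non-finite. The sketch below is a toy illustration in plain Python, not LLaMA-Factory's or the Hugging Face Trainer's API; `train`, `grad_norm`, and the one-parameter model are hypothetical names invented for the example. In a PyTorch loop the same check would presumably sit between `loss.backward()` and `optimizer.step()`, using `torch.isfinite`. Note that the check has to run before the update: once a NaN gradient has been applied, the weights themselves are likely NaN, which may be why every later step in the log above also reports `grad_norm: nan`.

```python
import math


def grad_norm(grads):
    """L2 norm of a flat list of gradients; NaN or inf propagates into the result."""
    return math.sqrt(sum(g * g for g in grads))


def train(batches, w=0.0, lr=0.1):
    """Fit y = w * x with plain SGD, skipping batches that give a non-finite loss or gradient."""
    skipped = []
    for step, (x, y) in enumerate(batches):
        err = w * x - y
        loss = err * err
        grads = [2.0 * err * x]
        # Guard BEFORE the update: a non-finite step must never touch the weights.
        if not (math.isfinite(loss) and math.isfinite(grad_norm(grads))):
            skipped.append(step)  # remember the bad batch so it can be inspected later
            continue
        w -= lr * grads[0]
    return w, skipped


# Hypothetical data: batches 1 and 3 are corrupted on purpose.
batches = [
    (1.0, 2.0),
    (float("nan"), 1.0),
    (1.0, 2.0),
    (float("inf"), 1.0),
    (1.0, 2.0),
]
w, skipped = train(batches)
print(f"final w = {w:.3f}, skipped steps = {skipped}")
```

Logging the skipped step indices, as `skipped` does here, makes it possible to trace them back to specific samples. If the gradients go non-finite on ordinary data, the cause is more likely numerical (for example half-precision overflow or too high a learning rate) than a bad sample, and skipping alone will not fix it.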