Closed: xwu99 closed this issue 5 months ago
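Metrics and checkpoints should be reported from the local TransformerTrainer to Ray Train so that training is fault-tolerant. After a worker failure, Ray Train can only restore from checkpoints that were reported to it.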
https://docs.ray.io/en/latest/train/getting-started-transformers.html#report-checkpoints-and-metrics
https://docs.ray.io/en/latest/train/user-guides/checkpoints.html
@harborn @KepingYan Could you study this and clarify the correct process for the new TorchTrainer?
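For discussion, here is a rough sketch of what the reporting flow might look like with the new TorchTrainer, based on the docs linked above. It is not the project's actual code: `build_model_and_datasets`, the config keys (`epochs`, `output_dir`), and the worker, failure, and retention numbers are all placeholder assumptions. The Ray pieces it relies on are `RayTrainReportCallback`, `prepare_trainer`, `ray.train.get_checkpoint()`, and `FailureConfig`, as described in the Ray Train docs for recent 2.x releases.

```python
import os


def build_model_and_datasets():
    """Hypothetical placeholder: return (model, train_dataset) for your task."""
    raise NotImplementedError("supply your own model and dataset here")


def train_func(config: dict) -> None:
    # Imported lazily so this module loads even where Ray/transformers are absent.
    import ray.train
    from ray.train.huggingface.transformers import (
        RayTrainReportCallback,
        prepare_trainer,
    )
    from transformers import Trainer, TrainingArguments

    model, train_dataset = build_model_and_datasets()

    args = TrainingArguments(
        output_dir=config.get("output_dir", "./output"),
        num_train_epochs=config.get("epochs", 3),
        # Save and log at the same cadence so each reported checkpoint
        # is paired with the metrics logged for that epoch.
        save_strategy="epoch",
        logging_strategy="epoch",
        report_to="none",
    )
    trainer = Trainer(model=model, args=args, train_dataset=train_dataset)

    # Forward every checkpoint saved by the local Trainer, together with the
    # latest logged metrics, to Ray Train.
    trainer.add_callback(RayTrainReportCallback())
    trainer = prepare_trainer(trainer)

    # After a failure Ray Train restarts the workers; get_checkpoint() then
    # returns the most recently reported checkpoint (None on a fresh run).
    checkpoint = ray.train.get_checkpoint()
    if checkpoint is not None:
        with checkpoint.as_directory() as checkpoint_dir:
            resume_path = os.path.join(
                checkpoint_dir, RayTrainReportCallback.CHECKPOINT_NAME
            )
            trainer.train(resume_from_checkpoint=resume_path)
    else:
        trainer.train()


def main() -> None:
    from ray.train import CheckpointConfig, FailureConfig, RunConfig, ScalingConfig
    from ray.train.torch import TorchTrainer

    ray_trainer = TorchTrainer(
        train_func,
        train_loop_config={"epochs": 3},  # assumed values
        scaling_config=ScalingConfig(num_workers=2, use_gpu=False),
        run_config=RunConfig(
            # Retry the run up to 3 times, restoring from the last checkpoint.
            failure_config=FailureConfig(max_failures=3),
            checkpoint_config=CheckpointConfig(num_to_keep=2),
        ),
    )
    result = ray_trainer.fit()
    print(result.metrics, result.checkpoint)
```

Calling `main()` launches the run. The key point is that the local Trainer only writes checkpoints to its own `output_dir`; the callback is what hands them to Ray Train, and `get_checkpoint()` is what lets a restarted worker resume instead of starting over.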