Open krfricke opened 3 years ago
Could someone point me to an example that uses the Trainable class API together with checkpointing? I want to use the class API so I can reuse actors during training and then, once I have the best model, use it immediately for predictions. With the current documentation it's unclear how this should work.
Hi @nikhil-sthalekar, does this section of the docs help? https://docs.ray.io/en/master/tune/api_docs/trainable.html#class-api-checkpointing
Hi @krfricke, using those docs I was able to get checkpointing to work on some runs, but on others the script using the class API failed. I get similar errors with the function API and the TuneReportCheckpointCallback, although in that case my script is able to load the "best checkpoint" from the training run. Right now I'm looking into using the durable wrapper for checkpointing.
You're welcome to share your code on https://discuss.ray.io/ for feedback!
Bumping this up -- I'd still be quite interested in a worked example here (particularly in the distributed setting using ray_xgboost).
Some examples and tutorials in the docs cover only tuning, not checkpointing, like this one: https://docs.ray.io/en/master/tune/tutorials/tune-xgboost.html
We should make sure we include examples for saving/restoring checkpoints, especially when using things like callbacks, as in the case with xgboost.
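In the meantime, here is a rough sketch of the save/restore pattern being asked about. To keep it runnable without a cluster it is a plain-Python stand-in and does not import Ray: the class only mirrors the method names of the Trainable class API (`setup`, `step`, `save_checkpoint`, `load_checkpoint`). The class name `CounterTrainable`, the helper `run_and_restore`, the file name `state.json`, and the toy update rule are all made up for illustration. With Tune itself you would presumably subclass `ray.tune.Trainable` instead, and since the exact checkpoint method signatures have varied between Ray releases, please check the docs for your version.

```python
import json
import os
import tempfile


class CounterTrainable:
    """Plain-Python stand-in mirroring the Trainable class API method names.

    Assumption: with Ray Tune this would subclass ray.tune.Trainable.
    """

    def setup(self, config):
        # Everything needed to resume training lives on the instance.
        self.lr = config["lr"]
        self.weight = 0.0
        self.iteration = 0

    def step(self):
        # Toy "training" step: move the weight toward a target of 1.0.
        self.weight += self.lr * (1.0 - self.weight)
        self.iteration += 1
        return {"loss": (1.0 - self.weight) ** 2}

    def save_checkpoint(self, checkpoint_dir):
        # Write the full training state into the given directory.
        path = os.path.join(checkpoint_dir, "state.json")  # hypothetical name
        with open(path, "w") as f:
            json.dump({"weight": self.weight, "iteration": self.iteration}, f)
        return checkpoint_dir

    def load_checkpoint(self, checkpoint_dir):
        # Read back exactly what save_checkpoint wrote.
        path = os.path.join(checkpoint_dir, "state.json")
        with open(path) as f:
            state = json.load(f)
        self.weight = state["weight"]
        self.iteration = state["iteration"]


def run_and_restore(num_steps=5):
    """Train, checkpoint, then restore into a fresh instance (hypothetical driver)."""
    trainer = CounterTrainable()
    trainer.setup({"lr": 0.5})
    for _ in range(num_steps):
        trainer.step()

    with tempfile.TemporaryDirectory() as checkpoint_dir:
        trainer.save_checkpoint(checkpoint_dir)
        # A fresh instance stands in for a reused actor or a prediction process.
        restored = CounterTrainable()
        restored.setup({"lr": 0.5})
        restored.load_checkpoint(checkpoint_dir)
    return trainer, restored


if __name__ == "__main__":
    original, restored = run_and_restore()
    print(restored.iteration, restored.weight)
```

The point of the sketch is that `save_checkpoint` and `load_checkpoint` have to be exact inverses, and that anything `step` depends on must be part of the saved state. Otherwise a restored trial (or a "best checkpoint" loaded for predictions) will silently differ from the one that was saved.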