intel-analytics / analytics-zoo

Distributed Tensorflow, Keras and PyTorch on Apache Spark/Flink & Ray
https://analytics-zoo.readthedocs.io/
Apache License 2.0

enable resume training for automl #987

Open shane-huang opened 4 years ago

shane-huang commented 4 years ago

Make use of the Ray Tune "resume" parameter of "ray.tune.run":

resume (str|bool) – One of “LOCAL”, “REMOTE”, “PROMPT”, or bool. LOCAL/True restores the checkpoint from the local_checkpoint_dir. REMOTE restores the checkpoint from remote_checkpoint_dir. PROMPT provides CLI feedback. False forces a new experiment. If resume is set but checkpoint does not exist, ValueError will be thrown.
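A minimal sketch of how the automl side could use this, assuming the search is launched through a callable with the same keyword interface as "ray.tune.run". The wrapper name `run_with_resume` is hypothetical and not part of Analytics Zoo or Ray. It relies only on the behaviour quoted above: when resume is set and no checkpoint exists, a ValueError is thrown, so the wrapper falls back to a fresh experiment.

```python
def run_with_resume(run_fn, **run_kwargs):
    """Try to resume a previous experiment, else start a new one.

    run_fn is assumed to accept a `resume` keyword like ray.tune.run.
    This helper is hypothetical; it is a sketch, not the library's API.
    """
    try:
        # resume=True restores from the local checkpoint dir (per the docs above).
        return run_fn(resume=True, **run_kwargs)
    except ValueError:
        # Documented failure mode when no checkpoint exists. Note this also
        # catches unrelated ValueErrors, so a real implementation may want
        # to inspect the error before retrying.
        return run_fn(resume=False, **run_kwargs)
```

Usage would look roughly like `run_with_resume(tune.run, run_or_experiment=trainable, name="automl_search", config=search_space)`, where `trainable`, the experiment name and `search_space` are placeholders. The experiment name has to stay the same across runs for the old checkpoint to be found.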

shane-huang commented 4 years ago

Maybe we should check whether the training results can use the options below:

https://ray.readthedocs.io/en/latest/tune-usage.html

local_dir (str) – Local dir to save training results to. Defaults to ~/ray_results.

upload_dir (str) – Optional URI to sync training results to (e.g. s3://bucket).
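One way these two options could be passed through from the automl configuration, as a rough sketch only. The helper name `result_dir_kwargs` is hypothetical; the only assumption about Ray is that "ray.tune.run" accepts `local_dir` and `upload_dir` as described in the docs quoted above (this may differ between Ray versions).

```python
import os


def result_dir_kwargs(local_dir=None, upload_dir=None):
    """Build the result-location kwargs to forward to ray.tune.run.

    Hypothetical helper: options left as None are omitted so that
    Tune's own defaults (e.g. ~/ray_results) still apply.
    """
    kwargs = {}
    if local_dir is not None:
        # Expand "~" so the path is unambiguous on every node.
        kwargs["local_dir"] = os.path.expanduser(local_dir)
    if upload_dir is not None:
        # Remote URI for syncing results, e.g. "s3://bucket".
        kwargs["upload_dir"] = upload_dir
    return kwargs
```

The result can be splatted into the run call, e.g. `tune.run(trainable, **result_dir_kwargs(upload_dir="s3://bucket"))`, with `trainable` as a placeholder.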