ray-project / tune-sklearn

A drop-in replacement for Scikit-Learn’s GridSearchCV / RandomizedSearchCV -- but with cutting edge hyperparameter tuning techniques.
https://docs.ray.io/en/master/tune/api_docs/sklearn.html
Apache License 2.0

False error log complains about failing to read the results of trials #270

Open JingChen23 opened 1 year ago

JingChen23 commented 1 year ago

I copied the code from https://docs.ray.io/en/master/tune/examples/tune-sklearn.html and added a ray.init call so that the Tune tasks run on a remote Ray cluster:

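import ray

# Connect to the remote cluster through Ray Client and install the
# pinned packages in the runtime environment on the cluster.
ray.init(
    address="ray://xxx.xxx.xxx.xxx:10001",
    runtime_env={"pip": ["tune-sklearn==0.4.6", "scikit-learn==1.0.2"]},
)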

However, I got the following log:

2023-07-10 18:37:16,273 WARNING experiment_analysis.py:917 -- Failed to read the results for 6 trials:
- /home/ray/ray_results/_Trainable_2023-07-10_03-36-56/_Trainable_b19bc_00000_0_alpha=0.0001,epsilon=0.0100_2023-07-10_03-36-57
- /home/ray/ray_results/_Trainable_2023-07-10_03-36-56/_Trainable_b19bc_00001_1_alpha=0.1000,epsilon=0.0100_2023-07-10_03-36-57
- /home/ray/ray_results/_Trainable_2023-07-10_03-36-56/_Trainable_b19bc_00002_2_alpha=1,epsilon=0.0100_2023-07-10_03-36-57
- /home/ray/ray_results/_Trainable_2023-07-10_03-36-56/_Trainable_b19bc_00003_3_alpha=0.0001,epsilon=0.1000_2023-07-10_03-36-57
- /home/ray/ray_results/_Trainable_2023-07-10_03-36-56/_Trainable_b19bc_00004_4_alpha=0.1000,epsilon=0.1000_2023-07-10_03-36-57
- /home/ray/ray_results/_Trainable_2023-07-10_03-36-56/_Trainable_b19bc_00005_5_alpha=1,epsilon=0.1000_2023-07-10_03-36-57

After some triage, I found that the files under /home/ray/ray_results/_Trainable_2023-07-10_03-36-56 do exist on the remote cluster's head node. However, the code from https://docs.ray.io/en/master/tune/examples/tune-sklearn.html tries to read those remote results from my local dev machine, which is bound to fail because the results of the 6 trials do not exist locally.
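
For anyone who wants to repeat the local half of this triage, here is a minimal sketch using only the Python standard library. It pulls the trial directories out of the warning text and reports which ones are missing on the machine where the script runs. The helper names trial_dirs_from_warning and missing_locally are hypothetical (they are not part of Ray or tune-sklearn), and the parsing assumes the "- /path" layout shown in the log above.

```python
import os
import re

# Two lines copied from the warning above, used here only as sample input.
SAMPLE_WARNING = """\
2023-07-10 18:37:16,273 WARNING experiment_analysis.py:917 -- Failed to read the results for 6 trials:
- /home/ray/ray_results/_Trainable_2023-07-10_03-36-56/_Trainable_b19bc_00000_0_alpha=0.0001,epsilon=0.0100_2023-07-10_03-36-57
- /home/ray/ray_results/_Trainable_2023-07-10_03-36-56/_Trainable_b19bc_00001_1_alpha=0.1000,epsilon=0.0100_2023-07-10_03-36-57
"""


def trial_dirs_from_warning(log_text):
    """Hypothetical helper: collect paths from lines of the form '- /abs/path'."""
    paths = []
    for line in log_text.splitlines():
        match = re.match(r"^\s*-\s+(/\S+)\s*$", line)
        if match:
            paths.append(match.group(1))
    return paths


def missing_locally(paths):
    """Hypothetical helper: keep only paths that are not directories on this machine."""
    return [p for p in paths if not os.path.isdir(p)]


# On the local dev machine, every trial directory is expected to be reported
# as missing, because the results live on the cluster's head node.
for path in missing_locally(trial_dirs_from_warning(SAMPLE_WARNING)):
    print("not found locally:", path)
```

Running the same check on the head node should report nothing missing if the results were written there, which would match what I observed.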

So the question is: why does the code try to read results that live on the remote host from the local machine? Is this a bug, or am I doing something wrong?