Hi, thanks for all the hard work on the library and the broader Ray ecosystem!
I have been trying to add tune-sklearn to a generic tuning class in a script. However, I'm hitting the following error when trying to use TuneSearchCV. The scripts use Hydra for configuration management, and here's the relevant invocation for TuneSearchCV using Bayesian optimization on a skorch model:
python ddm_trainer.py model=torch model.sweep.run=True model.sweep.search_algorithm=bayesian
# errors:
datamodeler : INFO Building model...
[2021-04-19 09:31:39,523][datamodeler][INFO] - Building model...
datamodeler : INFO Sweeping with parameters: {'lr': [0.01, 0.02], 'module__num_units': [10, 50]}
[2021-04-19 09:31:39,524][datamodeler][INFO] - Sweeping with parameters: {'lr': [0.01, 0.02], 'module__num_units': [10, 50]}
/Users/alizaidi-msft/miniconda3/envs/ddm/lib/python3.7/site-packages/tune_sklearn/tune_basesearch.py:429: UserWarning: early_stopping is enabled but max_iters = 1. To enable partial training, set max_iters > 1.
category=UserWarning)
[2021-04-19 09:31:41,756][tune_sklearn.tune_basesearch][INFO] - TIP: Hiding process output by default. To show process output, set verbose=2.
[2021-04-19 09:31:41,875][ray.tune.trial_runner][WARNING] - Trial Runner checkpointing failed: can't pickle dict_values objects
[2021-04-19 09:31:44,629][ray.tune.trial_runner][ERROR] - Trial _Trainable_b8f96ba4: Error processing event.
Traceback (most recent call last):
File "ddm_trainer.py", line 74, in main
scoring_func=cfg["model"]["sweep"]["scoring_func"],
File "/Users/alizaidi-msft/Documents/bonsai/datadrivenmodel/torch_models.py", line 183, in sweep
search.fit(X, y)
File "/Users/alizaidi-msft/miniconda3/envs/ddm/lib/python3.7/site-packages/tune_sklearn/tune_basesearch.py", line 664, in fit
result = self._fit(X, y, groups, **fit_params)
File "/Users/alizaidi-msft/miniconda3/envs/ddm/lib/python3.7/site-packages/tune_sklearn/tune_basesearch.py", line 565, in _fit
analysis = self._tune_run(config, resources_per_trial)
File "/Users/alizaidi-msft/miniconda3/envs/ddm/lib/python3.7/site-packages/tune_sklearn/tune_search.py", line 715, in _tune_run
analysis = tune.run(trainable, **run_args)
File "/Users/alizaidi-msft/miniconda3/envs/ddm/lib/python3.7/site-packages/ray/tune/tune.py", line 421, in run
runner.step()
File "/Users/alizaidi-msft/miniconda3/envs/ddm/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 402, in step
self._process_events(timeout=timeout) # blocking
File "/Users/alizaidi-msft/miniconda3/envs/ddm/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 560, in _process_events
self._process_trial(trial)
File "/Users/alizaidi-msft/miniconda3/envs/ddm/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 586, in _process_trial
results = self.trial_executor.fetch_result(trial)
File "/Users/alizaidi-msft/miniconda3/envs/ddm/lib/python3.7/site-packages/ray/tune/ray_trial_executor.py", line 609, in fetch_result
result = ray.get(trial_future[0], timeout=DEFAULT_GET_TIMEOUT)
File "/Users/alizaidi-msft/miniconda3/envs/ddm/lib/python3.7/site-packages/ray/_private/client_mode_hook.py", line 47, in wrapper
return func(*args, **kwargs)
File "/Users/alizaidi-msft/miniconda3/envs/ddm/lib/python3.7/site-packages/ray/worker.py", line 1456, in get
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ModuleNotFoundError): ray::_Trainable.train_buffered() (pid=5485, ip=10.0.0.29)
File "python/ray/_raylet.pyx", line 439, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 442, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 459, in ray._raylet.execute_task
File "/Users/alizaidi-msft/miniconda3/envs/ddm/lib/python3.7/site-packages/ray/serialization.py", line 245, in deserialize_objects
self._deserialize_object(data, metadata, object_ref))
File "/Users/alizaidi-msft/miniconda3/envs/ddm/lib/python3.7/site-packages/ray/serialization.py", line 192, in _deserialize_object
return self._deserialize_msgpack_data(data, metadata_fields)
File "/Users/alizaidi-msft/miniconda3/envs/ddm/lib/python3.7/site-packages/ray/serialization.py", line 170, in _deserialize_msgpack_data
python_objects = self._deserialize_pickle5_data(pickle5_data)
File "/Users/alizaidi-msft/miniconda3/envs/ddm/lib/python3.7/site-packages/ray/serialization.py", line 160, in _deserialize_pickle5_data
obj = pickle.loads(in_band)
ModuleNotFoundError: No module named 'torch_models'
It seems that Ray serializes the script but loses track of the module it is running from (torch_models). Intriguingly, the same invocation works in the test suite, where the tests run from a subdirectory (those tests are commented out for the CI pipeline, but uncommenting and running them works 🤷).
Should I restructure the scripts in some way to make Ray happier when running TuneSearchCV?
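The missing-module hypothesis can be reproduced without Ray. This is a minimal sketch, assuming the worker unpickles the trainable roughly the way plain pickle.loads does (the traceback does end in pickle.loads). The module fake_models and class Net are hypothetical stand-ins for torch_models and the skorch module. Pickle stores only the module path and class name, so a fresh interpreter that cannot import that module fails with the same ModuleNotFoundError.

```python
import importlib
import os
import pickle
import subprocess
import sys
import tempfile
import textwrap


def demo_pickle_by_reference() -> str:
    """Pickle an instance of a class from a throwaway module, then try to
    unpickle it in a fresh interpreter that cannot import that module.
    Returns the child process's stderr."""
    with tempfile.TemporaryDirectory() as mod_dir:
        # Hypothetical stand-in for torch_models.py, living only in mod_dir.
        with open(os.path.join(mod_dir, "fake_models.py"), "w") as f:
            f.write(textwrap.dedent("""
                class Net:
                    def __init__(self, num_units):
                        self.num_units = num_units
            """))
        sys.path.insert(0, mod_dir)
        importlib.invalidate_caches()
        try:
            import fake_models  # importable here only because mod_dir is on sys.path
            # pickle records "fake_models.Net" by reference, not the class body.
            payload = pickle.dumps(fake_models.Net(10))
        finally:
            sys.path.remove(mod_dir)
            sys.modules.pop("fake_models", None)

    # The "worker": a new interpreter run from another directory, so
    # fake_models is not on its sys.path and pickle.loads cannot import it.
    code = "import pickle, sys; pickle.loads(sys.stdin.buffer.read())"
    proc = subprocess.run(
        [sys.executable, "-c", code],
        input=payload,
        capture_output=True,
        cwd=tempfile.gettempdir(),
    )
    return proc.stderr.decode()


if __name__ == "__main__":
    print(demo_pickle_by_reference())
```

If this is the mechanism, the fix is to make the module importable from the worker processes too, for example by installing the project as a package or putting its directory on PYTHONPATH. Hydra's default of changing the working directory for each run may explain why the script's directory is not found from the workers. That is an assumption worth checking.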