ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[Tune] Repeater doesn't work with BayesOptSearch due to the `patience` argument #43489

Open zhqrbitee opened 6 months ago

zhqrbitee commented 6 months ago

What happened + What you expected to happen

When you run Tune using Repeater with BayesOptSearch, you will likely see the error below:

Traceback (most recent call last):
  File "test_ray_bs.py", line 34, in <module>
    results = tuner.fit()
  File "python3.9/site-packages/ray/tune/tuner.py", line 364, in fit
    return self._local_tuner.fit()
  File "python3.9/site-packages/ray/tune/impl/tuner_internal.py", line 526, in fit
    analysis = self._fit_internal(trainable, param_space)
  File "python3.9/site-packages/ray/tune/impl/tuner_internal.py", line 645, in _fit_internal
    analysis = run(
  File "python3.9/site-packages/ray/tune/tune.py", line 1007, in run
    runner.step()
  File "python3.9/site-packages/ray/tune/execution/tune_controller.py", line 725, in step
    self._maybe_update_trial_queue()
  File "python3.9/site-packages/ray/tune/execution/tune_controller.py", line 832, in _maybe_update_trial_queue
    if not self._update_trial_queue(blocking=not dont_wait_for_trial):
  File "python3.9/site-packages/ray/tune/execution/tune_controller.py", line 614, in _update_trial_queue
    trial = self._search_alg.next_trial()
  File "python3.9/site-packages/ray/tune/search/search_generator.py", line 100, in next_trial
    return self.create_trial_if_possible(self._experiment.spec)
  File "python3.9/site-packages/ray/tune/search/search_generator.py", line 106, in create_trial_if_possible
    suggested_config = self.searcher.suggest(trial_id)
  File "python3.9/site-packages/ray/tune/search/repeater.py", line 135, in suggest
    self._current_group = _TrialGroup(
  File "python3.9/site-packages/ray/tune/search/repeater.py", line 43, in __init__
    assert type(config) is dict, "config is not a dict, got {}".format(config)
AssertionError: config is not a dict, got FINISHED

This is because the patience parameter of BayesOptSearch makes searcher.suggest return Searcher.FINISHED, which is a str, as the config once a trial has been repeated enough times. However, Repeater expects config to be a dict in its suggest method, so it errors out.

I think Repeater.suggest should check whether the config is Searcher.FINISHED and, if so, return it directly. Currently I have to subclass Repeater and apply this hack in an overridden suggest method, which is ugly and not maintainable in the long term, so an early fix would be appreciated.
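To illustrate the proposed early-return, here is a minimal sketch of the control flow. Note that StubBayesOptSearch and PatchedRepeater are hypothetical stand-ins, not Ray's actual classes; the sentinel is modeled as a plain string analogous to Searcher.FINISHED.

```python
FINISHED = "FINISHED"  # sentinel string, analogous to Searcher.FINISHED

class StubBayesOptSearch:
    """Stand-in searcher that returns FINISHED once patience is exhausted."""

    def __init__(self, patience):
        self._patience = patience
        self._calls = 0

    def suggest(self, trial_id):
        self._calls += 1
        if self._calls > self._patience:
            # This str return is what currently trips Repeater's assert.
            return FINISHED
        return {"width": 1.0, "height": 2.0}

class PatchedRepeater:
    """Sketch of Repeater.suggest with the proposed fix applied."""

    def __init__(self, searcher, repeat):
        self.searcher = searcher
        self.repeat = repeat

    def suggest(self, trial_id):
        config = self.searcher.suggest(trial_id)
        # Proposed fix: propagate the FINISHED sentinel to the caller
        # instead of passing it to _TrialGroup, which asserts it is a dict.
        if config == FINISHED:
            return config
        assert isinstance(config, dict), f"config is not a dict, got {config}"
        return config
```

With this check in place, the sentinel reaches the trial-queue logic unchanged and the search terminates cleanly instead of raising an AssertionError.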

Versions / Dependencies

Python 3.9.18, Ray 2.8.0, macOS Sonoma

Reproduction script

from ray import tune
from ray.tune.search import Repeater
from ray.tune.search.bayesopt import BayesOptSearch

def evaluation_fn(width, height):
    return (0.1 + width / 100) ** (-1) + height * 0.1

def my_func(config):
    # Hyperparameters
    width, height = config["width"], config["height"]
    value = evaluation_fn(width, height)
    return {"mean_loss": value}

config = {
    "width": tune.uniform(0, 21),
    "height": tune.uniform(-100, 100)
}

repeat_number = 3
bayesopt = BayesOptSearch(metric="mean_loss", mode="min", patience=repeat_number)
re_search_alg = Repeater(bayesopt, repeat=repeat_number, set_index=True)

tuner = tune.Tuner(
    my_func,
    tune_config=tune.TuneConfig(
        search_alg=re_search_alg,
        num_samples=40 * 3,
        metric="mean_loss",
        mode="min"
    ),
    param_space=config,
)
results = tuner.fit()
print(results)
best_result = results.get_best_result()
print(best_result)

Issue Severity

Medium: It is a significant difficulty but I can work around it.

woshiyyya commented 2 months ago

Hi @zhqrbitee , thanks for the investigation! Would you like to post a PR to fix this issue?