ray-project / tune-sklearn

A drop-in replacement for Scikit-Learn's GridSearchCV / RandomizedSearchCV -- but with cutting-edge hyperparameter tuning techniques.
https://docs.ray.io/en/master/tune/api_docs/sklearn.html
Apache License 2.0

Duplicated trials #207

Open qo4on opened 3 years ago

qo4on commented 3 years ago

I ran your HEBO custom example and saw that it runs the same trials multiple times. Can I skip them and finish when there are no unique hyperparameter configurations left? Also, with cv=5 I expected to see 5 test scores for each trial, but only 3 of them are shown: split0_test_score, split1_test_score, split2_test_score. Can you clarify how this works?

from sklearn.ensemble import RandomForestClassifier
from ray import tune
from ray.tune.suggest.hebo import HEBOSearch
from tune_sklearn import TuneSearchCV

seed = 0

clf = RandomForestClassifier(random_state=seed)

# tune.randint's upper bound is exclusive, so this space contains
# exactly one configuration: n_estimators=20, max_depth=2.
param_distributions = {
    "n_estimators": tune.randint(20, 21),
    "max_depth": tune.randint(2, 3),
}

tune_search = TuneSearchCV(
    clf,
    param_distributions,
    n_trials=5,
    search_optimization=HEBOSearch(),
    cv=5,
    random_state=seed,
    local_dir='ray',
    verbose=2,
)

# x_train, y_train prepared elsewhere
tune_search.fit(x_train, y_train)

[screenshot: printed cv_results_ dataframe showing only the split0_test_score, split1_test_score, and split2_test_score columns]

Yard1 commented 3 years ago

How duplicate trials are handled depends on the search algorithm itself, and it looks like HEBO doesn't account for them.

qo4on commented 3 years ago

So this could be fixed by sending the existing results back to HEBO without any additional training. And where are the test scores for the remaining splits, split3_test_score and split4_test_score?

Yard1 commented 3 years ago

Isn't it just Pandas truncating the dataframe to fit it on screen? pd.DataFrame(tune_search.cv_results_) should have all the info you need.
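For example, widening the Pandas display settings makes all five split*_test_score columns visible (a minimal sketch; it assumes the tune_search.fit call from your snippet above has already run):

import pandas as pd

# By default Pandas elides middle columns when printing wide frames,
# which is why only split0 through split2 showed up on screen.
pd.set_option("display.max_columns", None)

results = pd.DataFrame(tune_search.cv_results_)
# Every column whose name contains "test_score": the per-split scores
# plus the mean, std, and rank columns.
print(results.filter(like="test_score"))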

qo4on commented 3 years ago

Thank you, it works.

Don't you think it would be good to skip all duplicates for all searchers by default? It's a real problem when you think you're tuning hyperparameters but are in fact training the same configuration over and over again.

Yard1 commented 3 years ago

It's not straightforward, as duplicates should ideally be handled by the search algorithm itself. For example, if we rejected duplicates coming from a search algorithm that doesn't check for them, we could end up in an infinite loop: the tuner rejects the duplicate suggestion, only for the algorithm to suggest it again, because that is what it considers the best configuration.
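To make the failure mode concrete, here is a rough sketch of a hypothetical deduplicating wrapper around Ray Tune's Searcher interface (DedupSearcher, max_retries, and the dedup key are all made up for illustration; a real implementation would also need to tell the wrapped searcher about rejected suggestions):

from ray.tune.suggest import Searcher

class DedupSearcher(Searcher):
    def __init__(self, searcher, max_retries=10):
        super().__init__(metric=searcher.metric, mode=searcher.mode)
        self._searcher = searcher
        self._seen = set()
        self._max_retries = max_retries

    def suggest(self, trial_id):
        # Re-query the wrapped searcher, but only a bounded number of
        # times: without the cap, an algorithm that keeps re-suggesting
        # what it considers the best configuration would loop forever.
        for _ in range(self._max_retries):
            config = self._searcher.suggest(trial_id)
            if config is None:
                return None
            key = tuple(sorted(config.items()))
            if key not in self._seen:
                self._seen.add(key)
                return config
        return None  # give up: only duplicates were suggested

    def on_trial_complete(self, trial_id, result=None, error=False):
        self._searcher.on_trial_complete(trial_id, result=result, error=error)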

In any case, that should be done in Ray Tune itself, and not here. @krfricke what do you think?

richardliaw commented 3 years ago

Hmm, so yeah this seems to be a common request.

We've actually implemented something similar in Bayesopt (see ray/python/ray/tune/suggest/bayesopt.py).
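If I remember right, BayesOptSearch exposes this through its skip_duplicate and patience arguments (exact names may vary across Ray versions, and the metric name below is just illustrative):

from ray.tune.suggest.bayesopt import BayesOptSearch

# skip_duplicate=True avoids re-running previously seen configurations;
# patience bounds how many repeated suggestions are tolerated before
# the whole search stops.
searcher = BayesOptSearch(
    metric="average_test_score",
    mode="max",
    skip_duplicate=True,
    patience=5,
)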

qo4on commented 3 years ago

Besides, HEBO counts duplicates as newly tested configurations and throws an error once that count reaches the total number of possible combinations. As a result, not all configurations actually get tested. https://github.com/huawei-noah/noah-research/issues/28

qo4on commented 3 years ago

@richardliaw Which search_optimization options don't have this issue, besides Bayesopt?