ray-project / tune-sklearn

A drop-in replacement for Scikit-Learn's GridSearchCV / RandomizedSearchCV -- but with cutting-edge hyperparameter tuning techniques.
https://docs.ray.io/en/master/tune/api_docs/sklearn.html
Apache License 2.0

Trial plateau stopper #156

Closed krfricke closed 3 years ago

krfricke commented 3 years ago

With the stop_on_plateau parameter, trials can be stopped early if their score does not change over a number of iterations.

If True, a default configuration will be used. If a dict, its parameters will be passed to the respective stopper class. It can also be an instantiated TrialPlateauStopper object.
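
A rough sketch of how the proposed parameter might be used, assuming the TuneSearchCV interface described above; the estimator, search space, and dict keys are illustrative only (the keys are assumed to map onto TrialPlateauStopper's arguments):

from sklearn.linear_model import SGDClassifier
from tune_sklearn import TuneSearchCV

param_space = {"alpha": [1e-4, 1e-3, 1e-2]}

# Use the default plateau configuration...
search = TuneSearchCV(SGDClassifier(), param_space, stop_on_plateau=True)

# ...or configure the underlying stopper via a dict
search = TuneSearchCV(SGDClassifier(), param_space,
                      stop_on_plateau={"std": 0.01, "num_results": 4})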

I'm happy to add an example to the docs, but would like to get initial feedback/review first.

Things to consider:

Closes #98

Yard1 commented 3 years ago

This seems like a feature Tune itself could use. It'd be odd to limit it to just tune-sklearn. Great work!

krfricke commented 3 years ago

I refactored the changes. The Tune stoppers are in this PR: https://github.com/ray-project/ray/pull/12750. This PR now mostly contains testing and the passing of custom stoppers to tune.run().

A change I'd like to get your feedback on is that I introduced a "default metric" called objective in the trainable. The idea is that we always have access to the optimization metric under this name. This is important e.g. for the TrialPlateauStopper, which needs to know which metric we optimize. Otherwise the name can vary between average_test_score, average_test_True, and average_test_False, if I understand the code correctly.

There might be a better way to achieve this, but this was straightforward to implement. Do you have any suggestions?
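
For illustration, a minimal sketch of the idea (not the actual tune-sklearn trainable): the cross-validation score is reported both under its scoring-dependent name and under the fixed alias objective, so a stopper can always refer to the same metric name. The _evaluate helper is hypothetical:

from ray import tune

def _trainable(config):
    for step in range(10):
        score = _evaluate(config, step)  # hypothetical per-iteration CV scoring
        # Report under the scoring-dependent name and the fixed "objective" alias
        tune.report(average_test_score=score, objective=score)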

Yard1 commented 3 years ago

@krfricke Looks great!

Just to clear up how refit works: if multimetric scoring is used, the refit parameter must be a string key for a metric in the scoring dict. Any other refit value, including the True and False that are allowed normally, will throw an exception in conjunction with multimetric scoring. Therefore, the names can be average_test_score and average_test_METRIC, where METRIC is dynamic and up to the user. For example:

score_dict = {"accuracy": accuracy_metric, "auc": auc_metric}

ts = TuneSearchCV(scoring=score_dict, refit=True) # Will throw an exception when fit is called: "When using multimetric scoring, refit must be the name of the scorer used to pick the best parameters. If not needed, set refit to False"

ts = TuneSearchCV(scoring=score_dict, refit="accuracy") #correct usage, accuracy will be used as the objective value, the name being average_test_accuracy

That being said, the approach you have taken will of course work regardless of what that value is, without concern for its type. I can't think of a better one, and I believe that other sklearn wrappers use a similar approach as well.

Yard1 commented 3 years ago

BTW, we'll need to update the README too, I think.

krfricke commented 3 years ago

Thanks for the explanation. I updated the README, but we will have to wait until https://github.com/ray-project/ray/pull/12750 is merged so that the link works.

krfricke commented 3 years ago

The PR is merged and I think the test errors are unrelated to this PR.

richardliaw commented 3 years ago

stop_on_plateau is not provided as an option in this PR, right?

krfricke commented 3 years ago

That's right, we just pass stoppers to Ray Tune directly.
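
For reference, a sketch of what passing a stopper could then look like, assuming the stopper keyword this PR threads through to tune.run() (the keyword name and search space here are assumptions, not confirmed API):

from ray.tune.stopper import TrialPlateauStopper
from sklearn.linear_model import SGDClassifier
from tune_sklearn import TuneSearchCV

# Stop a trial once the "objective" metric has a standard deviation below 0.01
# over its last 4 results (TrialPlateauStopper comes from ray-project/ray#12750)
stopper = TrialPlateauStopper(metric="objective", std=0.01, num_results=4)

search = TuneSearchCV(
    SGDClassifier(),
    {"alpha": [1e-4, 1e-3, 1e-2]},
    stopper=stopper,  # assumed keyword, forwarded to tune.run()
)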