ray-project / tune-sklearn

A drop-in replacement for Scikit-Learn’s GridSearchCV / RandomizedSearchCV -- but with cutting edge hyperparameter tuning techniques.
https://docs.ray.io/en/master/tune/api_docs/sklearn.html
Apache License 2.0

Add `seed` parameter to make initial configuration sampling deterministic #140

Closed krfricke closed 3 years ago

krfricke commented 3 years ago

For reproducible results, we need to be able to set a random seed. This PR introduces the respective argument to TuneSearchCV.

Please note that this mostly affects initial configuration sampling. With parallel evaluations, there is always some non-determinism involved that can lead to different results across training runs.

Currently not implemented on BOHB.

Closes #112
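To illustrate the idea behind the PR (deterministic initial configuration sampling), here is a minimal, self-contained sketch. The names `sample_configs` and `param_distributions` are illustrative only, not tune-sklearn's actual internals; the point is that a fixed seed makes the sampled configurations reproducible:

```python
import random

def sample_configs(param_distributions, n, seed=None):
    # Illustrative sketch, not tune-sklearn's code: draw n initial
    # hyperparameter configurations; a fixed seed makes this deterministic.
    rng = random.Random(seed)
    return [
        {name: rng.choice(values) for name, values in param_distributions.items()}
        for _ in range(n)
    ]

space = {"alpha": [0.001, 0.01, 0.1, 1.0], "max_iter": [100, 500, 1000]}
# Two runs with the same seed yield identical initial configurations.
assert sample_configs(space, 5, seed=42) == sample_configs(space, 5, seed=42)
```

As the PR description notes, this only pins down the initial sampling; with parallel trial evaluation, scheduling order can still introduce non-determinism downstream.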

Yard1 commented 3 years ago

FYI test for BOHB is failing.

Looks good! I don't see any bugs or nits. One idea for better sklearn compatibility - do you think it would be possible to accept numpy RandomState instances in addition to integers (https://scikit-learn.org/stable/glossary.html#term-random-state)? It should be possible to just draw an integer seed from the RandomState and carry on with that integer internally.
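The suggestion above can be sketched like this. The helper name `resolve_seed` is hypothetical (not tune-sklearn's API); it normalizes an sklearn-style `random_state` into a plain integer seed:

```python
import numpy as np

def resolve_seed(random_state):
    # Hypothetical helper, not tune-sklearn's actual code: normalize an
    # sklearn-style random_state (None, int, or np.random.RandomState)
    # into a plain integer seed (or None).
    if random_state is None or isinstance(random_state, int):
        return random_state
    if isinstance(random_state, np.random.RandomState):
        # Draw a single integer from the RandomState; two RandomStates
        # constructed with the same seed produce the same integer.
        return int(random_state.randint(2**31 - 1))
    raise ValueError(f"Unsupported random_state: {random_state!r}")
```

This keeps the rest of the code working purely with integer seeds, which is what the sampling backends expect.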

krfricke commented 3 years ago

This is weird, it works for me locally every time I run it, even when I alter the search space. I'll look into this tomorrow. I didn't know about RandomState but will make sure that it is supported. Thanks!

krfricke commented 3 years ago

Before I get to this, should we just use the existing random_state parameter for the other search algorithms?

Yard1 commented 3 years ago

I think that would be the best, honestly. I forgot that it's already there! I think a user would expect just one randomness parameter to control everything. I believe that's also how other libraries do it. I don't see much utility in splitting it into random_state and seed.

krfricke commented 3 years ago

But the tests still seem to fail. Weird. Could you run them locally and see if you get the same errors? I don't get any errors on my machine, no matter how often I run them.

krfricke commented 3 years ago

Ah, so it's still just BOHB. Maybe there's a dependency mismatch or something. I'll see what I can find.

krfricke commented 3 years ago

So even on a fresh env with py 3.6.10 it passes for me. I tried 10 different seeds sequentially, all passed. Any ideas? The only difference I see right now is that I'm on mac and the VM is on linux. I could try setting up a new env on a cluster machine later or try on my linux laptop.

Yard1 commented 3 years ago

I wouldn't be surprised if it was something to do with the OS. ConfigSpace uses Cython, so I'd imagine it's not as OS-agnostic as other libraries. In the worst case, we may have to just not test for it. One thing we can try, however, is setting n_jobs=1 so it's run in local mode. Perhaps that may be it?

richardliaw commented 3 years ago

Hmm, looks like the seed test is failing on mac os py3.7?

Yard1 commented 3 years ago

It's BOHB again D:

krfricke commented 3 years ago

Ok, I think I got it - the search space conversions might disregard the custom seed. This PR in ray should fix this by introducing a seed parameter for BOHB: https://github.com/ray-project/ray/pull/12160

After that one is merged, tests with latest ray should pass. Let's defer this PR until then.

krfricke commented 3 years ago

It was merged. I'll wait for the next nightly wheels, push an update to this PR, and then fingers crossed.