I can answer some of those questions:
3 - AFAIK `max_iters` is the hard limit on iterations, but the early stopping algorithm may choose to end a trial earlier.
4 - Try setting verbosity to 2 and see what's being printed out. For me, the number of trials was equal to what I set.
@Yard1,
3 - If `max_iters` is the hard limit, what is `n_iter` used for?
4 - I did set `verbosity=2` and noticed that only 10 trials are run even when I set both to 50.
3 - They are two different things. `max_iters` is the number of iterations per trial, and `n_iter` is the number of trials. So the maximum total number of iterations is `max_iters * n_iter`.
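A minimal sketch of how the two interact, assuming the post-#81 name `n_trials` and an estimator that supports early stopping (`SGDClassifier` via `partial_fit`); the data and search space here are hypothetical:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from tune_sklearn import TuneSearchCV

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# n_trials configurations are sampled from the space; each trial may run
# up to max_iters training iterations before early stopping can cut it
# short, so total iterations are bounded by n_trials * max_iters = 100.
search = TuneSearchCV(
    SGDClassifier(),
    param_distributions={"alpha": [1e-4, 1e-3, 1e-2, 1e-1]},
    n_trials=10,          # number of sampled hyperparameter configurations
    max_iters=10,         # hard cap on iterations per trial
    early_stopping=True,  # a trial may still end before max_iters
    cv=3,
)
search.fit(X, y)
print(search.best_params_)
```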
`n_iter=10` means you sample 10 hyperparameter configurations from the search space, and `cv=3` means each of the 10 models will be cross-validated using 3-fold cross-validation. The way we decide when to early stop at the moment is to take the average performance across all folds, as this is how cross-validation is generally done. I'm not sure it's safe to conclude a model is bad just because it does worse on one fold.

…`cv_results_` dictionary?

`n_jobs` is used to determine how many trainables can be run in parallel. So if you set it to -1, it will run the maximum number of parallel jobs, using 1 core per job. `sk_n_jobs` is just used to set the `n_jobs` parameter of the underlying sklearn estimator. This defaults to -1 to tell sklearn to use all the cores available to it, and can usually be ignored unless you run into errors. I'm not sure why it's using 1/2 a core, but it'd be helpful to have more information/output. Are there 2 cores total on your machine?
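To illustrate the split between the two levels of parallelism (a sketch, with `sk_n_jobs` as named in the version discussed here and a hypothetical search space):

```python
from sklearn.ensemble import RandomForestClassifier
from tune_sklearn import TuneSearchCV

# n_jobs controls how many trials (Ray trainables) run concurrently;
# sk_n_jobs is forwarded as the wrapped estimator's own n_jobs.
search = TuneSearchCV(
    RandomForestClassifier(),
    param_distributions={"n_estimators": [50, 100, 200]},
    n_trials=3,
    n_jobs=-1,    # as many parallel trials as cores allow, 1 core each
    sk_n_jobs=1,  # keep each estimator single-threaded to avoid oversubscription
)
```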
@rohan-gt these are great questions, I've pushed a PR to address them.
1) Great catch, #81 should fix it up and you'll see the expected behavior. 2) I've renamed `n_iter` -> `n_trials` in #81.
Thanks a bunch for trying things out and asking questions - we really want to make you successful!
I have a few questions around how tune-sklearn works (see the sketch after this list for the setup):

1. When I set `n_iter=10` and `max_iters=10` using BOHB, I see that the hyperparameters for the 10 trials are sampled in the first trial itself, and they remain the same in all 10 trials. Aren't the hyperparameters of the later trials supposed to change dynamically based on the results of the earlier trials?
2. When setting `n_iter=10` and `cv=3`, does that mean the model is run 30 times, with 3 cross-validation models run per trial? In that case, isn't it much more efficient to check the test score on only one of the folds and, if the score is too low, discard the entire trial without running the other 2 folds and pick another trial instead?
3. How does early stopping work using `max_iters`? What is the stopping condition? And how is it different from using one of the schedulers like ASHA from Ray Tune?
4. I tried setting both `n_iter=50` and `max_iters=50`, but the log only shows 10 trials. Why is that?
5. On Google Colab, I set `n_jobs=-1` for LightGBM, and `n_jobs=-1` and `sk_n_jobs=-1` for tune-sklearn, but "Resources requested" shows 1/2 CPUs used. How are the different types of `n_jobs` used in this scenario, and why isn't it using both CPU cores when they're set to -1?
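A minimal sketch approximating the setup in questions 1 and 5, assuming hypothetical data and the pre-rename parameter names (`n_iter`, `sk_n_jobs`); note that BOHB support in tune-sklearn additionally requires the `hpbandster` and `ConfigSpace` packages:

```python
from lightgbm import LGBMClassifier
from sklearn.datasets import make_classification
from tune_sklearn import TuneSearchCV

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

search = TuneSearchCV(
    LGBMClassifier(n_jobs=-1),  # estimator-level parallelism
    param_distributions={"num_leaves": [15, 31, 63]},  # hypothetical space
    search_optimization="bohb",
    n_iter=10,      # renamed to n_trials in #81
    max_iters=10,   # cap on iterations per trial
    cv=3,
    n_jobs=-1,      # tune-sklearn-level parallelism across trials
    sk_n_jobs=-1,   # forwarded to the estimator's n_jobs
)
search.fit(X, y)
```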