I can answer some of those questions:
3 - AFAIK `max_iters` is the hard limit on iterations, but the early stopping algorithm may choose to end a trial earlier.
4 - Try setting verbosity to 2 and see what's being printed out. For me, the number of trials was equal to what I set.
@Yard1,
3 - If `max_iters` is the hard limit, what is `n_iter` used for?
4 - I did set `verbosity=2` and noticed that only 10 trials are run even when I set both to 50.
3 - They are two different things. `max_iters` is the number of iterations per trial, and `n_iter` is the number of trials. So the maximum total number of iterations is `max_iters * n_iter`.
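A minimal sketch of how the two interact, assuming the post-#81 name `n_trials` and an estimator that supports early stopping (`SGDClassifier` via `partial_fit`); the data and search space here are hypothetical:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from tune_sklearn import TuneSearchCV

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# n_trials configurations are sampled from the space; each trial may run
# up to max_iters training iterations before early stopping can cut it
# short, so total iterations are bounded by n_trials * max_iters = 100.
search = TuneSearchCV(
    SGDClassifier(),
    param_distributions={"alpha": [1e-4, 1e-3, 1e-2, 1e-1]},
    n_trials=10,          # number of sampled hyperparameter configurations
    max_iters=10,         # hard cap on iterations per trial
    early_stopping=True,  # a trial may still end before max_iters
    cv=3,
)
search.fit(X, y)
print(search.best_params_)
```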
`n_iter=10` means you sample 10 hyperparameter configurations from the search space, and `cv=3` means each of the 10 models will be cross-validated using 3-fold cross-validation. The way we decide when to early stop at the moment is to take the average performance across all folds, as this is how cross-validation is generally done. I'm not sure it's safe to conclude a model is bad just because it does worse on one fold.

…`cv_results_` dictionary?

`n_jobs` is used to determine how many trainables can be run in parallel. So if you set it to -1, it will run the maximum number of parallel jobs, using 1 core per job. `sk_n_jobs` is just used to set the `n_jobs` parameter of the underlying sklearn estimator. This defaults to -1 to tell sklearn to use all the cores available to it, and can usually be ignored unless you run into errors. I'm not sure why it's using 1/2 a core, but it'd be helpful to have more information/output. Are there 2 cores total on your machine?
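To illustrate the split between the two levels of parallelism (a sketch, with `sk_n_jobs` as named in the version discussed here and a hypothetical search space):

```python
from sklearn.ensemble import RandomForestClassifier
from tune_sklearn import TuneSearchCV

# n_jobs controls how many trials (Ray trainables) run concurrently;
# sk_n_jobs is forwarded as the wrapped estimator's own n_jobs.
search = TuneSearchCV(
    RandomForestClassifier(),
    param_distributions={"n_estimators": [50, 100, 200]},
    n_trials=3,
    n_jobs=-1,    # as many parallel trials as cores allow, 1 core each
    sk_n_jobs=1,  # keep each estimator single-threaded to avoid oversubscription
)
```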
@rohan-gt these are great questions, I've pushed a PR to address them.
1) Great catch, #81 should fix it up and you'll see the expected behavior. 2) I've renamed `n_iter` -> `n_trials` in #81.
Thanks a bunch for trying things out and asking questions - we really want to make you successful!
I have a few questions around how tune-sklearn works (see the sketch after this list for the setup):

1. When I set `n_iter=10` and `max_iters=10` using BOHB, I see that the hyperparameters for the 10 trials are sampled in the first trial itself, and they remain the same in all 10 trials. Aren't the hyperparameters of the later trials supposed to change dynamically based on the results of the earlier trials?
2. When setting `n_iter=10` and `cv=3`, does that mean the model is run 30 times, with 3 cross-validation models run per trial? In that case, isn't it much more efficient to check the test score on only one of the folds and, if the score is too low, discard the entire trial without running the other 2 folds and pick another trial instead?
3. How does early stopping work using `max_iters`? What is the stopping condition? And how is it different from using one of the schedulers like ASHA from Ray Tune?
4. I tried setting both `n_iter=50` and `max_iters=50`, but the log only shows 10 trials. Why is that?
5. On Google Colab, I set `n_jobs=-1` for LightGBM, and `n_jobs=-1` and `sk_n_jobs=-1` for tune-sklearn, but "Resources requested" shows 1/2 CPUs used. How are the different types of `n_jobs` used in this scenario, and why isn't it using both CPU cores when they're set to -1?
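A minimal sketch approximating the setup in questions 1 and 5, assuming hypothetical data and the pre-rename parameter names (`n_iter`, `sk_n_jobs`); note that BOHB support in tune-sklearn additionally requires the `hpbandster` and `ConfigSpace` packages:

```python
from lightgbm import LGBMClassifier
from sklearn.datasets import make_classification
from tune_sklearn import TuneSearchCV

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

search = TuneSearchCV(
    LGBMClassifier(n_jobs=-1),  # estimator-level parallelism
    param_distributions={"num_leaves": [15, 31, 63]},  # hypothetical space
    search_optimization="bohb",
    n_iter=10,      # renamed to n_trials in #81
    max_iters=10,   # cap on iterations per trial
    cv=3,
    n_jobs=-1,      # tune-sklearn-level parallelism across trials
    sk_n_jobs=-1,   # forwarded to the estimator's n_jobs
)
search.fit(X, y)
```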