bblodfon closed this issue 2 years ago
glmnet_lrn = lrn('surv.glmnet', id = 'CoxLasso', standardize = FALSE, lambda = 0.01, alpha = 1)
xgboost_lrn = lrn('surv.xgboost', id = 'XGBoost Survival Learner')
rpart_lrn = lrn('surv.rpart', id = 'Survival Tree')
ranger_lrn = lrn('surv.ranger', id = 'Survival Forest', verbose = FALSE)
These learners will probably fit a model very fast, i.e. a single CPU will be at 100% only for a very short time. Try setting a higher number for nrounds of surv.xgboost or num.trees of surv.ranger and watch your CPUs again.
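To make the suggestion concrete, here is a sketch of the learner definitions from above with heavier workloads. This assumes the surv.* learners are available (e.g. via mlr3extralearners); the nrounds / num.trees values are purely illustrative, not tuned recommendations.

```r
library(mlr3)
library(mlr3extralearners)  # assumed source of the surv.* learners

# Heavier fits keep each worker busy long enough to observe CPU load;
# the values below are illustrative only:
xgboost_lrn = lrn("surv.xgboost", id = "XGBoost Survival Learner",
                  nrounds = 5000)
ranger_lrn  = lrn("surv.ranger", id = "Survival Forest", verbose = FALSE,
                  num.trees = 10000)
```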
Thanks, I will try that! I really thought that the implicit parallelization of e.g. surv.ranger would interfere with future's parallelization, as the documentation says.
Also, the batch_size in tuning seems to strongly affect total CPU utilization - I think its importance may not be stressed enough in the parallelization part of the mlr3book!
We have implicit parallelization turned off in ranger to avoid any interference. You can enable it with set_threads() (but I would not recommend this for all parallelization backends).
I'll include a warning in the book about the batch size.
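For reference, a minimal sketch of re-enabling the learner's internal threading with set_threads() (assuming surv.ranger is installed, e.g. via mlr3extralearners):

```r
library(mlr3)
library(mlr3extralearners)  # assumed source of surv.ranger

ranger_lrn = lrn("surv.ranger", verbose = FALSE)

# Re-enable ranger's internal threading, e.g. 4 threads per fit.
# Combined with a parallel future plan this can oversubscribe the CPUs
# (threads multiply with future workers), which is why it is not
# recommended for all parallelization backends.
set_threads(ranger_lrn, n = 4)
```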
@mllg I also found the following while doing nested CV with a learner that doesn't support implicit parallelization (or has it turned off by default, like ranger) and that takes at least a few seconds per train iteration on a particular dataset: combining future::plan(list("sequential", "multisession")) with a larger batch_size for the inner tuner was (across my benchmarks) much faster and showed better CPU utilization (more cores were used) than the same batch_size with future::plan(list("multisession", "sequential")). I don't know if that's how it is supposed to work in general, but you always have to test and see :)
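The combination described above can be sketched as follows; the batch_size value is illustrative, not a recommendation:

```r
library(future)
library(mlr3tuning)

# Outer resampling sequential, inner tuning loop parallelized:
plan(list("sequential", "multisession"))

# A larger batch proposes more configurations per tuning step, so the
# inner multisession workers all receive work at once:
tuner = tnr("random_search", batch_size = 16)
```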
I am now trying to figure out whether there is any benefit in running nested CV with some future plan plus implicit parallelization enabled (to some extent), compared to running everything sequentially and setting a large number of threads for the learner, e.g. ranger.
The total number of available CPUs is a major factor, and I think some generic rules of thumb would be a great addition to the documentation as well. For example: with 32 CPUs available and a nested CV setup, set future::plan(list("multisession", "sequential")) with 5 outer folds and batch_size = 5 in the tuner to utilize 25 CPUs (close to everything, but not all).
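A minimal sketch of the nested-CV setup just described, using recent mlr3tuning helpers; the task and learner are placeholders (a classification task stands in for the survival setup), and the fold/batch counts follow the rule of thumb above:

```r
library(mlr3)
library(mlr3tuning)
library(future)

# Outer level parallel, inner level sequential:
plan(list("multisession", "sequential"))

at = auto_tuner(
  tuner = tnr("random_search", batch_size = 5),  # 5 configs per batch
  learner = lrn("classif.rpart",
                cp = to_tune(1e-4, 1e-1, logscale = TRUE)),
  resampling = rsmp("cv", folds = 3),            # inner CV
  measure = msr("classif.ce"),
  term_evals = 10
)

# 5 outer folds, each dispatched to its own multisession worker:
rr = resample(tsk("sonar"), at, rsmp("cv", folds = 5))
rr$aggregate(msr("classif.ce"))
```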
You included it already, that's okay :)
Hi,
I have tested this benchmark script on 2 servers, one with 32 CPUs and one with 256 CPUs. I never get all CPUs utilized, i.e. only around 10 out of 32 and fewer than 100 out of 256 are at 100%. I thought that all CPUs would be used with such a multisession configuration? I had the same expectation for nested CV using this script, but again fewer CPUs were fully utilized during execution. Any thoughts on why this is happening, i.e. is it expected/normal behaviour?