Open cderv opened 4 years ago
Thanks, that's a very good idea. I like your first idea (R side) and will have a look at how other packages solve this.
Hi! Has there been any update on this? I found this really helpful in figuring out some issues I was having with using ranger in conjunction with doParallel.
Sorry, no update yet. A PR would be very welcome!
Done in #713.
Hi,
Some users in our teams are using ranger on our shared RStudio Server Pro cluster. Many R users are not familiar with threading and parallelization, so they use the default behavior in ranger. This means that they will use all the available hardware threads: https://github.com/imbs-hl/ranger/blob/d1ecaded22b1057ad8b3508c60af26bada90c8e4/src/Forest.cpp#L198-L208
This is not ideal on shared servers where several data scientists need to share resources. Currently we have some documentation to warn them so that they do not forget the num.threads argument when calling ranger().
However, it would be nice if, as an analytics admin of the service we provide to our users, I could change the default behavior so that ranger() does not use the full available capacity of the server for one user.
I think it could be done:
On the R side, when num.threads is NULL (the default). Something like (with more control I guess)
num.threads = 0
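As a rough illustration of how the num.threads = 0 case could honor an admin-set default, here is a minimal C++ sketch. It is not ranger's actual code: the function name resolve_num_threads and the environment variable RANGER_DEFAULT_THREADS are hypothetical names chosen for this example. The assumption is that an explicit positive value always wins, and that only the "use the default" case (0) consults the environment before falling back to all hardware threads.

```cpp
#include <cstdlib>
#include <thread>

// Hypothetical environment variable name, used only for this sketch.
static const char* kThreadsEnvVar = "RANGER_DEFAULT_THREADS";

// Resolve the number of threads to use.
//  - requested > 0: the caller's explicit choice is returned unchanged.
//  - requested == 0: use the environment variable if it holds a positive
//    integer, otherwise fall back to all available hardware threads.
unsigned int resolve_num_threads(unsigned int requested) {
  if (requested > 0) {
    return requested;
  }

  if (const char* value = std::getenv(kThreadsEnvVar)) {
    char* end = nullptr;
    long parsed = std::strtol(value, &end, 10);
    // Accept only a fully parsed, strictly positive number.
    if (end != value && *end == '\0' && parsed > 0) {
      return static_cast<unsigned int>(parsed);
    }
  }

  // hardware_concurrency() may return 0 when the value cannot be determined.
  unsigned int hw = std::thread::hardware_concurrency();
  return hw > 0 ? hw : 1;
}
```

With something like this, an admin could export the variable once in the server-wide environment, and users who never set num.threads would stay within that limit, while an explicit num.threads would still override it.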
This type of configuration is already done in other R packages, like data.table: see ?getDTthreads and the associated C file. They use a combination of data.table-specific environment variables on the C side, or OpenMP control features (omp_get_* functions) driven by an environment variable.
This may be a specific use case, but it would help a lot in some shared environments.
Would you consider something like that?
Thank you very much.