imbs-hl / ranger

A Fast Implementation of Random Forests
http://imbs-hl.github.io/ranger/

Allow changing the default behavior for num.threads = NULL in ranger() #513

Open · cderv opened this issue 4 years ago

cderv commented 4 years ago

Hi,

Some users in our teams run ranger on our shared RStudio Server Pro cluster. Many R users are not familiar with threading and parallelization, so they rely on ranger's default behavior. With num.threads = NULL, that default uses every available hardware thread: https://github.com/imbs-hl/ranger/blob/d1ecaded22b1057ad8b3508c60af26bada90c8e4/src/Forest.cpp#L198-L208
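The linked lines are not reproduced here. The sketch below assumes that an unset thread count (R's `num.threads = NULL`) reaches the C++ side as `0` and is then replaced by `std::thread::hardware_concurrency()`. It is a minimal illustration of that "use everything" default. `resolve_num_threads` is a hypothetical name and is not ranger's code.

```cpp
#include <cassert>
#include <thread>

// Hypothetical helper, not ranger's actual code.
// A value of 0 stands in for R's num.threads = NULL ("no explicit choice").
unsigned int resolve_num_threads(unsigned int requested) {
  if (requested != 0) {
    return requested;  // an explicit request from the caller always wins
  }
  // Default: claim every hardware thread the machine reports.
  unsigned int hw = std::thread::hardware_concurrency();
  // hardware_concurrency() may return 0 when the count is not computable,
  // so fall back to a single thread in that case.
  unsigned int result = (hw == 0) ? 1u : hw;
  assert(result >= 1);
  return result;
}
```

On a server that reports 64 hardware threads, `resolve_num_threads(0)` returns 64, so one user's default call competes for every core on the machine.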

This is not ideal on shared servers where several data scientists need to share resources. Currently, our documentation warns users not to forget the num.threads argument when calling ranger().

However, it would be nice if I, as an admin of the analytics service we provide to our users, could change the default behavior so that a single user's ranger() call does not use the server's full capacity.
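One way such an admin-level default could work is a precedence chain. An explicit thread count wins. Otherwise, a site-wide value set by the admin is used. Only if neither is set does the code fall back to all hardware threads. The sketch below assumes the site-wide value comes from an environment variable. The name `RANGER_SITE_NUM_THREADS` and all function names are hypothetical. They are not ranger settings or APIs, and the sketch is not the mechanism proposed in this issue.

```cpp
#include <cassert>
#include <climits>
#include <cstdlib>
#include <thread>

// Parse an admin-supplied thread count.
// Returns 0 if the text is missing, empty, or not a positive integer.
unsigned int parse_thread_count(const char* text) {
  if (text == nullptr || *text == '\0') return 0;
  char* end = nullptr;
  long value = std::strtol(text, &end, 10);
  if (end == text || *end != '\0' || value <= 0 ||
      static_cast<unsigned long>(value) > UINT_MAX) {
    return 0;  // reject garbage, negatives, and out-of-range values
  }
  return static_cast<unsigned int>(value);
}

// Precedence: explicit argument > admin default > all hardware threads.
// Inputs are passed in explicitly so the precedence logic is easy to test.
unsigned int choose_num_threads(unsigned int requested,
                                const char* admin_default,
                                unsigned int hardware_threads) {
  unsigned int result;
  if (requested != 0) {
    result = requested;  // the user asked for a specific count
  } else if (unsigned int admin = parse_thread_count(admin_default); admin != 0) {
    result = admin;      // site-wide default configured by the admin
  } else {
    result = (hardware_threads == 0) ? 1u : hardware_threads;
  }
  assert(result >= 1);
  return result;
}

// Hypothetical wiring: read the admin default from the environment.
unsigned int default_num_threads(unsigned int requested) {
  return choose_num_threads(requested,
                            std::getenv("RANGER_SITE_NUM_THREADS"),
                            std::thread::hardware_concurrency());
}
```

Under this sketch, an explicit `num.threads` still overrides the admin setting, so experienced users keep full control. An invalid admin value is ignored rather than treated as an error.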

I think it could be done:

This type of configuration is already done in other R packages, like

This may be a specific use case, but it would help a lot in shared environments.

Would you consider something like that?

Thank you very much.

mnwright commented 4 years ago

Thanks, that's a very good idea. I like your first idea (R side) and will look into how other packages solve this.

mgoplerud commented 3 years ago

Hi! Has there been any update on this? This issue really helped me figure out some problems I was having when using ranger together with doParallel.

mnwright commented 3 years ago

Sorry, no update yet. A PR would be very welcome!

mnwright commented 10 months ago

Done in #713.