imbs-hl / ranger

A Fast Implementation of Random Forests
http://imbs-hl.github.io/ranger/

apparent typo in ranger(num.threads)'s default value #743

Closed twest820 closed 5 days ago

twest820 commented 6 days ago

The documentation for ranger() reads

num.threads Number of threads. Default is number of CPUs available.

but, consistent with remarks in #513, the default value of num.threads looks to be std::thread::hardware_concurrency(), which returns the number of concurrent hardware threads (logical processors) rather than the number of cores. With hyperthreading enabled, the default therefore exceeds the core count: by a factor of two on AMD Zen and in many Intel cases, or by roughly 1.5 on Alder Lake and Raptor Lake.
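To make the threads-versus-cores distinction concrete, here is a minimal C++ sketch. Both helper functions are hypothetical and are not taken from ranger's source. The only standard-library facts assumed are that `std::thread::hardware_concurrency()` reports hardware threads and may return 0 when the value cannot be determined. The cores estimate further assumes a uniform number of hardware threads per core, which does not hold on hybrid CPUs.

```cpp
#include <algorithm>
#include <thread>

// Hypothetical helper (not part of ranger): turn the value reported by
// std::thread::hardware_concurrency() into a usable thread count.
// The standard allows that function to return 0 when the count is unknown,
// so fall back to a single thread in that case.
unsigned usable_thread_count(unsigned reported) {
    return reported == 0 ? 1u : reported;
}

// Hypothetical helper: rough estimate of physical cores from logical threads,
// ASSUMING every core exposes the same number of hardware threads
// (e.g. 2 with hyperthreading/SMT enabled). Hybrid CPUs that mix core types
// violate this assumption, so treat the result as an approximation only.
unsigned estimated_physical_cores(unsigned logical_threads,
                                  unsigned threads_per_core) {
    if (threads_per_core == 0) {
        return logical_threads;  // no information: leave the count unchanged
    }
    return std::max(1u, logical_threads / threads_per_core);
}
```

For example, `usable_thread_count(std::thread::hardware_concurrency())` gives the logical thread count on the current machine, and `estimated_physical_cores(16, 2)` gives 8 for a 16-thread CPU with two threads per core.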

A fix would be something like "Default is number of simultaneous threads supported by the CPU." The distinction matters because a 2x difference can significantly affect cache contention, DDR utilization, and system usability while ranger is running.

mnwright commented 5 days ago

Thanks, you are right. However, the default was recently changed anyway. See https://imbs-hl.github.io/ranger/reference/ranger.html#arg-num-threads and #713.