imbs-hl / ranger

A Fast Implementation of Random Forests

http://imbs-hl.github.io/ranger/

maxstat split rule #266

Closed: ghost closed this issue 6 years ago

ghost commented 6 years ago

I noticed that with maxstat as the splitting method, ranger does not seem to run multicore (maybe the same holds for extratrees). Is it not possible to parallelize with this rule, or is this something that will be implemented?

Moreover, I cannot get results in acceptable time using the "C" rule on a dataset of 45000 observations with mtry 4. Even a single tree returns nothing after hours.

I also tried 500 trees. In that case, as expected, it uses 100% of my 64 logical cores, but of course there is still no result in acceptable time. Best, g

mnwright commented 6 years ago

I'm not sure I understand your question. All split rules run multithreaded. However, the multithreading is implemented tree-wise so you won't benefit for single trees. Yes, the "C" splitting is very slow, in particular for many observations because all pairs are compared. For larger datasets you should probably use "logrank" or "maxstat".
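
To give a feel for why comparing all pairs gets expensive, here is a minimal Python sketch of a naive Harrell's C computation. It is not ranger's code: the function names `harrell_c` and `n_pairs` are hypothetical, tied event times are simply skipped, and the only point is that the work grows quadratically with the number of observations.

```python
from itertools import combinations


def harrell_c(time, event, risk):
    """Naive Harrell's C: visits every pair of observations, so O(n^2).

    time  : observed times
    event : 1 if the event was observed, 0 if censored
    risk  : predicted risk score (higher means earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(time)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if time[i] <= time[j] else (j, i)
        # Simplification: skip tied times, and skip pairs whose
        # earlier observation is censored (not comparable).
        if time[a] == time[b] or not event[a]:
            continue
        comparable += 1
        if risk[a] > risk[b]:
            concordant += 1.0
        elif risk[a] == risk[b]:
            concordant += 0.5
    return concordant / comparable if comparable else float("nan")


def n_pairs(n):
    """Number of unordered pairs among n observations."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    print(harrell_c([1, 2, 3, 4], [1, 1, 1, 1], [4, 3, 2, 1]))
    print(n_pairs(45000))
```

With the 45000 observations mentioned above, a single pass of this kind already touches about a billion pairs. If a comparable pass is needed repeatedly while growing a tree, that would be consistent with one tree taking hours, and tree-wise multithreading cannot speed up a single tree.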