imbs-hl / ranger

A Fast Implementation of Random Forests
http://imbs-hl.github.io/ranger/

maxstat splitrule for classification #404

Open brfitzpatrick opened 5 years ago

brfitzpatrick commented 5 years ago

Hello,

I'm very interested in fitting random forests for binary classification with explanatory variables of mixed types that also differ in their number of unique values. For this, a 'maxstat' (or equivalent) split rule for the classification case would be a great addition to ranger.

Cheers for the great package!

mnwright commented 4 years ago

Just found this issue, which slipped through somehow. We tried several approximations for maximally selected chi-squared statistics but didn't find anything both accurate and fast enough to use in a random forest.

@animusnaturae presented a poster about this. Please ask us if you are interested in details.
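
For readers unfamiliar with the term, here is a minimal sketch of what a maximally selected chi-squared statistic is for one numeric predictor and a binary outcome: compute the 2x2 Pearson chi-squared statistic at every candidate cutpoint and keep the largest. This is an illustration only, not ranger's code; the function names `chi2_2x2` and `max_selected_chi2` are hypothetical. It deliberately leaves out the hard part discussed above, namely approximating the p-value of that maximum so that variables with different numbers of cutpoints can be compared fairly.

```python
from typing import Sequence, Tuple


def chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-squared statistic (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a zero margin carries no information about association
    return n * (a * d - b * c) ** 2 / denom


def max_selected_chi2(x: Sequence[float], y: Sequence[int]) -> Tuple[float, float]:
    """Return (best_cutpoint, max_statistic) over all splits `x <= cut`.

    `y` must contain 0/1 labels. Hypothetical helper for illustration.
    """
    pairs = sorted(zip(x, y))
    n = len(pairs)
    total_pos = sum(label for _, label in pairs)
    best_cut, best_stat = float("nan"), 0.0
    left_n = left_pos = 0
    for i in range(n - 1):
        value, label = pairs[i]
        left_n += 1
        left_pos += label
        if value == pairs[i + 1][0]:
            continue  # tied values: no cutpoint can separate them
        right_pos = total_pos - left_pos
        right_neg = (n - left_n) - right_pos
        stat = chi2_2x2(left_pos, left_n - left_pos, right_pos, right_neg)
        if stat > best_stat:
            best_cut, best_stat = value, stat
    return best_cut, best_stat


if __name__ == "__main__":
    cut, stat = max_selected_chi2([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1])
    print(cut, stat)
```

The raw maximum tends to grow with the number of candidate cutpoints, which is why an adjusted p-value is needed before comparing predictors; computing that adjustment accurately and quickly is the step this sketch does not attempt.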

brfitzpatrick commented 4 years ago

Thanks for the reply. I'd be very interested to see a digital version of the poster if one exists. Do you have other recommendations for the situation I described above?