neurodata / treeple

Scikit-learn compatible decision trees beyond those offered in scikit-learn
https://treeple.ai

Need to track constant features in multi-view decision trees #178

Open adam2392 opened 11 months ago

adam2392 commented 11 months ago

The only difference between our splitters and the ones in scikit-learn is that the scikit-learn splitters efficiently track columns that are "constant" with respect to a target y variable. This guarantees that nodes lower in the tree will never split on those constant features.
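
For illustration, here is a minimal NumPy sketch of the general idea of tracking constant features. It is not scikit-learn's actual Cython implementation. The function names, the tolerance value, and the treatment of `max_features` as an integer count are all assumptions made for this example. A feature that is constant over a node's samples stays constant in every descendant, so the set of known constants can be passed down and only ever grows.

```python
import numpy as np


def find_constant_features(X, sample_idx, known_constant=(), tol=1e-7):
    """Return sorted indices of features that are constant over a node's samples.

    `known_constant` holds features already found constant in an ancestor
    node, so they are not checked again (hypothetical helper).
    """
    known = set(known_constant)
    X_node = X[sample_idx]
    for j in range(X.shape[1]):
        if j in known:
            continue
        col = X_node[:, j]
        # A feature whose range is (numerically) zero cannot produce a split.
        if col.max() - col.min() <= tol:
            known.add(j)
    return sorted(known)


def draw_candidate_features(n_features, constant, max_features, rng):
    """Sample up to `max_features` candidates from the non-constant features.

    An empty result means there is nothing to split on, so the node
    should become a leaf (hypothetical helper).
    """
    non_constant = np.setdiff1d(np.arange(n_features), constant)
    if non_constant.size == 0:
        return non_constant
    k = min(max_features, non_constant.size)
    return rng.choice(non_constant, size=k, replace=False)


X = np.array(
    [
        [1.0, 5.0, 0.0],
        [1.0, 6.0, 0.0],
        [1.0, 7.0, 1.0],
        [1.0, 8.0, 1.0],
    ]
)
rng = np.random.default_rng(0)

# At the root, only feature 0 is constant.
root_constant = find_constant_features(X, [0, 1, 2, 3])
# In the child holding samples 0 and 1, feature 2 becomes constant as well.
child_constant = find_constant_features(X, [0, 1], known_constant=root_constant)
# Only feature 1 is left as a split candidate in that child.
candidates = draw_candidate_features(X.shape[1], child_constant, 2, rng)
```

A multi-view or oblique splitter would need an analogous check on its projected features, which is the part this issue asks for.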

This can actually affect performance. When max_features is, say, 0.3, each node randomly selects 30% of the features. If the data contain a very large number of noise features, then at some depth in some tree all of the selected features may be noise and thus result in constant splits. However, the oblique splitters currently still split the samples in that case rather than stopping.
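
As a rough back-of-the-envelope check of how often a node can draw only noise features, the sketch below computes that probability under the assumption that candidate features are drawn uniformly without replacement. The function name and the example numbers (100 features, 5 informative) are hypothetical and not taken from any benchmark.

```python
from math import comb


def prob_all_noise(n_features, n_informative, n_sampled):
    """Probability that none of the sampled features is informative.

    Assumes `n_sampled` features are drawn uniformly without replacement,
    which gives a hypergeometric probability.
    """
    n_noise = n_features - n_informative
    return comb(n_noise, n_sampled) / comb(n_features, n_sampled)


# Hypothetical setting: 100 features, 5 informative, max_features = 0.3.
p = prob_all_noise(n_features=100, n_informative=5, n_sampled=30)
print(f"P(all sampled features are noise) = {p:.3f}")
```

Under these assumed numbers a noticeable fraction of nodes would see no informative feature at all, so the effect compounds across a deep tree or a large forest.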

adam2392 commented 11 months ago

This was first noticed while testing in https://github.com/neurodata/scikit-tree/pull/172