neurodata / treeple

Scikit-learn compatible decision trees beyond those offered in scikit-learn
https://treeple.ai

Need to track constant features in multi-view decision trees #178

Open adam2392 opened 9 months ago

adam2392 commented 9 months ago

The only difference between our splitters and the ones in scikit-learn is that scikit-learn's splitters use an efficient mechanism to track columns that are "constant" with respect to the target variable y. This guarantees that no lower node will split on those constant features.

This can actually affect performance. For example, when max_features is 0.3, each node randomly selects 30% of the features. If the data contain a lot of noise, then at some depth in some tree all of the selected features may be noise, so every candidate split is constant. Currently, however, the oblique splitters will still split the samples in that case rather than stopping.
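To make the idea concrete, here is a minimal pure-Python sketch of constant-feature tracking at a single node. It is only loosely modeled on the behaviour described above and is not the Cython code in scikit-learn or treeple. The names `is_constant` and `draw_candidate_features`, the tolerance `tol`, and the assumption that "constant" means a feature takes a single value among the samples reaching the node are all hypothetical choices made for illustration.

```python
import random


def is_constant(X, sample_idx, feature, tol=1e-7):
    """Assumption: a feature is 'constant' if its values at this node span <= tol."""
    values = [X[i][feature] for i in sample_idx]
    return max(values) - min(values) <= tol


def draw_candidate_features(X, sample_idx, known_constants, max_features, rng):
    """Draw split candidates for one node while tracking constant features.

    Returns (candidates, constants). `constants` should be passed down to the
    child nodes as their `known_constants`, so they never draw those features.
    An empty `candidates` list means the node should become a leaf.
    """
    n_features = len(X[0])
    constants = set(known_constants)
    # Features already known to be constant in an ancestor are never drawn.
    pool = [f for f in range(n_features) if f not in constants]
    rng.shuffle(pool)

    candidates = []
    visited = 0
    for f in pool:
        # Stop once max_features were visited, unless all of them were constant.
        if visited >= max_features and candidates:
            break
        visited += 1
        if is_constant(X, sample_idx, f):
            constants.add(f)  # remember for every descendant node
        else:
            candidates.append(f)
    return candidates, constants


# Hypothetical toy data: feature 0 varies, features 1 and 2 never change.
X = [[0.0, 1.0, 5.0], [1.0, 1.0, 5.0], [2.0, 1.0, 5.0], [3.0, 1.0, 5.0]]
rng = random.Random(0)
candidates, constants = draw_candidate_features(X, [0, 1, 2, 3], set(), 1, rng)
should_stop = not candidates  # a splitter would make a leaf here instead of splitting
```

In this sketch a node keeps drawing past max_features only while everything it has visited is constant, and it becomes a leaf when no non-constant feature remains. An oblique or multi-view splitter would need an analogous check on the features feeding each projection.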

adam2392 commented 9 months ago

This was first noticed while testing in https://github.com/neurodata/scikit-tree/pull/172