
neurodata / treeple

Scikit-learn compatible decision trees beyond those offered in scikit-learn

https://treeple.ai


Should `HonestForest*` have `bootstrap=False` or `bootstrap=True` as default #146

adam2392 closed this issue 3 months ago.

adam2392 commented 11 months ago

It is unclear what the default should be, because in scikit-learn, `bootstrap=True` is the default for forests.
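For reference, the defaults can be read straight off the estimators. This snippet (assuming a recent scikit-learn install) shows that the `RandomForest*` estimators default to `bootstrap=True`, while `ExtraTreesClassifier` defaults to `bootstrap=False`, so scikit-learn itself is not uniform across its forests.

```python
from sklearn.ensemble import (
    ExtraTreesClassifier,
    RandomForestClassifier,
    RandomForestRegressor,
)

# Read the constructor default of `bootstrap` for a few scikit-learn forests.
defaults = {
    cls.__name__: cls().get_params()["bootstrap"]
    for cls in (RandomForestClassifier, RandomForestRegressor, ExtraTreesClassifier)
}
print(defaults)
# {'RandomForestClassifier': True, 'RandomForestRegressor': True, 'ExtraTreesClassifier': False}
```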

cc: @rflperry: @sampan501 mentioned that your original implementation had `bootstrap=False` as the default. To my knowledge, there is no specific reason for that default in HonestForests, so I'm wondering whether we should stick with the scikit-learn defaults?

rflperry commented 11 months ago

I believe the initial HonestForest implementation took the defaults of the Generalized Random Forest package in R (GRF). Honest forests already learn each tree on a subsample because of the whole idea of "honesty": one part of the data is used to learn the tree structure and a disjoint part is used to estimate the leaf values. When `bootstrap=True`, I believe what we do (and what GRF does) is bootstrap the structure-learning subset of the data. In a regular forest, bootstrapping is useful because it helps to decorrelate the trees. It's not clear that this is needed on top of the sample splitting already present in honest trees.
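To make the distinction concrete, here is a minimal NumPy sketch of the scheme described above, assuming that each tree splits its samples into a disjoint structure subset and honest subset, and that `bootstrap=True` resamples only the structure subset. The helper `honest_split_indices` and its `honest_fraction` parameter are hypothetical and for illustration only; this is not treeple's or GRF's actual code.

```python
import numpy as np


def honest_split_indices(n_samples, honest_fraction=0.5, bootstrap=False, seed=None):
    """Hypothetical helper: sample indices for one honest tree.

    The data are split into two disjoint parts:
      * a structure subset, used to choose the splits, and
      * an honest subset, used only to estimate the leaf values.
    If ``bootstrap`` is True, only the structure subset is resampled with
    replacement; the honest subset is left untouched.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    n_honest = int(round(honest_fraction * n_samples))
    honest_idx = perm[:n_honest]
    structure_idx = perm[n_honest:]
    if bootstrap:
        # Draw with replacement from the structure subset only.
        structure_idx = rng.choice(structure_idx, size=structure_idx.size, replace=True)
    return structure_idx, honest_idx


for bootstrap in (False, True):
    structure, honest = honest_split_indices(1000, bootstrap=bootstrap, seed=0)
    overlap = np.intersect1d(structure, honest).size
    unique_frac = np.unique(structure).size / structure.size
    print(f"bootstrap={bootstrap}: overlap={overlap}, unique structure fraction={unique_frac:.2f}")
```

In both cases the two subsets stay disjoint, so honesty is preserved. With `bootstrap=True`, only about 63% of the structure indices are distinct (the usual 1 - 1/e fraction for a bootstrap sample), which is the extra source of tree-to-tree variation that bootstrapping adds on top of the honest split.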

adam2392 commented 4 months ago

For now, we will keep `bootstrap=False` as the default, simply because changing it breaks backwards compatibility of the unit tests, but we can explore what happens if we change it.