Closed adam2392 closed 3 months ago
I believe the initial HonestForest implementation took its defaults from the Generalized Random Forest (GRF) package in R. Honest forests use a subsample to learn trees because of the whole idea of "honesty". When `bootstrap=True`, I believe what we do (and GRF does) is bootstrap the structure-learning subset of the data. In a regular forest, bootstrapping is useful because it helps decorrelate the trees. It's not clear that this is needed on top of the sample splitting already present in honest trees.
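To make the question concrete, here is a minimal sketch of how I read the behavior described above: each tree splits the samples into a structure-learning subset and a leaf-estimation subset, and bootstrapping (if enabled) resamples only within the structure subset. This is not the actual code from this package or GRF; the function name `honest_split` and the `honest_fraction` parameter are hypothetical and used only for illustration.

```python
import random


def honest_split(n_samples, honest_fraction=0.5, bootstrap=False, seed=0):
    """Return (structure_idx, leaf_idx) for a single honest tree.

    Hypothetical helper for illustration only; `honest_fraction` is the
    assumed share of samples reserved for estimating leaf values.
    """
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)

    # Sample splitting: disjoint subsets for leaf estimates and structure.
    n_leaf = int(round(honest_fraction * n_samples))
    leaf_idx = indices[:n_leaf]
    structure_idx = indices[n_leaf:]

    if bootstrap:
        # Resample with replacement, but only within the structure subset,
        # so the leaf-estimation samples never influence the splits.
        structure_idx = [rng.choice(structure_idx) for _ in structure_idx]

    return structure_idx, leaf_idx
```

With `bootstrap=False` the trees already differ through the random split alone; with `bootstrap=True` the structure subset additionally contains repeated samples, which is the extra source of randomness whose benefit is in question here.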
Currently, we will set `bootstrap=False`, simply because changing it would be backwards-incompatible with the unit tests, but we can explore what happens if we change it.
It is unclear what the default should be because in scikit-learn, `bootstrap=True` is the default on Forests.

cc: @rflperry @sampan501 mentioned that your original implementation had `bootstrap=False` as the default. To my knowledge, there is no reason for that default in HonestForests, so I'm wondering if we should stick w/ scikit-learn defaults?