`min_samples_leaf=20` should ensure that we never perform a split that would leave fewer than 20 samples in either of the two resulting leaves.
Our current implementation instead refuses to split nodes with fewer than 20 samples, which is not the same thing: a node with, say, 100 samples can still be split into children of 99 and 1 samples. This behavior is akin to the `min_samples_split` parameter of scikit-learn trees, which is not a good hyperparameter to control over-fitting.
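To make the difference concrete, here is a minimal sketch that enumerates the allowed split positions in a node whose samples are already sorted along one feature. The function names are hypothetical and this is not the project's actual splitter; it only contrasts a per-child constraint (`min_samples_leaf`) with a per-node constraint (`min_samples_split`).

```python
def candidate_splits_min_samples_leaf(n_samples, min_samples_leaf):
    """Hypothetical helper: split positions i (left child gets samples [0, i),
    right child gets [i, n_samples)) such that BOTH children keep at least
    min_samples_leaf samples."""
    return [
        i
        for i in range(1, n_samples)
        if i >= min_samples_leaf and n_samples - i >= min_samples_leaf
    ]


def candidate_splits_min_samples_split(n_samples, min_samples_split):
    """Hypothetical helper mimicking the current behavior: only the node size
    is checked. If the node is large enough, every position is allowed, even
    one that leaves a single sample in a child."""
    if n_samples < min_samples_split:
        return []
    return list(range(1, n_samples))


if __name__ == "__main__":
    n = 50
    leaf = candidate_splits_min_samples_leaf(n, 20)
    split = candidate_splits_min_samples_split(n, 20)
    print("min_samples_leaf:", leaf[0], "to", leaf[-1])     # only balanced-enough splits
    print("min_samples_split:", split[0], "to", split[-1])  # includes a 1-sample child
```

With 50 samples and a threshold of 20, the per-child rule only allows split positions 20 through 30, whereas the per-node rule allows every position from 1 to 49. Conversely, a node with 39 samples cannot be split at all under `min_samples_leaf=20`, since no position leaves 20 samples on both sides.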
I am working on a fix.