ogrisel / pygbm

Experimental Gradient Boosting Machines in Python with numba.
MIT License
183 stars 32 forks source link

mean_samples_leaf does not do what it's suppose to do #34

Closed ogrisel closed 6 years ago

ogrisel commented 6 years ago

min_samples_leaf=20 should ensure that we never do a split that would result in less than 20 samples in each of the two resulting leaves.

Our current implementation does not split nodes with less than 20 samples which is not the same. Our current implementation is akin to the min_samples_split of scikit-learn trees which is not a good hyperparameter to control over-fitting.

I am working on a fix.

ogrisel commented 6 years ago

min_samples_leaf is now doing what it's supposed to do (I think) but pygbm is still not 100% consistent with LightGBM. So #32 is not resolved yet.