Random Forest optimization

ck37 commented 7 years ago

Hello,

Interesting package, I'd like to give it a try sometime soon. Re: the random forest implementation, I'd like to suggest a few changes:

Minobspernode - it would be great to support optimizing over the minimum number of observations per node hyperparameter, because that can be used to reduce overfitting in random forests.
Ntree - I don't think there is a point in optimization over ntree. Breiman proved in his 2001 RF article that there is no problem with increasing ntree to an arbitrary number - it just converges to a performance plateau (section ~2.1) and ends up wasting computation. So I don't see any benefit to optimizing the ntree as there is not any harm in a larger number of trees (unlike GBM).
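
To make the ntree point concrete, here is a toy sketch in Python. It is not Breiman's proof and it does not use this package's code. It assumes a deliberately simplified forest: on "easy" inputs each tree votes correctly with probability 0.7, on "hard" inputs each tree is a coin flip, and trees vote independently given the input type. The function names and all the numbers are hypothetical, chosen only for illustration.

```python
from math import comb


def majority_correct(n_trees: int, p: float) -> float:
    """Probability that a strict majority of n_trees independent voters,
    each correct with probability p, gives the right answer."""
    need = n_trees // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n_trees, k) * p**k * (1 - p) ** (n_trees - k)
        for k in range(need, n_trees + 1)
    )


def forest_accuracy(n_trees: int, hard_fraction: float = 0.2,
                    p_easy: float = 0.7, p_hard: float = 0.5) -> float:
    """Accuracy of the toy forest: a mix of easy and hard inputs.
    All parameter values are assumptions made for this illustration."""
    return ((1 - hard_fraction) * majority_correct(n_trees, p_easy)
            + hard_fraction * majority_correct(n_trees, p_hard))


# Odd tree counts only, so a vote can never tie.
for n in (1, 11, 51, 101, 501):
    print(f"ntree={n:4d}  accuracy={forest_accuracy(n):.4f}")
```

Under these assumptions the accuracy rises with ntree and flattens out at 0.8 * 1 + 0.2 * 0.5 = 0.9. Adding trees past that point changes nothing except the run time, which is why a fixed, reasonably large ntree seems sufficient rather than a search over it.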

What do you think?

Appreciate it, Chris

ymattu commented 7 years ago

Hello, Chris. Thanks for the comment.

> Minobspernode - it would be great to support optimizing over the minimum number of observations per node hyperparameter, because that can be used to reduce overfitting in random forests.

Yes, we should optimize over the minimum number of observations per node to avoid overfitting. I'd like to add a Minobspernode option.

> Ntree - I don't think there is a point in optimization over ntree. Breiman proved in his 2001 RF article that there is no problem with increasing ntree to an arbitrary number - it just converges to a performance plateau (section ~2.1) and ends up wasting computation. So I don't see any benefit to optimizing the ntree as there is not any harm in a larger number of trees (unlike GBM).

Maybe you are right. I will check the article.

ymattu commented 7 years ago

Thank you for your PRs about these points. I merged them.

ymattu / MlBayesOpt

Random Forest optimization #36