ck37 closed this issue 7 years ago
Hello, Chris. Thanks for the comment.
Minobspernode - it would be great to support optimizing over the minimum number of observations per node hyperparameter, because that can be used to reduce overfitting in random forests.
Yes, we should optimize over the minimum number of observations per node to avoid overfitting. I'd like to add the Minobspernode option.
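To make the effect of this hyperparameter concrete, here is a minimal sketch in Python. It is not this package's code: it assumes a toy one-dimensional regression tree, and the names `fit_tree`, `min_obs_per_node`, `count_leaves`, and `leaf_sizes` are hypothetical. It only shows that a larger minimum node size forces a coarser tree, which is the mechanism that limits overfitting.

```python
def _sse(values):
    """Sum of squared errors around the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def _grow(pairs, min_obs):
    """Recursively split sorted (x, y) pairs; each child keeps >= min_obs points."""
    ys = [y for _, y in pairs]
    best = None
    # Split index i sends pairs[:i] to the left child and pairs[i:] to the right.
    for i in range(min_obs, len(pairs) - min_obs + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot separate identical x values
        cost = _sse(ys[:i]) + _sse(ys[i:])
        if best is None or cost < best[0]:
            best = (cost, i)
    # Stop when no admissible split exists or none reduces the error.
    if best is None or best[0] >= _sse(ys):
        return {"leaf": True, "value": sum(ys) / len(ys), "n": len(ys)}
    i = best[1]
    return {
        "leaf": False,
        "threshold": (pairs[i - 1][0] + pairs[i][0]) / 2,
        "left": _grow(pairs[:i], min_obs),
        "right": _grow(pairs[i:], min_obs),
    }


def fit_tree(xs, ys, min_obs_per_node):
    """Fit a toy 1-D regression tree (hypothetical helper, for illustration only)."""
    if min_obs_per_node < 1:
        raise ValueError("min_obs_per_node must be at least 1")
    return _grow(sorted(zip(xs, ys)), min_obs_per_node)


def leaf_sizes(tree):
    """Number of training points in each leaf, left to right."""
    if tree["leaf"]:
        return [tree["n"]]
    return leaf_sizes(tree["left"]) + leaf_sizes(tree["right"])


def count_leaves(tree):
    return len(leaf_sizes(tree))


# Deterministic toy data: a step at x = 20 plus small fixed "noise".
xs = list(range(40))
ys = [(1.0 if x >= 20 else 0.0) + ((x * 7) % 5 - 2) * 0.05 for x in xs]

for min_obs in (1, 5, 10, 20):
    tree = fit_tree(xs, ys, min_obs)
    print(min_obs, count_leaves(tree), min(leaf_sizes(tree)))
```

With a minimum of 1 the tree is free to chase the noise in individual points, while larger values cap how finely it can partition the data. In a forest the same knob applies to every tree, so it is a natural candidate for tuning.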
Ntree - I don't think there is a point in optimizing over ntree. Breiman showed in his 2001 RF article that increasing ntree does not cause overfitting: the generalization error converges to a limit (section ~2.1), so beyond that plateau extra trees only waste computation. So I don't see any benefit to optimizing ntree, since a larger number of trees does not hurt accuracy (unlike in GBM).
Maybe you are right. I will check the article.
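The plateau behaviour can be illustrated with a small sketch in Python. This is a deliberately simplified model, not a random forest and not this package's code. It assumes each tree votes independently, that a fraction of the cases are "easy" (each tree is right with probability 0.7) and the rest are "hard" (probability 0.4). The function names and all numbers are made up for illustration. Under these assumptions the accuracy of the majority vote can be computed exactly from the binomial distribution.

```python
from math import comb


def majority_vote_accuracy(p, ntree):
    """Probability that more than half of ntree independent votes are correct,
    when each vote is correct with probability p (use odd ntree to avoid ties)."""
    need = ntree // 2 + 1
    return sum(
        comb(ntree, k) * p**k * (1 - p) ** (ntree - k)
        for k in range(need, ntree + 1)
    )


def forest_accuracy(ntree, easy_fraction=0.8, p_easy=0.7, p_hard=0.4):
    """Accuracy of the majority vote over a mix of easy and hard cases.
    All default values are illustrative assumptions."""
    return (
        easy_fraction * majority_vote_accuracy(p_easy, ntree)
        + (1 - easy_fraction) * majority_vote_accuracy(p_hard, ntree)
    )


for ntree in (1, 11, 51, 201, 501):
    print(ntree, round(forest_accuracy(ntree), 4))
```

In this toy model the accuracy climbs quickly and then flattens near the fraction of easy cases. Adding trees past that point changes almost nothing, which is why a fixed, reasonably large ntree is usually enough and searching over it mostly costs compute.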
Thank you for your PRs about these points. I merged them.
Hello,
Interesting package, I'd like to give it a try sometime soon. Re: the random forest implementation, I'd like to suggest a few changes:
Minobspernode - it would be great to support optimizing over the minimum number of observations per node hyperparameter, because that can be used to reduce overfitting in random forests.
Ntree - I don't think there is a point in optimizing over ntree. Breiman showed in his 2001 RF article that increasing ntree does not cause overfitting: the generalization error converges to a limit (section ~2.1), so beyond that plateau extra trees only waste computation. So I don't see any benefit to optimizing ntree, since a larger number of trees does not hurt accuracy (unlike in GBM).
What do you think?
Appreciate it, Chris