ymattu / MlBayesOpt

R package to tune hyperparameters for machine learning models (Support Vector Machine, Random Forest, and XGBoost) using Bayesian optimization with Gaussian processes

Random Forest optimization #36

Closed: ck37 closed this issue 7 years ago

ck37 commented 7 years ago

Hello,

Interesting package, I'd like to give it a try sometime soon. Re: the random forest implementation, I'd like to suggest a few changes:

  1. Minobspernode - it would be great to support optimizing over the minimum number of observations per node, since that hyperparameter can be used to reduce overfitting in random forests.

  2. Ntree - I don't think there is any point in optimizing over ntree. Breiman proved in his 2001 RF article (around section 2.1) that the generalization error converges as ntree grows, so an arbitrarily large number of trees does no harm to accuracy; it just wastes computation past the plateau. Unlike GBM, there is no overfitting risk from more trees, so I see no benefit to optimizing ntree. A quick sketch of the plateau follows below.
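As a quick illustration (not using this package; just `randomForest` on the built-in iris data, with an arbitrary tree count), the cumulative OOB error flattens long before a large ntree:

```r
# Sketch only: OOB error vs. number of trees; randomForest on iris,
# with the tree count chosen arbitrarily for illustration.
library(randomForest)

set.seed(42)
fit <- randomForest(Species ~ ., data = iris, ntree = 2000)

# err.rate stores the cumulative OOB error after each tree is added;
# it drops quickly and then plateaus, so tuning ntree buys little.
oob <- fit$err.rate[, "OOB"]
plot(oob, type = "l", xlab = "ntree", ylab = "OOB error")
oob[c(50, 500, 2000)]  # nearly identical once past the plateau
```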

What do you think?

Appreciate it, Chris

ymattu commented 7 years ago

Hello, Chris. Thanks for your comment.

> Minobspernode - it would be great to support optimizing over the minimum number of observations per node, since that hyperparameter can be used to reduce overfitting in random forests.

Yes, we should optimize over the minimum number of observations per node to reduce overfitting. I'd like to add a Minobspernode option; a rough sketch of what the search could look like is below.
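For reference, here is how such a search might look if done directly with rBayesianOptimization and ranger. This is only a sketch: the actual implementation in MlBayesOpt may differ, and the data, bounds, and fixed settings here are illustrative assumptions.

```r
# Rough sketch: tune mtry and the minimum node size with Bayesian
# optimization, using ranger's OOB error as the objective.
library(rBayesianOptimization)
library(ranger)

rf_score <- function(mtry, min_node_size) {
  fit <- ranger(
    Species ~ ., data = iris,
    num.trees     = 500,                 # fixed, per the ntree discussion
    mtry          = as.integer(mtry),
    min.node.size = as.integer(min_node_size)
  )
  # BayesianOptimization maximizes Score, so negate the OOB error
  list(Score = -fit$prediction.error, Pred = 0)
}

res <- BayesianOptimization(
  rf_score,
  bounds = list(mtry = c(1L, 4L), min_node_size = c(1L, 20L)),
  init_points = 5, n_iter = 10, verbose = TRUE
)
```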

> Ntree - I don't think there is any point in optimizing over ntree. Breiman proved in his 2001 RF article (around section 2.1) that the generalization error converges as ntree grows, so an arbitrarily large number of trees does no harm to accuracy; it just wastes computation past the plateau. Unlike GBM, there is no overfitting risk from more trees, so I see no benefit to optimizing ntree.

You may be right. I will check the article.

ymattu commented 7 years ago

Thank you for your PRs addressing these points. I have merged them.