haifengl / smile

Statistical Machine Intelligence & Learning Engine
https://haifengl.github.io

GradientTreeBoost : OnlineRegression #691

Open olivbrau opened 2 years ago

olivbrau commented 2 years ago

Is your feature request related to a problem? Please describe. GradientTreeBoost is a powerful machine learning algorithm, but finding good parameters is difficult and painful. It takes multiple attempts, which can be slow. One parameter, however, could be analysed differently and more efficiently: ntrees (the number of trees).

Describe the solution you'd like It would be nice to adapt the fitting method so that the caller can test the model at each iteration and compare the evolution of an error measure (RMSE, for example) on the training and validation datasets. This would show the effect of ntrees and make it possible to detect when the model starts overfitting. It would avoid testing with ntrees = 100, then ntrees = 200, and so on, which is inefficient. In Smile's vocabulary, this amounts to making GradientTreeBoost an OnlineRegression with an update method.

This mechanism could also allow the caller to monitor the progress of the training (a UI with a progress bar, etc.) and to stop it if it takes too long.
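To make the request concrete, here is a minimal sketch of the caller-side loop, assuming the model could be grown one tree at a time. Nothing here is Smile API: `EarlyStoppingSketch`, `bestTreeCount`, `addTreeAndValidate` and `patience` are hypothetical names, and the callback stands in for "add one tree, then return the validation error".

```java
import java.util.function.IntToDoubleFunction;

public class EarlyStoppingSketch {

    /** Root mean squared error between targets and predictions. */
    public static double rmse(double[] truth, double[] prediction) {
        double sum = 0.0;
        for (int i = 0; i < truth.length; i++) {
            double d = truth[i] - prediction[i];
            sum += d * d;
        }
        return Math.sqrt(sum / truth.length);
    }

    /**
     * Grows the ensemble one tree at a time and returns the tree count
     * with the lowest validation error seen so far.
     *
     * addTreeAndValidate is a hypothetical callback (not Smile API):
     * given the tree count n, it adds the n-th tree and returns the
     * validation error (for example RMSE) of the model with n trees.
     * The loop stops once `patience` consecutive iterations have
     * failed to improve on the best error.
     */
    public static int bestTreeCount(IntToDoubleFunction addTreeAndValidate,
                                    int maxTrees, int patience) {
        double bestError = Double.POSITIVE_INFINITY;
        int bestCount = 0;
        for (int n = 1; n <= maxTrees; n++) {
            double error = addTreeAndValidate.applyAsDouble(n);
            if (error < bestError) {
                bestError = error;
                bestCount = n;
            } else if (n - bestCount >= patience) {
                break; // validation error has stopped improving
            }
        }
        return bestCount;
    }
}
```

The same callback is a natural place to update a progress bar or to check a cancel flag, since it runs once per added tree.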

haifengl commented 2 years ago

It is more about early stopping than online learning.

olivbrau commented 2 years ago

Yes, I was wrong, it is early stopping. Since early stopping is not possible with Gradient Tree Boost, I thought that OnlineRegression could let users build their own mechanism. We can do this with a neural network (MLP): the user runs their own iterations. I think it is very useful. It also lets the user stop the training if something is wrong, since Gradient Tree Boost can take a long time if it is not carefully parameterized.