Hmm, I am not sure whether this optimization makes sense for non-standard gradient boosting algorithms (such as LightGBM and XGBoost), where you already compute the Hessian in addition to the gradient...
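(For concreteness, by "computing the Hessian" I mean the second-order leaf value from the XGBoost paper, which both libraries use. A toy version below, with illustrative names rather than actual LightGBM/XGBoost internals:)

```python
# Second-order (Newton) leaf value from the XGBoost paper:
#   w_j = -sum(g_i) / (sum(h_i) + lambda)
# where g_i, h_i are the gradient and Hessian of the loss for each sample in leaf j.
# Function and variable names here are just illustrative.
def newton_leaf_value(gradients, hessians, reg_lambda=1.0):
    return -sum(gradients) / (sum(hessians) + reg_lambda)
```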
Theoretically, the added benefit of computing the Hessian of the loss function should be essentially equivalent to a method such as Nesterov's accelerated gradient (since both are just ways to provide further information to the optimization procedure, in order to make the descent more efficient)...
It looks like it really is an improvement over Friedman's original gradient boosting, but with "Hessian/Newtonian boosting" that just might not be the case.
It would be really cool to try it out and see whether it makes learning faster, but a lot would have to be changed in LightGBM's source code in order to implement this.
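For anyone curious what the paper actually proposes, here is a rough sketch of the Nesterov-style two-sequence update, written in plain Python with sklearn trees for squared-error loss. The initialization, momentum sequence, and shrinkage value follow the standard Nesterov/FISTA scheme as I read it; this is illustrative only, not LightGBM code and not the paper's exact pseudocode.

```python
# Sketch of accelerated (Nesterov-style) gradient boosting in the spirit of
# arXiv:1803.02042, for squared-error loss. Illustrative only.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def agb_fit(X, y, n_rounds=100, shrinkage=0.1, max_depth=3):
    n = len(y)
    f = np.zeros(n)   # F_m: predictions of the "primary" ensemble on the training set
    g = np.zeros(n)   # G_m: predictions of the momentum sequence
    lam = 1.0         # Nesterov/FISTA sequence, starting at lambda = 1
    trees, gammas = [], []
    for _ in range(n_rounds):
        lam_next = (1.0 + np.sqrt(1.0 + 4.0 * lam ** 2)) / 2.0
        gamma = (1.0 - lam) / lam_next          # <= 0, i.e. an extrapolation weight
        # For squared error, the negative gradient at G_m is just the residual.
        residual = y - g
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        f_next = g + shrinkage * tree.predict(X)   # F_{m+1} = G_m + nu * h_m
        g = (1.0 - gamma) * f_next + gamma * f     # G_{m+1}: Nesterov mix of F_{m+1} and F_m
        f, lam = f_next, lam_next
        trees.append(tree)
        gammas.append(gamma)
    return trees, gammas, shrinkage

def agb_predict(X, trees, gammas, shrinkage):
    # Replay the same two-sequence recursion at prediction time.
    f = np.zeros(X.shape[0])
    g = np.zeros(X.shape[0])
    for tree, gamma in zip(trees, gammas):
        f_next = g + shrinkage * tree.predict(X)
        g = (1.0 - gamma) * f_next + gamma * f
        f = f_next
    return f
```

The only change compared to vanilla gradient boosting is that the pseudo-residuals are computed at the momentum sequence G rather than at the current ensemble F, and F and G are recombined after every round.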
Closed in favor of #2302. We decided to keep all feature requests in one place.
You are welcome to contribute this feature! Please re-open this issue (or post a comment if you are not the topic starter) if you are actively working on implementing it.
See also Lu et al. (2020), and specifically their discussion in Section 6, which seems cognizant of some of the observations made here by @julioasotodv.
This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.
I just read a cool paper on arXiv and thought it might be of interest to the LightGBM team: https://arxiv.org/pdf/1803.02042.pdf