sebp / scikit-survival

Survival analysis built on top of scikit-learn
GNU General Public License v3.0

Histogram-based Gradient Boosting survival models #473

Open ogencoglu opened 2 months ago

ogencoglu commented 2 months ago

It would be great to have histogram-based gradient boosting models in addition to the regular ones, as they are much more scalable. scikit-learn already supports them.

RandallJEllis commented 2 months ago

They can also handle missing values, which AFAICT only RandomSurvivalForests can do in sksurv.

sebp commented 2 months ago

It seems to be possible by sub-classing BaseHistGradientBoosting

https://github.com/scikit-learn/scikit-learn/blob/70fdc843a4b8182d97a3508c1a426acc5e87e980/sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py#L140

and passing a new implementation of BaseLoss as the loss argument

https://github.com/scikit-learn/scikit-learn/blob/70fdc843a4b8182d97a3508c1a426acc5e87e980/sklearn/_loss/loss.py#L67

BaseLoss ultimately calls a subclass of CyLossFunction

https://github.com/scikit-learn/scikit-learn/blob/70fdc843a4b8182d97a3508c1a426acc5e87e980/sklearn/_loss/_loss.pxd#L24

The original PR to add histogram-based gradient boosting in scikit-learn is https://github.com/scikit-learn/scikit-learn/pull/12807
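
As a rough idea of what a new loss would have to provide, the sketch below computes the value, gradient, and diagonal Hessian of the negative Cox partial log-likelihood with respect to the raw predictions, in plain NumPy. Several things here are assumptions: the thread does not say which survival loss is intended, the function name `cox_loss_grad_hess` is hypothetical, and the code is not wired into scikit-learn's private `BaseLoss` or `CyLossFunction` interfaces, whose exact signatures are not reproduced here. It uses an O(n²) risk-set matrix for clarity, not speed.

```python
import numpy as np


def cox_loss_grad_hess(raw_prediction, time, event):
    """Negative Cox partial log-likelihood, its gradient and diagonal Hessian.

    Hypothetical helper for illustration only; ties share the same risk set.
    """
    f = np.asarray(raw_prediction, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=float)  # 1 = event observed, 0 = censored

    # The loss is invariant to adding a constant to f, so shift for stability.
    f = f - f.max()
    exp_f = np.exp(f)

    # at_risk[i, j] is True when sample j is still at risk at time[i].
    at_risk = time[None, :] >= time[:, None]
    risk_sum = at_risk @ exp_f  # S_i = sum of exp(f_j) over the risk set of i

    loss = -np.sum(event * (f - np.log(risk_sum)))

    # For sample k, accumulate over every event i whose risk set contains k.
    w1 = at_risk.T @ (event / risk_sum)
    w2 = at_risk.T @ (event / risk_sum**2)

    grad = exp_f * w1 - event
    hess = exp_f * w1 - exp_f**2 * w2
    return loss, grad, hess


# Tiny usage example with made-up data.
raw = np.array([0.2, -0.1, 0.5, 0.0, -0.3, 0.1])
time = np.array([5.0, 3.0, 8.0, 3.0, 1.0, 9.0])
event = np.array([1, 0, 1, 1, 1, 0])
loss, grad, hess = cox_loss_grad_hess(raw, time, event)
```

Note that the gradient for one sample depends on the other samples in its risk set, so this loss may not map directly onto an interface that evaluates each sample independently. That would need to be checked against the linked code.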