sebp / scikit-survival

Survival analysis built on top of scikit-learn
GNU General Public License v3.0
1.14k stars 216 forks source link

Disable early stopping criteria #484

Open Sann5 opened 1 month ago

Sann5 commented 1 month ago

Checklist

What does this implement/fix? Explain your changes In sksurv/linear_model/src/coxnet/coxnet.h we find the fit method for the CoxnetSurvivalAnalysis class. This fits a model for the desired regularization path. The current implementation includes an early stop condition which stops fiting a model for the given alphas based on a deviance ratio. However it is desirable sometimes to fit the whole path (using all the alphas) even if the performance does not change significanly across alphas (see #41 for a more detailed explanation of the behaviour). The obvios solution is to get rid of the early stop condition (which is what I have done here by commentitg it out), however I think the better solution would be to include a boolean flag in the class definition that specifies if the user wishes or not to use this early stopping functionality. Sadly I do not have the time to implement such a solution. Please feel free to use this PR as a starting point.

sebp commented 1 month ago

You are correct that this part of the code is probably the reason why coefficients for at most 5 alphas are returned.

However, this check is also part of the reference implementation in the R glmnet package: https://github.com/cran/glmnet/blob/ace3461223a8fe8a3e0c2f749d3d7e8e84e49ea8/src/glmnet3.f90#L4460

Hence, the code does have its purpose. The main question is where the two implementations deviate from each other. I would suggest to use an example where sksurv terminates early and compare the steps in glmnet and sksurv with a debugger.