Open konradsemsch opened 3 years ago
It is correct that the expected survival time can be estimated with the integral under the survival function. However, this only leads to a reasonable estimate if the last observed time point corresponds to an event such that the survival function is zero thereafter. If that's not the case, the expected survival time is undefined without making any additional assumptions.
Typical assumptions are (i) assuming the remaining instances experience an event directly after the last observed time point (S(t) becomes zero), (ii) assuming the remaining instances never experience an event (S(t) remains constant), (iii) assuming a parametric form, e.g. exponential curve, at the tail. Finally, estimating the restricted mean survival time, i.e., an expectation over a pre-defined interval rather than all of time, is from a practical perspective the most reasonable.
I don't know how lifelines handles this, but I think, when implementing this, it is important to make it explicit that estimating the mean survival time can be tricky and avoid giving an answer for which the underlying assumptions are not clear.
Hi @sebp and thanks for coming back on this! Yes, there's a note in the lifelines
documentation referring to what you have mentioned:
In general, I would consider this a really desirable feature in one of the future releases. I find your package much better integrated with the sklearn ecosystem, but the lack of these prediction helpers prevents us from using it. An additional point in favour of implementing this is that when I check other package implementations, such as the quite famous survival
package in R, it also provides an option to predict the expected survival time.
I'd be happy to contribute here with a PR 😉
If you can contribute with a PR, that would be even better.
Any updates on this issue for implementation? @sebp @konradsemsch
@PragyanSubedi I did not have time to look into it, I'm afraid.
@PragyanSubedi - I've started looking into that. I should have a first PR draft in a week.
@sebp - I added a basic first PR: https://github.com/sebp/scikit-survival/pull/196. Please take a look at it and let me know if you agree. Once we have that, I'd move further with implementing tests and adding this helper to other models as well.
Hi, @sebp : I wonder why not considering a predict_hazard_function in addition to predict_cumulative_hazard_function ? Or is there an obvious and simple way to get it, that I'm missing... Thank you very much for all the great work!
By definition, it would be the derivative of the cumulative hazard function.
In general, directly estimating the hazard function is tricky.
I'm curious, what would you like to use the hazard function for?
Hi, @konradsemsch and @sebp are you still working with this predict_expectation function? Would be really interested about it.
@tomikrogerus I'm currently not working on it. PR #196 is inactive too.
Hi! I really like your package implementation and its compatibility with sklearn, but one thing that is refraining us from using it in our solution is the lack of the above predict function implementations (particularly predict_expectation). In our case, we need to know the expected number of days a given client is expected to remain a client. Ranking observations by the obtained "arbitrary" score will not work.
Given that the above function is based on the estimated survival functions it should be possible to implement (as long as I read the lifelines code correctly):
Reference to the lifelines implementation: 1) https://lifelines.readthedocs.io/en/latest/fitters/regression/CoxPHFitter.html#lifelines.fitters.coxph_fitter.SemiParametricPHFitter.predict_expectation 2) https://github.com/CamDavidsonPilon/lifelines/blob/master/lifelines/fitters/coxph_fitter.py