havakv / pycox

Survival analysis with PyTorch
BSD 2-Clause "Simplified" License

Predict survival probabilities at given times #58

Closed: gunesevitan closed this 3 years ago

gunesevitan commented 3 years ago

Hi, I'm trying to predict survival probabilities of a population at 6, 12 and 24 months. lifelines.CoxPHFitter has that functionality: I can pass a times parameter to predict_survival_function. I wonder if this is also possible with the DeepSurv model. If not, how can I achieve similar results?

havakv commented 3 years ago

Hi! There isn't an option to get that directly for DeepSurv here, but if you use model.predict_surv_df (like in this notebook), you get predictions in the form of a data frame whose index represents time. To get predictions for a given point in time, you just have to look at that index. Remember that the survival predictions of DeepSurv are a step function, so to get the survival prediction for time_t, you need something like

surv = model.predict_surv_df(x_test)
preds = surv[surv.index <= time_t].iloc[-1]

Does this make sense, or do you want a more detailed explanation?
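
For several time points at once (for example 6, 12 and 24 months), the same step-function lookup can be vectorised. This is only a sketch: the data frame below is a made-up stand-in for the output of model.predict_surv_df(x_test), surv_at_times is a hypothetical helper and not part of pycox, and the index is assumed to be sorted in ascending order.

```python
import numpy as np
import pandas as pd

# Stand-in for model.predict_surv_df(x_test): rows are the event times from
# training, columns are individuals. The values are invented for illustration.
surv = pd.DataFrame(
    {0: [1.0, 0.9, 0.7, 0.5], 1: [1.0, 0.8, 0.6, 0.3]},
    index=pd.Index([0.0, 5.0, 10.0, 20.0], name="duration"),
)


def surv_at_times(surv, times):
    """Hypothetical helper: step-function lookup at each requested time."""
    times = np.asarray(times, dtype=float)
    # Position of the last index value <= t, for every requested t.
    pos = np.searchsorted(surv.index.values, times, side="right") - 1
    if (pos < 0).any():
        raise ValueError("a requested time precedes the first time in the index")
    out = surv.iloc[pos].copy()
    out.index = pd.Index(times, name="time")
    return out


preds = surv_at_times(surv, [6, 12, 24])
```

Each row of preds holds the survival estimates carried forward from the last event time at or before the requested time, which is what surv[surv.index <= time_t].iloc[-1] returns for a single time_t.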

gunesevitan commented 3 years ago

Yeah, it makes perfect sense, but I needed the exact probabilities at the given timesteps because I was trying to evaluate my results with the AUC metric. I solved it by adding the new timesteps to the index of surv_df and then using linear interpolation to fill the missing rows. The results were acceptable.
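
A rough sketch of that interpolation approach, not necessarily the code actually used: the data frame is again a made-up stand-in for surv_df, interpolate_surv is a hypothetical helper, and the requested times are assumed to lie within the range of the index.

```python
import pandas as pd

# Stand-in for the data frame returned by model.predict_surv_df(x_test).
# The values are invented for illustration.
surv_df = pd.DataFrame(
    {0: [1.0, 0.9, 0.7, 0.5, 0.4], 1: [1.0, 0.8, 0.6, 0.3, 0.2]},
    index=pd.Index([0.0, 5.0, 10.0, 20.0, 30.0], name="duration"),
)


def interpolate_surv(surv_df, times):
    """Hypothetical helper: add `times` to the index and interpolate linearly."""
    times = [float(t) for t in times]
    # Union of the existing event times and the requested times, sorted.
    new_index = surv_df.index.union(pd.Index(times, dtype=float))
    # The new rows start out as NaN; method="index" interpolates linearly
    # using the numeric index values (the times) as the x-axis.
    filled = surv_df.reindex(new_index).interpolate(method="index")
    return filled.loc[times]


preds = interpolate_surv(surv_df, [6, 12, 24])
```

Unlike the step-function lookup, this gives values that change between event times, so it is an approximation layered on top of the model's output rather than something the Cox model itself provides.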

havakv commented 3 years ago

That should probably work fine. The downside of the Cox model is that it only provides estimates at the event times in the training set (so the hazard is zero between these event times). While the Cox partial likelihood was really genius for statistical purposes other than prediction, it is not obvious how best to do prediction between the event times of the training set.
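
One way to see this, assuming the usual Breslow estimate of the baseline hazard (the notation is introduced here for illustration and is not from the thread: g is the fitted network or linear predictor, T_i and D_i are the observed time and event indicator of training individual i, and R_i is the set of individuals still at risk at T_i):

```latex
\hat{H}_0(t) = \sum_{i:\, T_i \le t,\ D_i = 1} \frac{1}{\sum_{j \in R_i} \exp\big(g(x_j)\big)},
\qquad
\hat{S}(t \mid x) = \exp\!\Big(-\hat{H}_0(t)\, \exp\big(g(x)\big)\Big)
```

The cumulative baseline hazard only increases at the observed event times T_i, so the estimated survival function stays constant between them.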

gunesevitan commented 3 years ago

I see, that makes more sense now. Since the hazard is zero between event times, it is not possible to calculate survival probabilities between them directly. There are several workarounds, such as linear interpolation and regression, but they are probably outside this package's scope.