how to get time-to-event prediction

cadrev commented 4 years ago

Hello! I'm a bit confused by the examples. How do I get the time-to-event prediction for a test feature?

havakv commented 4 years ago

If you want survival predictions, I'm not sure I'm able to explain it in much more detail than the examples.

However, assuming (correct me if I'm wrong) that what you are looking for is a single number for when in the future an event occurs, such as "we predict that patient x will die in 51 days", none of the methods produce such numbers directly. This is because the survival function is a more complete prediction than a point estimate. Using binary classification as an analogy, it is better to get probability estimates for each class than just a class prediction of 0 or 1. Having estimated the class probabilities, you can produce class predictions by e.g. predicting class 1 if the probability estimate is larger than 0.5 (or any other arbitrary threshold). In the same manner, you can produce point estimates for time-to-event prediction by, for instance, considering the time when the survival function drops below 0.5 (or any other arbitrary threshold).
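A rough sketch of that thresholding idea, assuming the survival estimates are in a pandas DataFrame with time points as the index and one column per individual (as far as I can tell, this is the shape `model.predict_surv_df(x_test)` returns in the example notebooks). The helper name `time_below_threshold` and the toy numbers are made up for illustration, and curves that never drop below the threshold get NaN:

```python
import numpy as np
import pandas as pd


def time_below_threshold(surv: pd.DataFrame, threshold: float = 0.5) -> pd.Series:
    """First grid time at which each survival curve is strictly below `threshold`.

    `surv` has time points as its index and one column per individual.
    Curves that never drop below the threshold get NaN.
    """
    below = surv.to_numpy() < threshold           # shape: (n_times, n_individuals)
    first_pos = below.argmax(axis=0)              # position of first True per column
    times = surv.index.to_numpy(dtype=float)[first_pos]
    times[~below.any(axis=0)] = np.nan            # argmax is 0 when nothing is True
    return pd.Series(times, index=surv.columns)


# Toy (made-up) survival curves for two individuals on a grid of five time points.
surv = pd.DataFrame(
    {"a": [1.0, 0.8, 0.6, 0.4, 0.2], "b": [1.0, 0.95, 0.9, 0.85, 0.8]},
    index=[0.0, 10.0, 20.0, 30.0, 40.0],
)
print(time_below_threshold(surv))
```

Here individual `a` gets time 30 and individual `b` gets NaN, because its curve stays above 0.5 over the whole grid.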

Is this what you're looking for?

cadrev commented 4 years ago

I see! Does it make sense then to integrate the survival function at time T for a given threshold to get the risk of an event happening?

havakv commented 4 years ago

The expected survival time is given by $E[T] = \int_0^\infty S(t)\,dt$, where the survival function is defined as the probability $S(t) = P(T > t)$, so in that case it makes sense to integrate the survival function. However, this assumes that $\lim_{t \rightarrow \infty} S(t) = 0$. Often, our survival estimates only cover a limited time scale, making the integral over the observed time period a poor approximation of the full (possible) survival time. In that case, your expected survival time will be too small.
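To see why the truncated integral is too small, split the integral at the last time $z$ covered by the estimates (the symbol $z$ is just notation for the end of the observed time scale):

$$E[T] = \int_0^z S(t)\,dt + \int_z^\infty S(t)\,dt \;\ge\; \int_0^z S(t)\,dt,$$

since $S(t) \ge 0$. The first term is what we can compute from the estimate, and the dropped tail $\int_z^\infty S(t)\,dt$ is only negligible when $S(z)$ is close to 0.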

Consider an example where we have observations from time 0 to time z, and estimated survival S(z) = 0.9. We don't know what happens after time z, but the expected survival time is probably higher than z. However, by integrating the survival function up until time z, your expected survival time will be smaller than z (because S(t) is at most 1). On the other hand, if S(z) = 0.01, you will probably get a decent estimate of the expected survival time. So you can use the integral, but it probably needs some caution.

If you only integrate up until the observed survival time T, I'm not sure what that would represent.

An alternative to the expected survival time is to consider the median survival time m defined by S(m) = 0.5. For survival functions that never cross 0.5, you simply state that the median survival time is larger than z.
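A small numerical illustration of this, as a sketch only: it assumes exponential survival curves S(t) = exp(-rate * t), chosen purely because their true mean 1/rate is known, and uses a hand-written trapezoidal rule. The function name `restricted_mean` is made up for the example.

```python
import numpy as np


def restricted_mean(times: np.ndarray, surv: np.ndarray) -> float:
    """Trapezoidal integral of S(t) over the observed grid `times`."""
    widths = np.diff(times)
    heights = 0.5 * (surv[1:] + surv[:-1])
    return float(np.sum(widths * heights))


z = 100.0
times = np.linspace(0.0, z, 1001)

# Hypothetical exponential curves with rates chosen so that
# S(z) = 0.9 in the first case and S(z) = 0.01 in the second.
for s_z in (0.9, 0.01):
    rate = -np.log(s_z) / z
    surv = np.exp(-rate * times)
    truncated = restricted_mean(times, surv)
    true_mean = 1.0 / rate  # E[T] of an exponential distribution
    print(f"S(z)={s_z}: integral up to z = {truncated:.1f}, true E[T] = {true_mean:.1f}")
```

With S(z) = 0.9 the truncated integral comes out at roughly 95 while the true mean is roughly 949, so it is off by an order of magnitude. With S(z) = 0.01 the two are about 21.5 and 21.7.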
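For a single estimated curve, the median with the "larger than z" fallback could look something like the following. This is a sketch with a made-up helper name (`median_survival`), and it takes the first grid time with S(m) <= 0.5 rather than interpolating between grid points:

```python
import numpy as np


def median_survival(times, surv):
    """Return (m, is_lower_bound) for one survival curve.

    m is the first grid time with S(m) <= 0.5. If the curve never reaches
    0.5, return the last grid time z with is_lower_bound=True, meaning
    that the median is larger than z.
    """
    times = np.asarray(times, dtype=float)
    surv = np.asarray(surv, dtype=float)
    reached = np.flatnonzero(surv <= 0.5)
    if reached.size == 0:
        return float(times[-1]), True
    return float(times[reached[0]]), False


times = [0, 10, 20, 30]
print(median_survival(times, [1.0, 0.7, 0.5, 0.3]))    # (20.0, False)
print(median_survival(times, [1.0, 0.9, 0.85, 0.8]))   # (30.0, True)
```

Returning the flag alongside the time keeps "median is 30" and "median is larger than 30" from being confused downstream.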

cadrev commented 4 years ago

Let me just say beforehand that I really appreciate the writeup! It definitely cleared some of my misconceptions. Thank you, this was what I needed.

havakv commented 4 years ago

Happy to help :)

ZhiliangWu commented 3 years ago

Hi @havakv, thanks for the great package. I still have a question after reading the whole discussion.

Do you think it's okay to use the time index where the value of the survival function first drops below 0.5 (is this the median survival time you mentioned?) as the point estimate of time-to-event? Or is it common to choose the threshold using a validation set? Thanks in advance for your reply!

havakv commented 3 years ago

Hi @ZhiliangWu, I think this is a good question, but I don't have a good answer for it, so I'm probably not the right person to ask. I don't have a lot of experience predicting a point estimate for the survival time. But I guess the median would be a reasonable choice for the survival point estimate.

vzhilov commented 2 years ago

Thanks for your work!

So if we take your first example, 01_introduction.ipynb, what would be the code to produce the final prediction figure (a number of days for a patient)? I have run your example on my dataset and got the plots drawn, but I can't figure out how to get the actual number. After I have trained the model, I want to feed it a new row for prediction and get a time period in response.

havakv / pycox

how to get time-to-event prediction #41