havakv / pycox

Survival analysis with PyTorch
BSD 2-Clause "Simplified" License

Question about discretization time grid in 01_introduction.ipynb #83

Closed hanxiaozhen2017 closed 3 years ago

hanxiaozhen2017 commented 3 years ago

Hello everyone, I am new to pycox and am currently working through 01_introduction.ipynb. I don't understand why the labtrans.cuts value for the 1st observation is 78.9 while the value for the 2nd observation is 118.4 (output below). The duration for observation 0 is 99.3 and the duration for observation 1 is 95.7, so I would expect both to fall in the same discretization interval (from 78.9 to 118.4). Thanks for any help with this.

IN: labtrans.cuts[y_train[0]]
OUT: array([ 78.933334, 118.4, 236.8, ..., 39.466667, 197.33334, 118.4], dtype=float32)

IN: labtrans.cuts
OUT: array([0., 39.466667, 78.933334, 118.4, 157.86667, 197.33334, 236.8, 276.26666, 315.73334, 355.2], dtype=float32)

havakv commented 3 years ago

Hi. They end up in different buckets because one is censored and the other is not (event is 0 and 1). The default behavior is to move censored observations back to the end of the previous interval (we only know that they were still "alive" at this time), while events are moved to the end of their own interval. You can read about this in Section 4.1, Discretization of Durations, in https://arxiv.org/pdf/1910.06724.pdf . However, this is not really a strict assumption, so you could probably just use the same buckets whether they are censored or not.
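To make the rounding rule concrete, here is a minimal NumPy-only sketch. It is not pycox's actual implementation: the `discretize` helper and its `round_censored_down` flag are hypothetical names made up for this illustration, and the grid is rebuilt with `np.linspace` on the assumption that the cuts from the question are equidistant between 0 and 355.2. Clipping to the last grid point is also a simplification.

```python
import numpy as np

# Assumed equidistant grid matching the labtrans.cuts values in the question.
cuts = np.linspace(0.0, 355.2, 10)

def discretize(durations, events, cuts, round_censored_down=True):
    """Hypothetical helper: map durations to indices on the grid `cuts`.

    Events are rounded up to the end of their interval. Censored
    observations are rounded down to the previous grid point, unless
    round_censored_down is False, in which case they are rounded up too.
    """
    durations = np.asarray(durations, dtype=float)
    events = np.asarray(events).astype(bool)
    # Index of the first grid point >= duration (round up).
    idx_up = np.searchsorted(cuts, durations, side="left")
    # Index of the last grid point <= duration (round down).
    idx_down = np.searchsorted(cuts, durations, side="right") - 1
    if round_censored_down:
        idx = np.where(events, idx_up, idx_down)
    else:
        idx = idx_up
    # Simplification: keep indices inside the grid.
    return np.clip(idx, 0, len(cuts) - 1)

# Observation 0: duration 99.3, censored (event 0).
# Observation 1: duration 95.7, event (event 1).
idx = discretize([99.3, 95.7], [0, 1], cuts)
print(idx, cuts[idx])

# Same bucket for both, regardless of censoring.
idx_same = discretize([99.3, 95.7], [0, 1], cuts, round_censored_down=False)
print(idx_same, cuts[idx_same])
```

With the default rule the two observations land on grid points 78.93 and 118.4, which matches the output in the question. With `round_censored_down=False` both land on 118.4.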

Hope this answers your question

hanxiaozhen2017 commented 3 years ago

@havakv Thank you so much !!