havakv / pycox

Survival analysis with PyTorch
BSD 2-Clause "Simplified" License

Assertion Error for Eval_Surv #39

Closed kylejyx closed 4 years ago

kylejyx commented 4 years ago

Hi

Thanks for this awesome package. The version I used is 0.2.0. Here is the code:

_ = model.compute_baseline_hazards()
surv = model.predict_surv_df((x1, x2))
print(np.isnan(np.sum(surv.to_numpy()))) #Output False
print(pd.Series(surv.index.values).is_monotonic) #Output True
ev = EvalSurv(surv, durations, events, censor_surv='km')
return ev.concordance_td()

However, the following error showed up:

  File "test.py", line 106, in CV
    Cindex = Coxnnet_evaluate(model, x1_test, x2_test, y1_test, y2_test)
  File "test.py", line 85, in Coxnnet_evaluate
    ev = EvalSurv(surv, durations, events, censor_surv='km')
  File "/home/group8/anaconda3/envs/BHI/lib/python3.8/site-packages/pycox/evaluation/eval_surv.py", line 33, in __init__
    self.censor_surv = censor_surv
  File "/home/group8/anaconda3/envs/BHI/lib/python3.8/site-packages/pycox/evaluation/eval_surv.py", line 51, in censor_surv
    self.add_km_censor()
  File "/home/group8/anaconda3/envs/BHI/lib/python3.8/site-packages/pycox/evaluation/eval_surv.py", line 107, in add_km_censor
    return self.add_censor_est(surv, steps)
  File "/home/group8/anaconda3/envs/BHI/lib/python3.8/site-packages/pycox/evaluation/eval_surv.py", line 95, in add_censor_est
    censor_surv = self._constructor(censor_surv, self.durations, 1-self.events, None,
  File "/home/group8/anaconda3/envs/BHI/lib/python3.8/site-packages/pycox/evaluation/eval_surv.py", line 36, in __init__
    assert pd.Series(self.index_surv).is_monotonic
AssertionError

Would you mind suggesting some possible reasons? Thank you.

kylejyx commented 4 years ago

The index of the 'surv' dataframe is as follows: [0.000e+00 1.000e+00 5.000e+00 8.000e+00 1.000e+01 2.100e+01 2.400e+01 ...] However, in the cases that run normally, the index of surv seems to begin with [-7 0 1 5 8].

When I tried to print out the self.index_surv in eval_surv, it shows: [ 0.000e+00 -7.000e+00 0.000e+00 5.000e+00 4.900e+01 7.000e+01 2.000e+02 2.220e+02 2.580e+02 3.260e+02 3.390e+02 3.660e+02 3.750e+02 3.760e+02 3.800e+02 3.850e+02 3.930e+02 3.940e+02 4.040e+02 4.080e+02 4.100e+02 4.280e+02 4.310e+02 4.330e+02 4.390e+02 4.460e+02 4.480e+02 4.610e+02 4.630e+02 4.670e+02 4.710e+02 4.770e+02 4.880e+02 5.040e+02 5.230e+02 5.240e+02 5.380e+02 5.620e+02 5.680e+02 5.770e+02 5.840e+02 5.850e+02 5.910e+02 5.980e+02 6.070e+02 6.120e+02 6.140e+02 6.160e+02 6.270e+02 6.430e+02 6.460e+02 6.590e+02 6.790e+02 6.940e+02 7.140e+02 7.150e+02 7.270e+02 7.520e+02 7.950e+02 8.200e+02 9.070e+02 9.120e+02 9.650e+02 9.720e+02 9.870e+02 1.001e+03 1.010e+03 1.013e+03 1.032e+03 1.101e+03 1.133e+03 1.185e+03 1.189e+03 1.220e+03 1.234e+03 1.246e+03 1.275e+03 1.325e+03 1.326e+03 1.363e+03 1.417e+03 1.471e+03 1.523e+03 1.545e+03 1.550e+03 1.563e+03 1.596e+03 1.604e+03 1.611e+03 1.620e+03 1.688e+03 1.759e+03 1.820e+03 1.847e+03 1.871e+03 1.882e+03 1.935e+03 2.108e+03 2.109e+03 2.193e+03 2.371e+03 2.486e+03 2.489e+03 2.534e+03 2.629e+03 2.645e+03 2.712e+03 2.965e+03 2.976e+03 2.989e+03 3.203e+03 3.204e+03 3.660e+03 3.669e+03 3.736e+03 3.959e+03 4.047e+03 5.176e+03]
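The printed index looks like what you would get if a start time of 0 were placed in front of sorted durations that include a negative value. Below is a toy reproduction using only NumPy and pandas. It is not pycox's actual implementation: `prepend_start` is a hypothetical helper, and the duration values are just the first few from the output above.

```python
import numpy as np
import pandas as pd


def prepend_start(durations, start=0.0):
    """Hypothetical helper: sorted unique durations with a start time in front."""
    times = np.unique(np.asarray(durations, dtype=float))  # sorted and unique
    return np.concatenate(([start], times))


# With a negative duration, the leading 0 is followed by -7.
index_with_negative = prepend_start([-7, 0, 5, 49, 70])
# With strictly positive durations, the leading 0 keeps the index increasing.
index_non_negative = prepend_start([1, 5, 49, 70])

# Newer pandas versions use is_monotonic_increasing in place of is_monotonic.
neg_ok = pd.Series(index_with_negative).is_monotonic_increasing
pos_ok = pd.Series(index_non_negative).is_monotonic_increasing

print(index_with_negative, neg_ok)
print(index_non_negative, pos_ok)
```

If this guess is right, the assertion is triggered by the negative duration rather than by anything in the predicted 'surv' dataframe, whose own index is monotonic.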

havakv commented 4 years ago

Thank you for the reported bug and the kind words!

I don't really know what's wrong, but I have some ideas.

Are any of your durations negative? We've generally assumed that duration (time) starts at zero, so negative durations might cause some unexpected results. If this is the case, replacing your EvalSurv call with the following should work:

from pycox.utils import kaplan_meier
censor_surv = kaplan_meier(durations, 1-events, durations.min())
ev = EvalSurv(surv, durations, events, censor_surv=censor_surv)
return ev.concordance_td()

It might, however, be better to use durations that are non-negative.

Finally, if you only want the concordance from ev.concordance_td(), you don't need censoring estimates at all. So you can just leave out the censor_surv argument (and save some unnecessary computation):

ev = EvalSurv(surv, durations, events)
return ev.concordance_td()

Does this help?

kylejyx commented 4 years ago

Yes, thanks a lot for the valuable suggestions! The data indeed contains negative observation time. It works fine after removing those observations.
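For anyone hitting the same assertion, here is a minimal sketch of the filtering step, assuming the durations, events and covariates are NumPy arrays aligned row by row. `drop_negative_durations` is a hypothetical helper, not part of pycox, and the data is made up for illustration.

```python
import numpy as np


def drop_negative_durations(durations, events, *arrays):
    """Hypothetical helper: keep only the rows whose duration is >= 0.

    Returns the filtered arrays (durations, events, then any extras)
    and the number of rows that were dropped.
    """
    durations = np.asarray(durations)
    keep = durations >= 0  # boolean mask over observations
    kept = [durations[keep], np.asarray(events)[keep]]
    kept.extend(np.asarray(a)[keep] for a in arrays)
    return tuple(kept), int((~keep).sum())


# Made-up example data: one observation has a negative duration.
durations = np.array([-7.0, 0.0, 5.0, 49.0])
events = np.array([1, 0, 1, 1])
x1 = np.arange(8, dtype=float).reshape(4, 2)

(durations_ok, events_ok, x1_ok), n_dropped = drop_negative_durations(
    durations, events, x1
)
print(n_dropped, durations_ok)
```

The same mask has to be applied to every input before predicting and evaluating, so that the columns of 'surv' still line up with the filtered durations and events.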