havakv / pycox

Survival analysis with PyTorch
BSD 2-Clause "Simplified" License

Survival time predictions (with CoxPH) using medical images #64

Open aslitoj opened 3 years ago

aslitoj commented 3 years ago

Hello,

I'd like to build a model that takes images and predicts overall survival time as a continuous outcome. For that reason, I followed the model shown in the Jupyter notebook 04_mnist_dataloaders_cnn.ipynb, using CoxPH instead of LogisticHazard. However, I got two different errors. I am using a batch dataloader, by the way.

When I tried CoxPH and fit the model with

```python
callbacks = [tt.cb.EarlyStopping()]
epochs = 100
verbose = True
log = model.fit_dataloader(dl_train, epochs, callbacks, verbose, val_dataloader=dl_val)
```

and then ran the following (`net` is the same as in the sample notebook mentioned above):

```python
model = CoxPH(net, tt.optim.Adam(0.01))
surv = model.predict_surv_df(dl_test_x)
```

I got this error:

```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
in ()
----> 1 surv = model.predict_surv_df(dl_test_x)

/usr/local/anaconda/lib/python3.6/site-packages/pycox/models/cox.py in predict_surv_df(self, input, max_duration, batch_size, verbose, baseline_hazards_, eval_, num_workers)
    153         """
    154         return np.exp(-self.predict_cumulative_hazards(input, max_duration, batch_size, verbose, baseline_hazards_,
--> 155                                                        eval_, num_workers))
    156 
    157     def predict_surv(self, input, max_duration=None, batch_size=8224, numpy=None, verbose=False,

/usr/local/anaconda/lib/python3.6/site-packages/pycox/models/cox.py in predict_cumulative_hazards(self, input, max_duration, batch_size, verbose, baseline_hazards_, eval_, num_workers)
    123         if baseline_hazards_ is None:
    124             if not hasattr(self, 'baseline_hazards_'):
--> 125                 raise ValueError('Need to compute baseline_hazards_. E.g run `model.compute_baseline_hazards()`')
    126             baseline_hazards_ = self.baseline_hazards_
    127         assert baseline_hazards_.index.is_monotonic_increasing,\

ValueError: Need to compute baseline_hazards_. E.g run `model.compute_baseline_hazards()`
```

Hence, once I tried to run `model.compute_baseline_hazards()`, it gave me this error:

```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
in ()
----> 1 _ = model.compute_baseline_hazards()

/usr/local/anaconda/lib/python3.6/site-packages/pycox/models/cox.py in compute_baseline_hazards(self, input, target, max_duration, sample, batch_size, set_hazards, eval_, num_workers)
     82         if (input is None) and (target is None):
     83             if not hasattr(self, 'training_data'):
---> 84                 raise ValueError("Need to give a 'input' and 'target' to this function.")
     85             input, target = self.training_data
     86         df = self.target_to_df(target)#.sort_values(self.duration_col)

ValueError: Need to give a 'input' and 'target' to this function.
```

Since the training data has the shape `(torch.Size([16, 1, 128, 128]), (torch.Size([16]), torch.Size([16])))`, I didn't understand how to give the input and target to `model.compute_baseline_hazards()`. I basically fed the image as input and a tuple of time and event values as target, but then it threw another error saying there was too much to unpack. Could you please help me understand how I can solve this issue? I really appreciate any help.

Regards,
Asli Y.
havakv commented 3 years ago

Hi! The Cox models (CoxPH, CoxCC and CoxTime) don't really work like the other models implemented here, as some additional details are needed for the non-parametric baseline hazards (which are estimated with `model.compute_baseline_hazards`).

I think just unpacking the dataset should do the trick for you here, so for a dataset `train_data` of size `(torch.Size([16, 1, 128, 128]), (torch.Size([16]), torch.Size([16])))`, you should just need to run `model.compute_baseline_hazards(*train_data)`.
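As a minimal sketch (assuming `train_data` is an `(images, (durations, events))` tuple with the shapes above; the unpacked variable names are just illustrative), the star-unpacking simply passes the images as the input and the `(durations, events)` tuple as the target:

```python
# train_data is an (input, target) pair: an image tensor plus (durations, events).
images, (durations, events) = train_data

# Equivalent to model.compute_baseline_hazards(images, (durations, events)).
_ = model.compute_baseline_hazards(*train_data)
```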

If you want to build the baseline hazards from the full training set, you can concatenate all the batches in your dataloader `dl_train` with the following code:

```python
train_data = tt.tuplefy([data for data in dl_train]).cat()
```
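Putting the pieces together, here is a hedged end-to-end sketch of that suggestion (it reuses `net`, `dl_train`, `dl_val` and `dl_test_x` from the thread above, and assumes the dataloader batches are `(images, (durations, events))` tuples as in the notebook):

```python
import torchtuples as tt
from pycox.models import CoxPH

# Cox proportional hazards model on top of the CNN from the notebook.
model = CoxPH(net, tt.optim.Adam(0.01))

# Train from the dataloaders, as in the original post.
callbacks = [tt.cb.EarlyStopping()]
epochs = 100
verbose = True
log = model.fit_dataloader(dl_train, epochs, callbacks, verbose, val_dataloader=dl_val)

# Concatenate every batch in dl_train into one (images, (durations, events)) tuple
# and use it to estimate the non-parametric baseline hazards.
train_data = tt.tuplefy([data for data in dl_train]).cat()
_ = model.compute_baseline_hazards(*train_data)

# With the baseline hazards in place, survival predictions work as expected.
surv = model.predict_surv_df(dl_test_x)
```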
havakv commented 3 years ago

I guess there should be a way to estimate the baseline hazards by using the dataloaders directly, as the current way is rather unintuitive. Thanks for raising the issue!

aslitoj commented 3 years ago

Hello @havakv, thank you for your reply. Your suggestion above solved the issue. I am glad that I was able to contribute. Lastly, thanks a lot for this great work!

havakv commented 3 years ago

Happy to help, and thank you for the kind words! I'll just let this issue stay open for a while to remind me that this is something that really should be improved in the future.

up-tree commented 2 years ago

Hi, I ran into the same problem when predicting with the CoxPH model and a GNN net. The input of my data is a graph, but the target is a tensor. When I ran `model.compute_baseline_hazards()`, I got the following error:

`ValueError: All objects in 'data' doest have the same type.`

Could you please help me to fix this problem?