havakv / pycox

Survival analysis with PyTorch
BSD 2-Clause "Simplified" License
801 stars 188 forks

DL-based survival analysis for high dimensional data #14

Closed. scikitting closed this issue 4 years ago

scikitting commented 4 years ago

Hi havakv, I am a radiologist interested in your work on DL-based survival analysis. Recently, I have been using pycox to process medical survival data. I have some questions for you:

  1. When I used N-MTLR on my data, I found that it does not work well. My data has 2800+ input variables but only 243 samples. So I first used an AE to reduce the variables to 70 and then used N-MTLR for training; in this case, it performs well. So my question is: when the dimension of the data is significantly higher than the number of samples, does N-MTLR still work well? Do you have any experience with this? Or can the N-MTLR network be replaced by an AE network that does both dimension reduction and survival analysis?
  2. Since N-MTLR is a non-PH model, how can I find the individual weight of each input variable in the fitted function? Can N-MTLR output something similar to a hazard ratio, to show which variables are important predictors of survival?
  3. I am a doctor, so I hope you can help me: how do I save the predicted probabilities (the point-by-point probabilities for each sample) to a CSV file?

I hope to receive your response.

If possible, you can respond to: njmu_zyd@163.com

havakv commented 4 years ago

As this was not an issue with the code, but instead a specific question of modeling, it was answered through email. We had a nice conversation. A summary of our conversation:

  1. One cannot really expect neural networks to perform well with that many input variables and that few samples. So some reduction of the feature space is necessary, and an AE might be a reasonable approach. For the record, the input is not an image. One can create an architecture that combines the AE with the N-MTLR by letting the network have two outputs and using a loss function that is a weighted sum of the AE loss and the survival loss. I'm planning to give an example of this in the future, as it nicely illustrates how one can extend the implemented models.

  2. This is a reference to the interpretation of the estimated parameters of a Cox proportional hazards regression with a linear risk function. There is not an equivalent interpretation of the parameters of a neural network.

  3. The survival estimates are typically a pandas DataFrame, at least if they are obtained with surv = model.predict_surv_df(x), meaning they can be saved to a CSV file with, for example, surv.to_csv('myfile.csv').
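As a rough illustration of point 3, the sketch below writes a survival DataFrame to a CSV file and reads it back. The DataFrame here is a hand-made stand-in with made-up numbers, not real model output; it assumes the layout that predict_surv_df is understood to return (one row per time point, one column per individual). The column names and the file name are hypothetical.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Stand-in for `surv = model.predict_surv_df(x)` (assumed layout:
# index = time points, columns = individuals, values = survival probabilities).
# The numbers are invented for illustration only.
time_grid = np.array([0.0, 6.0, 12.0, 24.0])
surv = pd.DataFrame(
    [[1.00, 1.00, 1.00],
     [0.95, 0.80, 0.90],
     [0.85, 0.55, 0.70],
     [0.60, 0.30, 0.50]],
    index=pd.Index(time_grid, name="duration"),
    columns=["patient_0", "patient_1", "patient_2"],
)

out_dir = tempfile.mkdtemp()

# One row per time point, one column per patient.
path = os.path.join(out_dir, "surv.csv")
surv.to_csv(path)

# Transposed variant: one row per patient, one column per time point.
path_by_patient = os.path.join(out_dir, "surv_by_patient.csv")
surv.T.to_csv(path_by_patient)

# Read the first file back to check that the round trip preserves the values.
loaded = pd.read_csv(path, index_col="duration")
```

The transposed file is often the more convenient one to open in a spreadsheet, since each patient then occupies a single row.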

scikitting commented 4 years ago

Thank you for the response.

--

Yu-Dong Zhang, M.D.

Department of Radiology, the First Affiliated Hospital with Nanjing Medical University

Nanjing, China, 210009 E-mail: njmu_zyd@163.com


havakv commented 4 years ago

For completeness, there is now an example of how one can combine an autoencoder with a survival model. The example uses LogisticHazard rather than MTLR, but it is straightforward to use MTLR instead. The example can be found in 03_network_architectures.ipynb
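For readers who want the general shape of the idea without opening the notebook, here is a minimal sketch in plain PyTorch of a network with two outputs trained on a weighted sum of a reconstruction loss and a survival loss. It is not the notebook's code: the class name AESurvNet, the layer sizes, the weight alpha, and the random toy data are all assumptions, and the discrete-time logistic-hazard loss is written out by hand rather than taken from pycox.

```python
import torch
from torch import nn
import torch.nn.functional as F


class AESurvNet(nn.Module):
    """Hypothetical net: a shared encoder feeds a decoder and a survival head."""

    def __init__(self, in_features, encoded_features, num_intervals):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_features, 256), nn.ReLU(),
            nn.Linear(256, encoded_features),
        )
        self.decoder = nn.Sequential(
            nn.Linear(encoded_features, 256), nn.ReLU(),
            nn.Linear(256, in_features),
        )
        self.surv_head = nn.Linear(encoded_features, num_intervals)

    def forward(self, x):
        z = self.encoder(x)
        # Two outputs: hazard logits per time interval, and the reconstruction.
        return self.surv_head(z), self.decoder(z)


def logistic_hazard_nll(phi, idx_durations, events):
    """Negative log-likelihood of a discrete-time hazard model.

    phi: (n, m) logits; idx_durations: (n,) interval index; events: (n,) 0/1.
    """
    idx = idx_durations.view(-1, 1)
    target = torch.zeros_like(phi)
    target.scatter_(1, idx, events.view(-1, 1))  # 1 only where an event occurs
    bce = F.binary_cross_entropy_with_logits(phi, target, reduction="none")
    # Sum the per-interval terms up to and including the observed interval.
    return bce.cumsum(1).gather(1, idx).mean()


def combined_loss(phi, decoded, x, idx_durations, events, alpha=0.6):
    """Weighted sum of survival loss and autoencoder loss (alpha is assumed)."""
    loss_surv = logistic_hazard_nll(phi, idx_durations, events)
    loss_ae = F.mse_loss(decoded, x)
    return alpha * loss_surv + (1 - alpha) * loss_ae


# Random toy data with the dimensions mentioned in the thread (2800 -> 70).
torch.manual_seed(0)
x = torch.randn(32, 2800)
idx_durations = torch.randint(0, 10, (32,))
events = torch.randint(0, 2, (32,)).float()

net = AESurvNet(in_features=2800, encoded_features=70, num_intervals=10)
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
losses = []
for _ in range(5):
    optimizer.zero_grad()
    phi, decoded = net(x)
    loss = combined_loss(phi, decoded, x, idx_durations, events)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```

The weight alpha trades off how much the encoder is shaped by the survival target versus by reconstruction; with as few samples as in this thread, it would probably need to be tuned on held-out data.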

scikitting commented 4 years ago

That is great, I will try it later.
