jaredleekatzman / DeepSurv

DeepSurv is a deep learning approach to survival analysis.
MIT License

Need advice and simple questions from a beginner #27

Closed: hanxiaozhen2017 closed this issue 6 years ago

hanxiaozhen2017 commented 6 years ago

Hi all @jaredleekatzman @vanAmsterdam @dareneiri,

I am a beginner with Python and really hope to apply this great package. The questions below may seem basic, but I am trying to find my way...

  1. What is valid_data for, and why do we need it? Normally I train a model on a training set and evaluate it on a test set.

  2. If I don't use any Docker file, can I still run this package?

  3. The DeepSurv front page says it takes only two lines to train a network, but I see many people defining their own parameters. Is this necessary? In other words, can we not get the best DeepSurv model without searching for the best parameters?

  4. Is there example code somewhere that shows the process used for the experiments in the paper? Or would anyone be willing to share example code showing how to apply this package?

Any reply would be much appreciated!

Xiaozhen

dareneiri commented 6 years ago

@hanxiaozhen2017 I'll provide some answers for you, but not in the depth you're seeking, since I can't commit the time at the moment.

To address your question about features in https://github.com/jaredleekatzman/DeepSurv/issues/28: your features go in the dictionary under the key 'x'. The value is a 2-D array of the features, with one row per patient and one column per feature.

For example, if you have three features ['hr','bp','temp'] and 5 patients, then your 'x' value would look like:

[[99,120,98],
[99,120,98],
[99,120,98],
[99,120,98],
[99,120,98]]

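A minimal sketch of building that dictionary with NumPy. It assumes, following our reading of the DeepSurv README, that alongside 'x' the model also reads event times under 't' and event indicators under 'e'; the dtypes (float32 for 'x' and 't', int32 for 'e') are also an assumption worth checking against the repository. The patient values are made up for illustration, and `make_deepsurv_dict` is a hypothetical helper, not part of DeepSurv.

```python
import numpy as np

# Illustrative (made-up) data for 5 patients and 3 features: hr, bp, temp.
rows = [
    [99, 120, 98.6],
    [72, 115, 98.1],
    [88, 140, 99.3],
    [65, 110, 97.9],
    [104, 130, 100.2],
]
times = [5.0, 12.0, 3.5, 20.0, 8.0]  # follow-up time per patient (hypothetical units)
events = [1, 0, 1, 0, 1]             # 1 = event observed, 0 = censored


def make_deepsurv_dict(x, t, e):
    """Hypothetical helper: pack covariates, times and event flags into one dict.

    Assumption: the keys 'x', 't', 'e' and the dtypes below follow our reading
    of the DeepSurv README; verify them before relying on this.
    """
    x = np.asarray(x, dtype=np.float32)
    t = np.asarray(t, dtype=np.float32)
    e = np.asarray(e, dtype=np.int32)
    # 'x' must be 2-D (patients x features) and line up with 't' and 'e'.
    if x.ndim != 2 or not (x.shape[0] == t.shape[0] == e.shape[0]):
        raise ValueError("x must be (n_patients, n_features) and match t and e")
    return {"x": x, "t": t, "e": e}


train_data = make_deepsurv_dict(rows, times, events)
print(train_data["x"].shape)  # (5, 3)
```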
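  1. valid_data is a separate dataset used to check how well the model is learning while it trains, i.e., how well it generalizes to data it is not trained on (for example, to tune hyperparameters or decide when to stop training). The test set is kept aside for the final evaluation. This is a standard approach in machine learning, and a topic you can read about online (search for splitting datasets for neural networks, for example).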

  2. You do not need Docker to use DeepSurv.

  3. Technically, yes: it only takes a few lines to train a network, since this package takes care of everything else. But you should really set your hyperparameters based on your dataset, so they will be different for everyone, and the default settings will likely not be ideal for your data. For now, use the default settings until you have a simple model working.

  4. I don't have any code I can share. But if you are just starting out with Python (and, it seems, data science and machine learning), I kindly suggest following Keras + Python tutorials, since they're easier to follow. I don't believe they cover survival models like DeepSurv, but they will help you get comfortable with Python. There's quite a bit of data preparation that most likely needs to be done before you even start using DeepSurv (removing NaNs, imputing data, normalizing the data, etc.). These steps are common practice for other machine learning methods as well.
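A minimal sketch of the preparation steps mentioned in points 1 and 4, using plain NumPy: split the data into training, validation and test sets, impute missing values, and standardize the features. The 60/20/20 split, mean imputation and z-score scaling are illustrative choices, not DeepSurv requirements, and all function names are hypothetical. The important design choice is that the imputation and scaling statistics are learned from the training set only, so the validation and test sets remain unseen data.

```python
import numpy as np


def split_indices(n, valid_frac=0.2, test_frac=0.2, seed=0):
    """Randomly partition row indices into train / valid / test."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_test = int(round(n * test_frac))
    n_valid = int(round(n * valid_frac))
    test = idx[:n_test]
    valid = idx[n_test:n_test + n_valid]
    train = idx[n_test + n_valid:]
    return train, valid, test


def fit_preprocessor(x_train):
    """Learn per-feature means and standard deviations, ignoring NaNs."""
    mean = np.nanmean(x_train, axis=0)
    std = np.nanstd(x_train, axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant features
    return mean, std


def apply_preprocessor(x, mean, std):
    """Impute NaNs with the training mean, then z-score standardize."""
    x = np.where(np.isnan(x), mean, x)
    return ((x - mean) / std).astype(np.float32)


# Hypothetical data: 10 patients, 3 features, with two missing values.
rng = np.random.default_rng(42)
x = rng.normal(loc=[80, 120, 98], scale=[10, 15, 1], size=(10, 3))
x[2, 1] = np.nan
x[7, 0] = np.nan
t = rng.uniform(1, 24, size=10).astype(np.float32)  # follow-up times
e = rng.integers(0, 2, size=10).astype(np.int32)    # event indicators

train_idx, valid_idx, test_idx = split_indices(len(x))
mean, std = fit_preprocessor(x[train_idx])  # training statistics only

datasets = {}
for name, idx in [("train", train_idx), ("valid", valid_idx), ("test", test_idx)]:
    datasets[name] = {
        "x": apply_preprocessor(x[idx], mean, std),
        "t": t[idx],
        "e": e[idx],
    }
```

In this sketch, `datasets["train"]` and `datasets["valid"]` play the roles of the training and validation data, while `datasets["test"]` would be held back for the final evaluation only.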

hanxiaozhen2017 commented 6 years ago

Thank you very much @dareneiri for taking the time to answer these questions! These are really helpful! Many thanks!!!!

vanAmsterdam commented 6 years ago

I'm only starting out with Python too.

These I find helpful for deep learning: Fast.ai, a free course by Jeremy Howard and Rachel Thomas; and Deeplearning.ai, a paid course on Coursera by Andrew Ng (a free version is available).

hanxiaozhen2017 commented 6 years ago

Many thanks @vanAmsterdam ! I will start these courses soon and maybe we can discuss here later. Already followed.

jaredleekatzman commented 6 years ago

@dareneiri Thanks for providing answers! Those are all great answers to the questions.

In addition, in response to question 4, check out the code used to run the experiments here.

hanxiaozhen2017 commented 6 years ago

Many thanks @jaredleekatzman!