havakv / pycox

Survival analysis with PyTorch
BSD 2-Clause "Simplified" License

same survival probabilities from predict_surv_df #137

Open mahootiha-maryam opened 2 years ago

mahootiha-maryam commented 2 years ago

Hi Havard. I have been struggling with a problem for a long time and cannot figure out the cause. I use images as inputs to a network that is a kind of encoder connected to fully connected layers. The training loss looks good, but when I call predict_surv_df I get the same survival probabilities for different patients, which seems very strange to me. For example, I trained a model to a training loss of 0.45 but got the same survival times and the same curves for different patients, whereas in your example you get a very good c-index and good curves with a training loss of 1.6. I should add that before the model is fully trained, for example after just 2 epochs, I get different probabilities and different survival times. This problem is driving me crazy and I cannot work out what is wrong.

havakv commented 2 years ago

So, a good training loss might just be the model overfitting to the training data. That's why you should probably monitor a validation loss when fitting the model. When the validation loss stops improving, you're only overfitting when you continue to improve the training loss.
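
To illustrate what monitoring a validation loss means in practice, here is a minimal sketch in plain Python. The function name `best_epoch`, the `patience` and `min_delta` parameters, and the loss values are all made up for illustration; they are not part of pycox. In pycox itself, the examples typically pass `val_data` to `model.fit` together with an early-stopping callback from torchtuples, which plays the same role.

```python
def best_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training is considered finished once the validation loss has not
    improved by more than `min_delta` for `patience` consecutive epochs.
    (Hypothetical helper, not a pycox function.)
    """
    best_loss = float("inf")
    best_idx = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # Validation loss improved: remember this epoch.
            best_loss = loss
            best_idx = epoch
        elif epoch - best_idx >= patience:
            # No improvement for `patience` epochs: stop here.
            return epoch, best_idx
    return len(val_losses) - 1, best_idx


# Made-up losses: the training loss keeps falling while the
# validation loss bottoms out early and then rises (overfitting).
train_losses = [1.9, 1.4, 1.0, 0.8, 0.6, 0.5, 0.45]
val_losses = [2.0, 1.7, 1.6, 1.65, 1.8, 2.1, 2.4]

stop_epoch, best = best_epoch(val_losses, patience=3)
print(f"stop at epoch {stop_epoch}, keep weights from epoch {best}")
```

The point is that the weights worth keeping are those from the epoch with the lowest validation loss, not the ones with the lowest training loss.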

If all your test samples get the same predicted survival, I would first advise you to check whether this is also the case for the training samples. And if I remember correctly from previous discussions, you have a very small dataset (~50 samples), so expecting your model to do anything other than overfit is imho optimistic.

As a sanity check, to make sure it's not just a bug in your code, you should try to predict the survival curves for your test set before you fit the model. If you still get identical predictions, it's most likely a bug in your test data loader, or it could be that you have initialized your net with parameters that are not good for the scale of your images.
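
One rough way to run this check is to count how many distinct survival curves you actually get, once before fitting and once after. The helper below is a plain-Python sketch; `distinct_curve_count` and its `tol` parameter are hypothetical names, and the example curves are made up. It assumes each curve is a list of survival probabilities on a shared time grid. Since predict_surv_df returns a DataFrame with one column per sample, something like `surv.T.values.tolist()` should give that shape.

```python
def distinct_curve_count(curves, tol=1e-6):
    """Count the distinct survival curves, treating two curves as equal
    when every point differs by at most `tol`.
    (Hypothetical helper, not a pycox function.)
    """
    distinct = []
    for curve in curves:
        already_seen = any(
            len(curve) == len(ref)
            and all(abs(a - b) <= tol for a, b in zip(curve, ref))
            for ref in distinct
        )
        if not already_seen:
            distinct.append(list(curve))
    return len(distinct)


# Made-up example: three patients whose curves have collapsed to one.
collapsed = [[1.0, 0.8, 0.5], [1.0, 0.8, 0.5], [1.0, 0.8, 0.5]]
# Made-up example: three patients with two different curves.
varied = [[1.0, 0.9, 0.7], [1.0, 0.8, 0.5], [1.0, 0.9, 0.7]]

n_collapsed = distinct_curve_count(collapsed)
n_varied = distinct_curve_count(varied)
print(n_collapsed, n_varied)

# With pycox (assumed usage, not run here):
#   surv = model.predict_surv_df(x_test)
#   distinct_curve_count(surv.T.values.tolist())
```

If the count is already 1 before fitting, the data loader or the input scaling is the first suspect. If it only drops to 1 after many epochs, the problem is in training.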

mahootiha-maryam commented 2 years ago

There is one strange thing: when I had a validation dataset, I got a good c-index when the validation loss was high, and bad c-indexes and identical probabilities when the validation loss was low. I accept that overfitting happens on my dataset, but why do I also get the same survival times when I predict with the training data loader? Because of this I get a c-index of 0. I am now trying a public dataset that has 200 samples. Do you mean I should stop training early, for example at epoch 1, and then look at the curves? I did that and got different curves. But after training the model for multiple epochs, I get the same survival times for both the training data loader and the test data loader.