Model for prediction of local relapse

plbenveniste commented 3 weeks ago

This issue describes the steps to train a model that predicts a local relapse after treatment (i.e. 'rechute_PTV'). The prediction is made per nodule, not per subject.

The code used can be found in file model_training/local_relapse_model.py.

Here are the outputs of the training:

Feature data shape: (181, 138)
Target data shape: (181, 1)
Number of subjects which had a local relapse 24
Number of subject for training: 144
Number of subject for testing: 37

Here is the model performance without any time limit on the relapse:

ROC AUC Score: 0.43939393939393934
Brier score: 0.11332947936429884
Average precision 0.0
Average Recall 0.0
Accuracy Score: 0.8918918918918919
AUC-PR score 0.5540540540540541

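This poor performance is most likely explained by class imbalance: only 24 of the 181 nodules had a local relapse, and the precision and recall of 0.0 mean that none of the relapses in the test set was detected.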
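To see why accuracy alone is misleading here, consider a trivial baseline that predicts "no relapse" for every nodule. The sketch below uses only the counts reported above (24 relapses among 181 nodules); the function name is illustrative and is not taken from local_relapse_model.py.

```python
def majority_baseline_metrics(n_total: int, n_positive: int) -> dict:
    """Metrics of a classifier that always predicts the negative class."""
    tn = n_total - n_positive  # every negative is correctly rejected
    fn = n_positive            # every positive is missed
    tp = fp = 0                # no positive prediction is ever made
    accuracy = (tp + tn) / n_total
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": accuracy,
        "recall": recall,
        "prevalence": n_positive / n_total,
    }


# Counts taken from the training output: 181 nodules, 24 local relapses.
metrics = majority_baseline_metrics(n_total=181, n_positive=24)
print(metrics)
```

On the full dataset this baseline already reaches roughly 87% accuracy with a recall of 0, so a test accuracy near 0.89 is not, by itself, evidence that the model has learned anything.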

Furthermore, because so few subjects had a local relapse, I think that filtering by year wouldn't make sense, as it would degrade performance further. However, the vast majority of local relapses occurred within 3 years. The following figures describe the distribution of the local relapse date. Figure_1 Figure_2

Next step:

[x] Add hyperparameter fine-tuning

plbenveniste commented 2 weeks ago

The following work was done:

train/test splitting based on subject site
removal of features
hyperparameter optimisation

Here is the output of the code:

Feature data shape: (181, 139)
Target data shape: (181, 2)
Number of subjects which had a local relapse 24

Number of subject for training: 147
Number of subject for testing  34

Model performance without any occurence deadline
ROC AUC Score:  0.3125
Brier score  0.07385259470829313
Average precision 0.0
Average Recall 0.0
Accuracy Score:  0.9411764705882353
AUC-PR score   0.5294117647058824

Initial number of features: 28
Number of subject for training: 147
Number of subject for testing: 34

Model performance without any occurence deadline
ROC AUC Score:  0.4375
Brier score: 0.0934391523936288
Average precision: 0.0
Average Recall: 0.0
Accuracy Score:  0.8823529411764706
AUC-PR score: 0.029411764705882353

Number of features after variance thresholding: 27
Number of features removed by variance thresholding: 1

Model performance after variance thresholding
ROC AUC Score:  0.4375
Brier score: 0.09431698444230205
Average precision: 0.0
Average Recall: 0.0
Accuracy Score:  0.8529411764705882
AUC-PR score: 0.029411764705882353

Number of features after correlation thresholding: 22
Number of features removed by correlation thresholding: 5

Model performance after feature selection based on correlation
ROC AUC Score:  0.484375
Brier score: 0.06538062376412883
Average precision: 0.0
Average Recall: 0.0
Accuracy Score:  0.9117647058823529
AUC-PR score: 0.029411764705882353

Number of features after correlation with target thresholding: 13
Number of features removed by correlation with target thresholding: 9

Final features of the model:
['sexe', 'BMI', 'score_charlson', 'T', 'vol_GTV', 'couv_PTV', 'INTENSITY-BASED_StandardDeviation', 'INTENSITY-BASED_MaximumIntensity', 'INTENSITY-BASED_IntensityInterquartileRange', 'INTENSITY-BASED_IntensityBasedEnergy', 'INTENSITY-BASED_TotalLesionGlycolysis', 'GLCM_DifferenceAverage', 'GLCM_DifferenceVariance']

Model performance after feature selection based on correlation with target
ROC AUC Score:  0.4375
Brier score: 0.07286599061330747
Average precision: 0.0
Average Recall: 0.0
Accuracy Score:  0.9117647058823529
AUC-PR score: 0.029411764705882353

Model performance after hyperparameter tuning
ROC AUC Score:  0.5
Brier score: 0.06383743738731905
Average precision: 0.0
Average Recall: 0.0
Accuracy Score:  0.9411764705882353
AUC-PR score: 0.5294117647058824
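The three filtering stages reported in the output above (variance thresholding, correlation thresholding between features, and correlation with the target) can be sketched as follows. This is a dependency-free illustration, not the code from local_relapse_model.py: the threshold values, the function names and the toy columns are all assumptions, since the issue does not state which thresholds or correlation measure were used.

```python
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def select_features(columns, target, var_min=0.0, corr_max=0.9, target_corr_min=0.05):
    """columns maps a feature name to its values. All thresholds are assumed."""
    # 1) Variance thresholding: drop constant features.
    kept = [name for name, values in columns.items() if pvariance(values) > var_min]
    # 2) Correlation thresholding: drop a feature that is highly correlated
    #    with a feature that has already been kept.
    decorrelated = []
    for name in kept:
        if all(abs(pearson(columns[name], columns[k])) <= corr_max for k in decorrelated):
            decorrelated.append(name)
    # 3) Correlation with target: keep features with some linear association.
    return [n for n in decorrelated if abs(pearson(columns[n], target)) >= target_corr_min]


# Hypothetical toy data, only there to exercise each stage once.
toy_columns = {
    "constant": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],     # removed at stage 1
    "vol": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],          # survives all stages
    "vol_copy": [2.0, 4.0, 6.0, 8.0, 10.0, 12.0],   # removed at stage 2
    "noise": [1.0, -2.0, 1.0, 1.0, -2.0, 1.0],      # removed at stage 3
}
toy_target = [0, 0, 0, 1, 1, 1]
selected = select_features(toy_columns, toy_target)
print(selected)
```

With only 24 positive cases, correlations with the target are themselves noisy, which may be why the set of retained features differs between the two test sites.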

plbenveniste commented 2 weeks ago

Here is the output of the model after using another site for testing (site L):

Feature data shape: (181, 139)
Target data shape: (181, 2)
Number of subjects which had a local relapse 24

Number of subject for training: 159
Number of subject for testing: 22

Model performance without any occurence deadline
ROC AUC Score:  0.3958333333333333
Brier score: 0.2515360709409033
Average precision 0.0
Average Recall 0.0
Accuracy Score:  0.7272727272727273
AUC-PR score   0.6363636363636364

Initial number of features: 28
Number of subject for training: 159
Number of subject for testing: 22

Model performance without any occurence deadline
ROC AUC Score:  0.5416666666666666
Brier score: 0.2466955498438941
Average precision: 0.0
Average Recall: 0.0
Accuracy Score:  0.7272727272727273
AUC-PR score: 0.6363636363636364

Number of features after variance thresholding: 27
Number of features removed by variance thresholding: 1

Model performance after variance thresholding
ROC AUC Score:  0.48958333333333326
Brier score: 0.24611133471394445
Average precision: 0.0
Average Recall: 0.0
Accuracy Score:  0.7272727272727273
AUC-PR score: 0.6363636363636364

Number of features after correlation thresholding: 22
Number of features removed by correlation thresholding: 5

Model performance after feature selection based on correlation
ROC AUC Score:  0.59375
Brier score: 0.26699780790407684
Average precision: 0.0
Average Recall: 0.0
Accuracy Score:  0.7272727272727273
AUC-PR score: 0.6363636363636364

Number of features after correlation with target thresholding: 18
Number of features removed by correlation with target thresholding: 4

Final features of the model:
['sexe', 'age', 'BMI', 'score_charlson', 'tabac_PA', 'tabac_sevre', 'histo', 'centrale', 'etalement', 'vol_GTV', 'couv_PTV', 'BED_10', 'INTENSITY-BASED_StandardDeviation', 'INTENSITY-BASED_MaximumIntensity', 'INTENSITY-BASED_IntensityBasedEnergy', 'INTENSITY-BASED_TotalLesionGlycolysis', 'GLCM_DifferenceAverage', 'GLCM_DifferenceVariance']

Model performance after feature selection based on correlation with target
ROC AUC Score:  0.46875
Brier score: 0.2657333832738249
Average precision: 0.0
Average Recall: 0.0
Accuracy Score:  0.7272727272727273
AUC-PR score: 0.6363636363636364
Confusion matrix:
TN: 16
FP: 0
FN: 6
TP: 0

Model performance after hyperparameter tuning
ROC AUC Score:  0.5833333333333333
Brier score: 0.24325574137117692
Average precision: 0.0
Average Recall: 0.0
Accuracy Score:  0.7272727272727273
AUC-PR score: 0.6363636363636364
Confusion matrix:
TN: 16
FP: 0
FN: 6
TP: 0

Number of subjects that had a local relapse within 1 year: 10
Number of subjects that had a local relapse within 1 year (train): 6
Number of subjects that had a local relapse within 1 year (test): 4

Model performance with fewer features with 1 year deadline
ROC AUC Score:  0.625
Brier score: 0.1778249120460954
Average precision: 0.0
Average Recall: 0.0
Accuracy Score:  0.8181818181818182
AUC-PR score: 0.5909090909090909
Confusion matrix:
TN: 18
FP: 0
FN: 4
TP: 0

Model performance with the hyperparameter found with Bayesian Optimisation (fewer features with 1 year deadline)
ROC AUC Score:  0.6527777777777777
Brier score: 0.17424474223466804
Average precision: 0.0
Average Recall: 0.0
Accuracy Score:  0.8181818181818182
AUC-PR score: 0.5909090909090909
Confusion matrix:
TN: 18
FP: 0
FN: 4
TP: 0
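Two data-preparation steps recur in the outputs above: holding out one site as the test set, and restricting the target to relapses that occur before a deadline. A minimal sketch of both is given below. Apart from 'rechute_PTV' and site L, which appear in this issue, every name is hypothetical (the "site", "treatment_date" and "relapse_date" keys, site "A", the helper functions), and treating "1 year" as 365 days is an assumption.

```python
from datetime import date


def split_by_site(rows, test_site):
    """Hold out every nodule from one site as the test set."""
    train = [r for r in rows if r["site"] != test_site]
    test = [r for r in rows if r["site"] == test_site]
    return train, test


def relapse_within(row, max_days):
    """Return 1 if a local relapse occurred within max_days of treatment, else 0."""
    if not row["rechute_PTV"]:
        return 0
    delay = (row["relapse_date"] - row["treatment_date"]).days
    return int(delay <= max_days)


# Hypothetical rows; the real dataset is not reproduced here.
rows = [
    {"site": "A", "rechute_PTV": 1,
     "treatment_date": date(2015, 1, 10), "relapse_date": date(2015, 9, 1)},
    {"site": "A", "rechute_PTV": 0,
     "treatment_date": date(2016, 3, 5), "relapse_date": None},
    {"site": "L", "rechute_PTV": 1,
     "treatment_date": date(2014, 6, 1), "relapse_date": date(2016, 6, 1)},
    {"site": "L", "rechute_PTV": 0,
     "treatment_date": date(2017, 2, 20), "relapse_date": None},
]

train_rows, test_rows = split_by_site(rows, test_site="L")
y_train = [relapse_within(r, max_days=365) for r in train_rows]
y_test = [relapse_within(r, max_days=365) for r in test_rows]
print(len(train_rows), len(test_rows), y_train, y_test)
```

Note that a deadline turns late relapses into negatives, which reduces an already small positive class: the output above reports 10 relapses within 1 year against 24 overall.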

plbenveniste commented 2 weeks ago

My recommendation is not to use these models: in every configuration tested, they miss every positive case (recall is 0.0 throughout).
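The reported metrics can be cross-checked against the confusion matrices printed for the site L test set. The sketch below recomputes accuracy, precision and recall from those counts. It also tests one possible explanation for the high AUC-PR values that appear next to a recall of 0.0: if the precision-recall curve is built from hard 0/1 predictions that are all negative, the curve has only two points and its trapezoidal area is (1 + prevalence) / 2. That explanation is an assumption that this issue does not confirm, and the function names are illustrative.

```python
def metrics_from_confusion(tn, fp, fn, tp):
    """Accuracy, precision and recall from confusion-matrix counts."""
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total
    # Precision is undefined when nothing is predicted positive; reporting
    # 0.0 is an assumed convention that matches the values in the outputs.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


def pr_area_all_negative(tn, fn):
    """Trapezoidal area under a two-point precision-recall curve.

    With a constant prediction the curve reduces to
    (recall=1, precision=prevalence) and (recall=0, precision=1).
    """
    prevalence = fn / (tn + fn)
    return 0.5 * (prevalence + 1.0)


# Counts from the outputs above (site L used for testing).
no_deadline = metrics_from_confusion(tn=16, fp=0, fn=6, tp=0)
one_year = metrics_from_confusion(tn=18, fp=0, fn=4, tp=0)
area_no_deadline = pr_area_all_negative(tn=16, fn=6)
area_one_year = pr_area_all_negative(tn=18, fn=4)
print(no_deadline, area_no_deadline)
print(one_year, area_one_year)
```

Both areas agree with the reported AUC-PR scores (0.636 and 0.591), which suggests those scores reflect class prevalence and not ranking quality. Computing the curve from predicted probabilities would be worth checking before relying on that metric.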

plbenveniste / lung-treatment-response

Model for prediction of local relapse #3