Open plbenveniste opened 3 weeks ago
The following work was done:
Here is the output of the code:
Feature data shape: (181, 139)
Target data shape: (181, 2)
Number of subjects which had a local relapse 24
Number of subject for training: 147
Number of subject for testing 34
Model performance without any occurence deadline
ROC AUC Score: 0.3125
Brier score 0.07385259470829313
Average precision 0.0
Average Recall 0.0
Accuracy Score: 0.9411764705882353
AUC-PR score 0.5294117647058824
Initial number of features: 28
Number of subject for training: 147
Number of subject for testing: 34
Model performance without any occurence deadline
ROC AUC Score: 0.4375
Brier score: 0.0934391523936288
Average precision: 0.0
Average Recall: 0.0
Accuracy Score: 0.8823529411764706
AUC-PR score: 0.029411764705882353
Number of features after variance thresholding: 27
Number of features removed by variance thresholding: 1
Model performance after variance thresholding
ROC AUC Score: 0.4375
Brier score: 0.09431698444230205
Average precision: 0.0
Average Recall: 0.0
Accuracy Score: 0.8529411764705882
AUC-PR score: 0.029411764705882353
Number of features after correlation thresholding: 22
Number of features removed by correlation thresholding: 5
Model performance after feature selection based on correlation
ROC AUC Score: 0.484375
Brier score: 0.06538062376412883
Average precision: 0.0
Average Recall: 0.0
Accuracy Score: 0.9117647058823529
AUC-PR score: 0.029411764705882353
Number of features after correlation with target thresholding: 13
Number of features removed by correlation with target thresholding: 9
Final features of the model:
['sexe', 'BMI', 'score_charlson', 'T', 'vol_GTV', 'couv_PTV', 'INTENSITY-BASED_StandardDeviation', 'INTENSITY-BASED_MaximumIntensity', 'INTENSITY-BASED_IntensityInterquartileRange', 'INTENSITY-BASED_IntensityBasedEnergy', 'INTENSITY-BASED_TotalLesionGlycolysis', 'GLCM_DifferenceAverage', 'GLCM_DifferenceVariance']
Model performance after feature selection based on correlation with target
ROC AUC Score: 0.4375
Brier score: 0.07286599061330747
Average precision: 0.0
Average Recall: 0.0
Accuracy Score: 0.9117647058823529
AUC-PR score: 0.029411764705882353
Model performance after hyperparameter tuning
ROC AUC Score: 0.5
Brier score: 0.06383743738731905
Average precision: 0.0
Average Recall: 0.0
Accuracy Score: 0.9411764705882353
AUC-PR score: 0.5294117647058824
Here is the output of the model after using another site for testing (site L):
Feature data shape: (181, 139)
Target data shape: (181, 2)
Number of subjects which had a local relapse 24
Number of subject for training: 159
Number of subject for testing: 22
Model performance without any occurence deadline
ROC AUC Score: 0.3958333333333333
Brier score: 0.2515360709409033
Average precision 0.0
Average Recall 0.0
Accuracy Score: 0.7272727272727273
AUC-PR score 0.6363636363636364
Initial number of features: 28
Number of subject for training: 159
Number of subject for testing: 22
Model performance without any occurence deadline
ROC AUC Score: 0.5416666666666666
Brier score: 0.2466955498438941
Average precision: 0.0
Average Recall: 0.0
Accuracy Score: 0.7272727272727273
AUC-PR score: 0.6363636363636364
Number of features after variance thresholding: 27
Number of features removed by variance thresholding: 1
Model performance after variance thresholding
ROC AUC Score: 0.48958333333333326
Brier score: 0.24611133471394445
Average precision: 0.0
Average Recall: 0.0
Accuracy Score: 0.7272727272727273
AUC-PR score: 0.6363636363636364
Number of features after correlation thresholding: 22
Number of features removed by correlation thresholding: 5
Model performance after feature selection based on correlation
ROC AUC Score: 0.59375
Brier score: 0.26699780790407684
Average precision: 0.0
Average Recall: 0.0
Accuracy Score: 0.7272727272727273
AUC-PR score: 0.6363636363636364
Number of features after correlation with target thresholding: 18
Number of features removed by correlation with target thresholding: 4
Final features of the model:
['sexe', 'age', 'BMI', 'score_charlson', 'tabac_PA', 'tabac_sevre', 'histo', 'centrale', 'etalement', 'vol_GTV', 'couv_PTV', 'BED_10', 'INTENSITY-BASED_StandardDeviation', 'INTENSITY-BASED_MaximumIntensity', 'INTENSITY-BASED_IntensityBasedEnergy', 'INTENSITY-BASED_TotalLesionGlycolysis', 'GLCM_DifferenceAverage', 'GLCM_DifferenceVariance']
Model performance after feature selection based on correlation with target
ROC AUC Score: 0.46875
Brier score: 0.2657333832738249
Average precision: 0.0
Average Recall: 0.0
Accuracy Score: 0.7272727272727273
AUC-PR score: 0.6363636363636364
Confusion matrix:
TN: 16
FP: 0
FN: 6
TP: 0
Model performance after hyperparameter tuning
ROC AUC Score: 0.5833333333333333
Brier score: 0.24325574137117692
Average precision: 0.0
Average Recall: 0.0
Accuracy Score: 0.7272727272727273
AUC-PR score: 0.6363636363636364
Confusion matrix:
TN: 16
FP: 0
FN: 6
TP: 0
Number of subjects that had a local relapse within 1 year: 10
Number of subjects that had a local relapse within 1 year (train): 6
Number of subjects that had a local relapse within 1 year (test): 4
Model performance with fewer features with 1 year deadline
ROC AUC Score: 0.625
Brier score: 0.1778249120460954
Average precision: 0.0
Average Recall: 0.0
Accuracy Score: 0.8181818181818182
AUC-PR score: 0.5909090909090909
Confusion matrix:
TN: 18
FP: 0
FN: 4
TP: 0
Model performance with the hyperparameter found with Bayesian Optimisation (fewer features with 1 year deadline)
ROC AUC Score: 0.6527777777777777
Brier score: 0.17424474223466804
Average precision: 0.0
Average Recall: 0.0
Accuracy Score: 0.8181818181818182
AUC-PR score: 0.5909090909090909
Confusion matrix:
TN: 18
FP: 0
FN: 4
TP: 0
In my opinion, these models should not be used: they miss every positive case in every configuration reported above.
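The confusion matrices above explain why accuracy stays high while precision and recall are zero. Below is a minimal sketch that recomputes the three metrics from the reported counts. The function name `metrics_from_confusion` is hypothetical and not taken from model_training/local_relapse_model.py, and reporting an undefined precision (0/0) as 0.0 is a convention assumed here.

```python
def metrics_from_confusion(tn, fp, fn, tp):
    """Compute accuracy, precision and recall from confusion-matrix counts."""
    total = tn + fp + fn + tp
    accuracy = (tn + tp) / total
    # With no predicted positives, precision is 0/0; we report 0.0 (assumed convention).
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    # Recall is the fraction of true relapses that the model actually flags.
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Counts reported above for the site L test set (no relapse deadline).
site_l = metrics_from_confusion(tn=16, fp=0, fn=6, tp=0)
# Counts reported above for the 1-year deadline.
one_year = metrics_from_confusion(tn=18, fp=0, fn=4, tp=0)

print(site_l)
print(one_year)
```

In both cases the accuracy equals the share of negatives in the test set (16/22 and 18/22), so it does not show that the model detects relapses.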
This issue describes the steps to train a model for predicting a local relapse after treatment (i.e. 'rechute_PTV'). The prediction is made for every nodule (not for every subject).
The code used can be found in the file model_training/local_relapse_model.py. Here are the outputs of the training:
Here is the model performance without a time limit for the relapse:
This poor performance can be explained by the fact that very few nodules had a local relapse (24 out of 181 in the data above).
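To illustrate the effect of this imbalance, here is a small sketch of what a trivial classifier that always predicts "no relapse" would score. The function name `always_negative_baseline` is hypothetical. The AUC-PR formula assumes the precision-recall curve of a constant prediction is integrated with the trapezoidal rule between (recall=1, precision=prevalence) and the conventional end point (recall=0, precision=1). Whether the training script computes AUC-PR this way is an assumption, although the reported values are consistent with it.

```python
def always_negative_baseline(n_positive, n_total):
    """Accuracy and trapezoidal AUC-PR of a classifier that never predicts a relapse."""
    prevalence = n_positive / n_total
    # Every negative is classified correctly and every positive is missed.
    accuracy = 1.0 - prevalence
    # A constant prediction gives a two-point PR curve:
    # (recall=1, precision=prevalence) and (recall=0, precision=1).
    pr_auc = (prevalence + 1.0) / 2.0
    return accuracy, pr_auc


# Site L test set: 6 relapses out of 22 (from the confusion matrix above).
acc_site_l, pr_site_l = always_negative_baseline(6, 22)
# 1-year deadline: 4 relapses out of 22 (from the confusion matrix above).
acc_1y, pr_1y = always_negative_baseline(4, 22)
# First split: 2 relapses out of 34 is inferred from the reported accuracy (assumption).
acc_first, pr_first = always_negative_baseline(2, 34)

print(acc_site_l, pr_site_l)
print(acc_1y, pr_1y)
print(acc_first, pr_first)
```

For site L this gives about 0.727 accuracy and 0.636 AUC-PR, which are the values reported above. This suggests the tuned models behave like this trivial baseline, and that the AUC-PR figures reflect class prevalence rather than any ability to rank relapses.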
Furthermore, because so few subjects had a local relapse, I think that filtering by year would not make sense, as it would degrade performance even further. However, the vast majority of local relapses occurred within 3 years. The following figures describe the distribution of the local relapse date.
Next step: