Model for prediction of distant relapse

In this issue, we detail the work done to build a model that predicts the occurrence of a distant (i.e., non-local) relapse.

For each participant, the values were averaged across nodules. We consider that a participant had a distant relapse if any of the following relapses occurred:

- rechute_homo
- rechute_contro
- rechute_horspoum
- rechute_med

We computed the delay of occurrence of a distant relapse as the average of the delays of those listed relapses that actually occurred. Here is the distribution of the delays of occurrence of these relapses.
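A minimal pandas sketch of this target construction follows. It assumes a per-nodule table with a `participant_id` column, the four 0/1 relapse indicators above, and one delay column per relapse type, in days, that is NaN when the relapse did not occur. The `delai_*` column names and the `relapse_within` helper are hypothetical (the real names are not given in this issue); `relapse_within` only illustrates how the 1-year and 3-year targets used below could be derived from the averaged delay. It also assumes that, after averaging, a relapse type counts as having occurred if its averaged indicator is above 0.

```python
import pandas as pd

# Relapse indicators listed in this issue (0/1 per nodule).
RELAPSE_COLS = ["rechute_homo", "rechute_contro", "rechute_horspoum", "rechute_med"]
# HYPOTHETICAL delay columns, one per relapse type, in days (NaN if no relapse).
DELAY_COLS = [f"delai_{c}" for c in RELAPSE_COLS]


def build_distant_relapse_target(nodules: pd.DataFrame) -> pd.DataFrame:
    """Collapse a per-nodule table into one row per participant (sketch)."""
    # Average every value across the nodules of each participant (NaN ignored).
    per_subject = nodules.groupby("participant_id")[RELAPSE_COLS + DELAY_COLS].mean()

    # Assumption: a relapse type occurred if its averaged indicator is > 0,
    # i.e. it was recorded for at least one nodule.
    occurred = per_subject[RELAPSE_COLS] > 0
    distant = occurred.any(axis=1)

    # Delay of the distant relapse = mean delay of the relapse types that occurred.
    delays = per_subject[DELAY_COLS].where(occurred.to_numpy())
    delay = delays.mean(axis=1)  # stays NaN when no distant relapse occurred

    return pd.DataFrame({"distant_relapse": distant.astype(int), "delay": delay})


def relapse_within(target: pd.DataFrame, years: float) -> pd.Series:
    """HYPOTHETICAL helper: 1 if a distant relapse occurred within `years` years."""
    # Delay assumed in days; NaN delays compare as False, so they give 0.
    within = (target["distant_relapse"] == 1) & (target["delay"] <= years * 365.25)
    return within.astype(int)
```

With this definition, a participant whose relapses occurred at 900 and 1100 days gets a delay of 1000 days, and therefore counts as a positive for the 3-year target but not for the 1-year one.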

- Number of subjects who had a distant relapse: 67
- Number of subjects for training: 130
- Number of subjects for testing: 33

Here is the performance of the three trained models:

Model performance without any occurrence deadline:
- ROC AUC Score: 0.726923076923077
- Brier score: 0.19839296981051371
- Average precision: 0.7777777777777778
- Average Recall: 0.5384615384615384
- Accuracy Score: 0.7575757575757576
- AUC-PR score: 0.749028749028749

Model for prediction of distant relapse within 1 year:

Number of subjects that had a distant relapse within 1 year: 23 (21 in training, 2 in testing)

Model performance with the 1-year deadline:
- ROC AUC Score: 0.11290322580645162
- Brier score: 0.1145299780607971
- Average precision: 0.0
- Average Recall: 0.0
- Accuracy Score: 0.8787878787878788
- AUC-PR score: 0.030303030303030304

Model for prediction of distant relapse within 3 years:

Number of subjects that had a distant relapse within 3 years: 48 in training, 13 in testing

Model performance with the 3-year deadline:
- ROC AUC Score: 0.6269230769230769
- Brier score: 0.2566913466094455
- Average precision: 0.5454545454545454
- Average Recall: 0.46153846153846156
- Accuracy Score: 0.6363636363636364
- AUC-PR score: 0.6095571095571095
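The six scores reported above can be computed with scikit-learn roughly as follows. This is a sketch under assumptions: "Average precision" and "Average Recall" are taken to be the precision and recall of predictions thresholded at 0.5, and "AUC-PR" the area under the precision-recall curve computed with `sklearn.metrics.auc`. The first assumption fits the numbers: for the model without deadline, precision 0.778 = 7/9, recall 0.538 = 7/13 and accuracy 0.758 = 25/33 are consistent with 7 true positives, 2 false positives, 6 false negatives and 18 true negatives among the 33 test subjects. The function name `evaluate` and the 0.5 threshold are hypothetical.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, auc, brier_score_loss,
                             precision_recall_curve, precision_score,
                             recall_score, roc_auc_score)


def evaluate(y_true, y_prob, threshold=0.5):
    """Compute the scores listed in this issue (sketch, see assumptions above)."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    # Hard predictions used by the threshold-based scores.
    y_pred = (y_prob >= threshold).astype(int)
    # Precision-recall curve over all thresholds, for the AUC-PR.
    precision, recall, _ = precision_recall_curve(y_true, y_prob)
    return {
        "ROC AUC": roc_auc_score(y_true, y_prob),
        "Brier": brier_score_loss(y_true, y_prob),
        # zero_division=0 returns 0.0 instead of warning when nothing is predicted positive.
        "Precision": precision_score(y_true, y_pred, zero_division=0),
        "Recall": recall_score(y_true, y_pred, zero_division=0),
        "Accuracy": accuracy_score(y_true, y_pred),
        "AUC-PR": auc(recall, precision),
    }
```

Note that `sklearn.metrics.average_precision_score` is a different estimate of the area under the precision-recall curve than `auc(recall, precision)`, so the two do not give identical values.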

Hyperparameter tuning was not performed, since it lowered performance, as explained in this comment.

As in this comment, I added train/test splitting across sites, feature removal, and hyperparameter fine-tuning.
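Site-aware splitting and tuning could look like the scikit-learn sketch below, assuming one site label per subject in a `sites` array. `GroupShuffleSplit` keeps all subjects of a site on the same side of the train/test split (its `test_size` is a fraction of sites, not of subjects, which is why the number of test subjects can vary), and `GroupKFold` keeps the cross-validation folds used for tuning site-disjoint as well. The estimator (`LogisticRegression`), the grid over `C`, the ROC AUC scoring and the function names are placeholders; the model actually used in this issue is not specified here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, GroupKFold, GroupShuffleSplit


def split_by_site(X, y, sites, test_size=0.2, seed=0):
    """HYPOTHETICAL helper: one train/test split with no site shared by both sides."""
    sites = np.asarray(sites)
    splitter = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=seed)
    train_idx, test_idx = next(splitter.split(X, y, groups=sites))
    return train_idx, test_idx


def tune(X_train, y_train, sites_train):
    """HYPOTHETICAL helper: grid search with site-grouped cross-validation."""
    search = GridSearchCV(
        LogisticRegression(max_iter=1000),        # placeholder estimator
        param_grid={"C": [0.01, 0.1, 1.0, 10.0]},  # placeholder grid
        scoring="roc_auc",
        cv=GroupKFold(n_splits=3),                 # folds never share a site
    )
    search.fit(X_train, y_train, groups=sites_train)
    return search
```

Grouping the cross-validation folds by site, and not only the final test split, avoids choosing hyperparameters on folds that contain subjects from the same sites as the training folds.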

Here is the model output:

Feature data shape: (163, 139)
Target data shape: (163, 2)
Number of subjects which had a distant relapse: 67

Number of subject for training: 136
Number of subject for testing: 27

Model performance without any occurence deadline
ROC AUC Score:  0.6172839506172839
Brier score: 0.2704013019646528
Average precision: 0.5
Average Recall: 0.5555555555555556
Accuracy Score:  0.6666666666666666
AUC-PR score: 0.6018518518518519

Number of subjects that had a distant relapse within 1 year: 23
Number of subjects that had a distant relapse within 1 year (train): 18
Number of subjects that had a distant relapse within 1 year (test): 5
Model performance on the deadline of 1 year
ROC AUC Score: 0.42727272727272725
Brier score: 0.18168762483647793
Average precision: 0.0
Average Recall: 0.0
Accuracy Score: 0.7777777777777778
AUC-PR score: 0.09259259259259259

Number of subjects that had a distant relapse within 3 year: 61

Number of subjects that had a distant relapse within 3 year (train): 52
Number of subjects that had a distant relapse within 3 year (test): 9
Model performance on the deadline of 3 year
ROC AUC Score: 0.6851851851851851
Brier score: 0.26541833623235794
Average precision: 0.4166666666666667
Average Recall: 0.5555555555555556
Accuracy Score: 0.5925925925925926
AUC-PR score: 0.5601851851851852

Initial number of features: 28
Number of subject for training: 136
Number of subject for testing: 27

Model performance without any occurence deadline
ROC AUC Score:  0.47530864197530864
Brier score: 0.3582369294948105
Average precision: 0.36363636363636365
Average Recall: 0.4444444444444444
Accuracy Score:  0.5555555555555556
AUC-PR score: 0.49663299663299665

Number of features after variance thresholding: 25
Number of features removed by variance thresholding: 3

Model performance after variance thresholding
ROC AUC Score:  0.5802469135802469
Brier score: 0.3231744471903118
Average precision: 0.4
Average Recall: 0.4444444444444444
Accuracy Score:  0.5925925925925926
AUC-PR score: 0.5148148148148148

Number of features after correlation thresholding: 23
Number of features removed by correlation thresholding: 2

Model performance after feature selection based on correlation
ROC AUC Score:  0.617283950617284
Brier score: 0.32686233103625645
Average precision: 0.42857142857142855
Average Recall: 0.6666666666666666
Accuracy Score:  0.5925925925925926
AUC-PR score: 0.6031746031746031

Number of features after correlation with target thresholding: 13
Number of features removed by correlation with target thresholding: 10

Final features of the model:
['age', 'BMI', 'OMS', 'tabac_sevre', 'histo', 'T', 'etalement', 'MORPHOLOGICAL_Compacity', 'INTENSITY-BASED_IntensityInterquartileRange', 'INTENSITY-BASED_AreaUnderCurveCIVH', 'GLCM_ClusterProminence', 'GLRLM_RunLengthNonUniformity', 'NGTDM_Contrast']

Model performance after feature selection based on correlation with target
ROC AUC Score:  0.5802469135802469
Brier score: 0.3452606850303094
Average precision: 0.4
Average Recall: 0.4444444444444444
Accuracy Score:  0.5925925925925926
AUC-PR score: 0.5148148148148148

Model performance after hyperparameter tuning
ROC AUC Score:  0.691358024691358
Brier score: 0.30101305793088085
Average precision: 0.5
Average Recall: 0.7777777777777778
Accuracy Score:  0.6666666666666666
AUC-PR score: 0.6759259259259259
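The three feature-removal steps in the output above (variance thresholding, removal of inter-correlated features, then filtering on correlation with the target) can be sketched with scikit-learn and pandas as below. The thresholds `var_thr`, `corr_thr` and `target_thr` are hypothetical values, not the ones that took the model from 28 to 25, 23 and then 13 features, and all features are assumed to be numerically encoded. The variance threshold is sensitive to feature scale, so its value only makes sense relative to how the features were scaled.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import VarianceThreshold


def select_features(X: pd.DataFrame, y: pd.Series,
                    var_thr=0.0, corr_thr=0.9, target_thr=0.1) -> list:
    """HYPOTHETICAL three-step filter; returns the names of the kept features."""
    # 1) Variance thresholding: drop features whose variance is not above var_thr.
    vt = VarianceThreshold(threshold=var_thr).fit(X)
    X = X.loc[:, vt.get_support()]

    # 2) Correlation thresholding: for each pair with |r| > corr_thr,
    #    drop the later column (scan of the upper triangle only).
    corr = X.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [c for c in upper.columns if (upper[c] > corr_thr).any()]
    X = X.drop(columns=to_drop)

    # 3) Correlation with target: keep features with |r(feature, y)| >= target_thr.
    target_corr = X.corrwith(y).abs()
    return target_corr[target_corr >= target_thr].index.tolist()
```

Since step 3 uses the target, it should be fitted on the training subjects only, so that the test subjects do not influence which features are kept.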

plbenveniste / lung-treatment-response

Model for prediction of distant relapse #5