plbenveniste / lung-treatment-response

Machine learning model for lung treatment response
MIT License

Model for prediction of distant relapse #5

Open plbenveniste opened 2 weeks ago

plbenveniste commented 2 weeks ago

In this issue, we detail the work done to build a model that predicts the occurrence of a distant (i.e., non-local) relapse.

For each participant, the values were averaged across nodules. A participant is considered to have had a distant relapse if any of the following relapses occurred:

Number of subjects who had a distant relapse: 67
Number of subjects for training: 130
Number of subjects for testing: 33
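The per-participant averaging described above can be sketched in plain Python. This is a minimal sketch under stated assumptions. The field names (`subject_id`, `volume`, `compacity`) are hypothetical. The list of relapse types that count as distant is not reproduced here, so the relapse flags are placeholders.

```python
from collections import defaultdict
from statistics import mean


def average_per_subject(nodule_rows):
    """Average every numeric feature across all nodules of the same subject.

    `nodule_rows` holds one dict per nodule; `subject_id` is a hypothetical
    name for the participant identifier.
    """
    grouped = defaultdict(list)
    for row in nodule_rows:
        grouped[row["subject_id"]].append(row)

    averaged = {}
    for subject_id, rows in grouped.items():
        feature_names = [key for key in rows[0] if key != "subject_id"]
        averaged[subject_id] = {
            name: mean(row[name] for row in rows) for name in feature_names
        }
    return averaged


def has_distant_relapse(relapse_flags):
    """True if any relapse type counted as distant occurred.

    `relapse_flags` maps hypothetical relapse-type names to booleans.
    """
    return any(relapse_flags.values())


# Usage: subject P01 has two nodules, P02 has one (made-up values).
nodules = [
    {"subject_id": "P01", "volume": 2.0, "compacity": 0.5},
    {"subject_id": "P01", "volume": 4.0, "compacity": 0.7},
    {"subject_id": "P02", "volume": 1.0, "compacity": 0.9},
]
per_subject = average_per_subject(nodules)
print(per_subject["P01"])  # {'volume': 3.0, 'compacity': 0.6}
print(has_distant_relapse({"relapse_type_a": False, "relapse_type_b": True}))  # True
```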

Here is the performance of the three trained models:

  1. Model without any occurrence deadline. Model performance:

    • ROC AUC Score: 0.726923076923077
    • Brier score: 0.19839296981051371
    • Average precision: 0.7777777777777778
    • Average Recall: 0.5384615384615384
    • Accuracy Score: 0.7575757575757576
    • AUC-PR score: 0.749028749028749
  2. Model for prediction of distant relapse within 1 year. Number of subjects who had a distant relapse within 1 year: 23 (train: 21, test: 2). Model performance at the 1-year deadline:

    • ROC AUC Score: 0.11290322580645162
    • Brier score: 0.1145299780607971
    • Average precision: 0.0
    • Average Recall: 0.0
    • Accuracy Score: 0.8787878787878788
    • AUC-PR score: 0.030303030303030304
  3. Model for prediction of distant relapse within 3 years. Number of subjects who had a distant relapse within 3 years: 48 in train, 13 in test. Model performance at the 3-year deadline:

    • ROC AUC Score: 0.6269230769230769
    • Brier score: 0.2566913466094455
    • Average precision: 0.5454545454545454
    • Average Recall: 0.46153846153846156
    • Accuracy Score: 0.6363636363636364
    • AUC-PR score: 0.6095571095571095
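The threshold-based scores above can be cross-checked by hand. This assumes "Average precision" and "Average Recall" are the plain precision and recall of the relapse class at the default decision threshold. Under that assumption, the first model's values are reproduced exactly by a confusion matrix back-calculated from them: 7 true positives, 2 false positives, 6 false negatives and 18 true negatives on the 33 test subjects.

The sketch below (plain Python, illustrative helper names) computes those metrics, plus the Brier score and a rank-based ROC AUC.

```python
def threshold_metrics(tp, fp, fn, tn):
    """Precision, recall and accuracy of the positive (relapse) class."""
    precision = tp / (tp + fp) if tp + fp else 0.0  # 0.0 when nothing is predicted positive
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, accuracy


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def roc_auc(y_true, y_score):
    """Probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Counts back-calculated from model 1 (no deadline), 33 test subjects.
print(threshold_metrics(tp=7, fp=2, fn=6, tn=18))
# (0.7777777777777778, 0.5384615384615384, 0.7575757575757576)

# Counts back-calculated from the 1-year model: both test positives are missed.
print(threshold_metrics(tp=0, fp=2, fn=2, tn=29))
# (0.0, 0.0, 0.8787878787878788)
```

The second call shows why precision and recall are both 0.0 while accuracy stays high for the 1-year model: with only 2 positives among 33 test subjects, most of the accuracy comes from true negatives.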

Hyperparameter tuning wasn't done, since it lowered performance, as explained in this comment.

plbenveniste commented 2 weeks ago

As in this comment, I added train/test splitting across sites, feature removal, and hyperparameter fine-tuning.
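One reading of "splitting across sites" is that whole sites are held out, so no site contributes subjects to both the training and the test set. Below is a minimal plain-Python sketch of that reading. It assumes each subject is mapped to a site label; the mapping and the test fraction are hypothetical.

scikit-learn's `GroupShuffleSplit`, with the site labels passed as `groups`, implements the same idea. If the split was instead stratified by site, `train_test_split(..., stratify=sites)` would be the corresponding tool.

```python
import random


def split_by_site(subject_sites, test_fraction=0.2, seed=0):
    """Hold out whole sites: every subject of a test site goes to the test set.

    `subject_sites` maps a subject ID to its (hypothetical) site label.
    `test_fraction` is a fraction of sites, not of subjects.
    """
    sites = sorted(set(subject_sites.values()))
    rng = random.Random(seed)
    rng.shuffle(sites)
    n_test_sites = max(1, round(len(sites) * test_fraction))
    test_sites = set(sites[:n_test_sites])

    train = [s for s, site in subject_sites.items() if site not in test_sites]
    test = [s for s, site in subject_sites.items() if site in test_sites]
    return train, test


subject_sites = {"P01": "A", "P02": "A", "P03": "B", "P04": "C", "P05": "C", "P06": "D"}
train_ids, test_ids = split_by_site(subject_sites, test_fraction=0.25)
print(train_ids, test_ids)
```

Because the fraction counts sites rather than subjects, the number of test subjects depends on which sites are held out. That would be consistent with the test set below having 27 subjects rather than 33.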

Here is the model output:

Feature data shape: (163, 139)
Target data shape: (163, 2)
Number of subjects which had a distant relapse: 67

Number of subjects for training: 136
Number of subjects for testing: 27

Model performance without any occurrence deadline
ROC AUC Score:  0.6172839506172839
Brier score: 0.2704013019646528
Average precision: 0.5
Average Recall: 0.5555555555555556
Accuracy Score:  0.6666666666666666
AUC-PR score: 0.6018518518518519

Number of subjects that had a distant relapse within 1 year: 23
Number of subjects that had a distant relapse within 1 year (train): 18
Number of subjects that had a distant relapse within 1 year (test): 5
Model performance on the deadline of 1 year
ROC AUC Score: 0.42727272727272725
Brier score: 0.18168762483647793
Average precision: 0.0
Average Recall: 0.0
Accuracy Score: 0.7777777777777778
AUC-PR score: 0.09259259259259259

Number of subjects that had a distant relapse within 3 years: 61
Number of subjects that had a distant relapse within 3 years (train): 52
Number of subjects that had a distant relapse within 3 years (test): 9
Model performance on the deadline of 3 years
ROC AUC Score: 0.6851851851851851
Brier score: 0.26541833623235794
Average precision: 0.4166666666666667
Average Recall: 0.5555555555555556
Accuracy Score: 0.5925925925925926
AUC-PR score: 0.5601851851851852
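The deadline targets can be derived from the relapse outcome. The target data has two columns. This sketch assumes they are a relapse indicator and a time to relapse (or to last follow-up) in years. Under that assumption, a subject is positive for an N-year model only if the relapse occurred within N years.

```python
def relapse_within(events, times_years, deadline_years):
    """Binary label: 1 if a distant relapse occurred within the deadline, else 0.

    `events` holds 1 (relapse) or 0 (no relapse observed); `times_years`
    holds the time to relapse or to last follow-up. Both column meanings
    are assumptions about the 2-column target.
    """
    return [
        int(event == 1 and time <= deadline_years)
        for event, time in zip(events, times_years)
    ]


events = [1, 1, 0, 1]
times_years = [0.5, 2.0, 0.8, 3.5]
print(relapse_within(events, times_years, 1))  # [1, 0, 0, 0]
print(relapse_within(events, times_years, 3))  # [1, 1, 0, 0]
```

In this simple labeling, a subject followed for less than N years without relapse (the third subject above) is counted as a negative, even though the outcome at N years is unknown.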

Initial number of features: 28
Number of subjects for training: 136
Number of subjects for testing: 27

Model performance without any occurrence deadline
ROC AUC Score:  0.47530864197530864
Brier score: 0.3582369294948105
Average precision: 0.36363636363636365
Average Recall: 0.4444444444444444
Accuracy Score:  0.5555555555555556
AUC-PR score: 0.49663299663299665

Number of features after variance thresholding: 25
Number of features removed by variance thresholding: 3

Model performance after variance thresholding
ROC AUC Score:  0.5802469135802469
Brier score: 0.3231744471903118
Average precision: 0.4
Average Recall: 0.4444444444444444
Accuracy Score:  0.5925925925925926
AUC-PR score: 0.5148148148148148
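Variance thresholding drops features that barely vary across subjects. The plain-Python sketch below uses the population variance and keeps only features whose variance is strictly above the threshold, which matches how scikit-learn's `VarianceThreshold` behaves. The threshold actually used to go from 28 to 25 features is not stated, so `threshold=0.0` is a placeholder.

```python
from statistics import pvariance


def variance_threshold(columns, threshold=0.0):
    """Keep features whose population variance is strictly above `threshold`.

    `columns` maps feature name -> list of values (one per training subject);
    the variances should be computed on the training set only.
    """
    kept = {name: values for name, values in columns.items() if pvariance(values) > threshold}
    removed = [name for name in columns if name not in kept]
    return kept, removed


columns = {
    "constant_flag": [1, 1, 1, 1],  # zero variance -> removed
    "age": [61, 70, 55, 66],
}
kept, removed = variance_threshold(columns)
print(list(kept), removed)  # ['age'] ['constant_flag']
```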

Number of features after correlation thresholding: 23
Number of features removed by correlation thresholding: 2

Model performance after feature selection based on correlation
ROC AUC Score:  0.617283950617284
Brier score: 0.32686233103625645
Average precision: 0.42857142857142855
Average Recall: 0.6666666666666666
Accuracy Score:  0.5925925925925926
AUC-PR score: 0.6031746031746031
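Correlation thresholding removes redundant features. The exact rule and cut-off used here are not given, so the sketch below shows one common greedy variant as an assumption. It walks through the features in order and drops any feature whose absolute Pearson correlation with an already-kept feature exceeds the threshold (0.9 is a placeholder). Constant features make the correlation undefined, which is one reason to run variance thresholding first.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def drop_correlated(columns, threshold=0.9):
    """Greedily keep features not too correlated with any feature kept so far."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    removed = [name for name in columns if name not in kept]
    return kept, removed


columns = {
    "volume": [1.0, 2.0, 3.0, 4.0],
    "surface": [2.0, 4.0, 6.0, 8.5],  # almost proportional to volume
    "contrast": [4.0, 1.0, 3.0, 2.0],
}
print(drop_correlated(columns))  # (['volume', 'contrast'], ['surface'])
```

The result depends on feature order: of two correlated features, the one listed first is kept.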

Number of features after correlation with target thresholding: 13
Number of features removed by correlation with target thresholding: 10

Final features of the model:
['age', 'BMI', 'OMS', 'tabac_sevre', 'histo', 'T', 'etalement', 'MORPHOLOGICAL_Compacity', 'INTENSITY-BASED_IntensityInterquartileRange', 'INTENSITY-BASED_AreaUnderCurveCIVH', 'GLCM_ClusterProminence', 'GLRLM_RunLengthNonUniformity', 'NGTDM_Contrast']

Model performance after feature selection based on correlation with target
ROC AUC Score:  0.5802469135802469
Brier score: 0.3452606850303094
Average precision: 0.4
Average Recall: 0.4444444444444444
Accuracy Score:  0.5925925925925926
AUC-PR score: 0.5148148148148148
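Selection by correlation with the target keeps features whose absolute correlation with the binary relapse label reaches some cut-off. Pearson correlation against a 0/1 label is the point-biserial correlation. The sketch below uses a placeholder cut-off of 0.1, since the value that reduced the model from 23 to 13 features is not stated.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; with a 0/1 `y` this is the point-biserial correlation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def select_by_target_correlation(columns, target, threshold=0.1):
    """Keep features with |corr(feature, target)| >= threshold (training set only)."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    kept = [name for name, score in scores.items() if score >= threshold]
    return kept, scores


target = [0, 0, 1, 1]  # 1 = distant relapse
columns = {
    "age": [50.0, 55.0, 60.0, 65.0],  # rises with the label
    "bmi": [22.0, 30.0, 30.0, 22.0],  # unrelated to the label
}
kept, scores = select_by_target_correlation(columns, target)
print(kept)  # ['age']
```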

Model performance after hyperparameter tuning
ROC AUC Score:  0.691358024691358
Brier score: 0.30101305793088085
Average precision: 0.5
Average Recall: 0.7777777777777778
Accuracy Score:  0.6666666666666666
AUC-PR score: 0.6759259259259259
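The model type and search space for the tuning step are not stated in this issue. The sketch below shows an exhaustive grid search in plain Python. `evaluate` is a caller-supplied, hypothetical function that would return, for instance, the mean cross-validated ROC AUC on the training subjects. The parameter names in the toy grid are also hypothetical. scikit-learn's `GridSearchCV` with `scoring="roc_auc"` packages the same loop.

```python
from itertools import product


def grid_search(param_grid, evaluate):
    """Try every combination in `param_grid`; return the best params and score.

    `param_grid` maps a parameter name to a list of candidate values.
    `evaluate(params)` should score params on training data only
    (e.g. cross-validation folds), never on the held-out test set.
    """
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Toy scoring function standing in for a cross-validated ROC AUC.
def toy_score(params):
    return 1.0 - abs(params["max_depth"] - 3) * 0.1 - abs(params["n_estimators"] - 200) / 1000


grid = {"max_depth": [2, 3, 5], "n_estimators": [100, 200]}
print(grid_search(grid, toy_score))  # ({'max_depth': 3, 'n_estimators': 200}, 1.0)
```

Scoring candidates on training folds rather than on the 27 test subjects keeps the reported test metrics free of selection bias.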