plbenveniste / lung-treatment-response

Machine learning model for lung treatment response
MIT License

Study of impact of relapse on survival #7

plbenveniste opened this issue 2 weeks ago (status: open)

plbenveniste commented 2 weeks ago

In this issue, I detail the investigation into the impact of relapse on survival. The analysis is implemented in model_training/primitive_vs_metastasis_model.py and was run with the following command:

python -W ignore model_training/primitive_vs_metastasis_model.py --input ../data/merged_data.csv --output ../output2 

The code investigates the following:

Here is the output of the script:

Total number of patients 163
Number of primitive patients 85
Number of metastasis patients 78

 ------------- Model for prediction of survival for primitive patients -------------
Feature data shape: (85, 145)
Target data shape: (85, 2)
Number of subjects which died: 16

Number of subject for training: 69
Number of subject for testing: 16

Number of subjects that died within 3 year: 12

Number of subjects that died within 3 year (train): 10
Number of subjects that died within 3 year (test): 2
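The target arrays have two columns per patient, presumably an event indicator and a follow-up time; the issue does not name them. A minimal sketch of how the "died" and "died within 3 year" counts could be derived, assuming hypothetical fields `event` (1 = death observed) and `time_years` (follow-up in years). This is an illustration, not the script's code.

```python
from dataclasses import dataclass


@dataclass
class SurvivalTarget:
    """One row of the (n, 2) target array (hypothetical field names)."""
    event: int         # 1 = death observed, 0 = censored
    time_years: float  # follow-up time, assumed to be in years


def died_within(targets, horizon_years=3.0):
    """1 if a death was observed at or before the horizon, else 0."""
    return [int(t.event == 1 and t.time_years <= horizon_years) for t in targets]


def count_deaths(targets, horizon_years=3.0):
    """Total observed deaths and deaths observed within the horizon."""
    return {
        "died": sum(t.event for t in targets),
        "died_within_horizon": sum(died_within(targets, horizon_years)),
    }


# Toy cohort with made-up values (not the study data)
cohort = [
    SurvivalTarget(event=1, time_years=1.2),
    SurvivalTarget(event=1, time_years=4.5),  # died, but after 3 years
    SurvivalTarget(event=0, time_years=2.0),  # censored before 3 years
    SurvivalTarget(event=0, time_years=5.0),
]
print(count_deaths(cohort))  # {'died': 2, 'died_within_horizon': 1}
```

In this sketch the third patient is labeled 0 even though their 3-year status is unknown; excluding patients censored before the horizon is a common alternative, and the issue does not say which option the script takes.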

Model performance on the deadline of 3 year with 145 features
ROC AUC Score: 0.8571428571428571
Brier score: 0.11149794478131589
Average precision: 0.0
Average Recall: 0.0
Accuracy Score: 0.875
AUC-PR score: 0.5625
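The AUC-PR of 0.5625 appears next to a precision and recall of 0.0. Assuming the positive class here is death within 3 years, the 16 test subjects include 2 positives; recall 0.0 and accuracy 0.875 (14/16) then imply that the model predicted no deaths at all (TP = 0, FP = 0, FN = 2, TN = 14). The sketch below shows that the area under the precision-recall curve computed from these hard 0/1 predictions, rather than from probabilities, gives exactly 0.5625. This is a hypothesis about how the script computes the metric, not something stated in the issue; with scikit-learn it should correspond to `sklearn.metrics.auc` applied to the output of `sklearn.metrics.precision_recall_curve`.

```python
def pr_curve_points(y_true, scores):
    """(recall, precision) points of a precision-recall curve.

    Thresholds are the distinct score values swept from high to low,
    stopping once full recall is reached, plus an anchor point at
    recall 0, precision 1.
    """
    n_pos = sum(y_true)
    points = [(0.0, 1.0)]
    for threshold in sorted(set(scores), reverse=True):
        predicted = [s >= threshold for s in scores]
        tp = sum(1 for t, p in zip(y_true, predicted) if t == 1 and p)
        precision = tp / sum(predicted)
        recall = tp / n_pos
        points.append((recall, precision))
        if recall == 1.0:
            break
    return points


def trapezoid_auc(points):
    """Area under the curve with the trapezoidal rule, x = recall."""
    pts = sorted(points)
    return sum((r2 - r1) * (p1 + p2) / 2 for (r1, p1), (r2, p2) in zip(pts, pts[1:]))


# Primitive test set, initial 145-feature model (counts inferred above):
# 2 deaths within 3 years among 16 subjects, no death predicted.
y_true = [1] * 2 + [0] * 14
y_pred = [0] * 16
print(trapezoid_auc(pr_curve_points(y_true, y_pred)))  # 0.5625
```

The same computation with TP = 2, FP = 0, FN = 2, TN = 7 (inferred from the initial metastasis model: 4 test deaths, recall 0.5, precision 1.0) gives 0.8409..., matching that block as well. With hard labels the curve collapses to at most two thresholds, so these AUC-PR values say little beyond precision and recall.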

 --- Feature selection ---
Initial number of features: 28
Number of subject for training: 69
Number of subject for testing: 16

Model performance with 3 year deadline with 28 features
ROC AUC Score:  0.9333333333333333
Brier score: 0.043331586264660604
Average precision: 0.9375
Average Recall: 1.0
Accuracy Score:  0.9375
AUC-PR score: 0.96875

Number of features after variance thresholding: 22
Number of features removed by variance thresholding: 6
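Variance thresholding drops features whose variance across training subjects is too small to carry information. A minimal sketch in plain Python; the cutoff (0.0 here, so only constant columns are removed) and the feature names are hypothetical, since the issue does not state the cutoff the script uses. `sklearn.feature_selection.VarianceThreshold` implements the same idea on arrays.

```python
from statistics import pvariance


def variance_threshold(features, threshold=0.0):
    """Keep features whose population variance is strictly above `threshold`.

    `features` maps a feature name to its values over the training subjects.
    Returns (kept_names, removed_names).
    """
    kept, removed = [], []
    for name, values in features.items():
        (kept if pvariance(values) > threshold else removed).append(name)
    return kept, removed


# Toy training data with made-up values and hypothetical feature names
train = {
    "feature_a": [61.0, 70.0, 58.0, 66.0],
    "feature_b": [60.0, 60.0, 60.0, 60.0],  # constant -> removed
    "feature_c": [12.5, 30.1, 8.4, 22.0],
}
kept, removed = variance_threshold(train)
print(kept, removed)  # ['feature_a', 'feature_c'] ['feature_b']
```

The cutoff is scale-dependent, so a non-zero value only has a clear meaning once the features share a comparable scale.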

Model performance after variance thresholding with 3 year deadline and 22 features
ROC AUC Score:  0.9333333333333333
Brier score: 0.043331586264660604
Average precision: 0.9375
Average Recall: 1.0
Accuracy Score:  0.9375
AUC-PR score: 0.96875

Number of features after correlation thresholding: 18
Number of features removed by correlation thresholding: 4
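Correlation thresholding removes redundant features: when two features are strongly correlated with each other, one of them is dropped. The sketch below keeps the first feature of each correlated pair in column order; the cutoff (0.9) and this tie-breaking rule are assumptions, not taken from the script.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def drop_correlated(features, cutoff=0.9):
    """Greedy filter: keep a feature unless it is too correlated with one already kept.

    Assumes constant columns were already removed by variance thresholding.
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= cutoff for k in kept):
            kept.append(name)
    removed = [name for name in features if name not in kept]
    return kept, removed


# Toy training data with made-up values and hypothetical feature names
train = {
    "feature_a": [1.0, 2.0, 3.0, 4.0],
    "feature_b": [2.1, 3.9, 6.2, 8.0],  # almost 2 * feature_a
    "feature_c": [4.0, 1.0, 3.0, 2.0],
}
print(drop_correlated(train))  # (['feature_a', 'feature_c'], ['feature_b'])
```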

Model performance after feature selection based on correlation (3-year deadline) with 18 features
ROC AUC Score:  1.0
Brier score: 0.0306842485044323
Average precision: 0.9375
Average Recall: 1.0
Accuracy Score:  0.9375
AUC-PR score: 0.96875

Number of features after correlation with target thresholding: 15
Number of features removed by correlation with target thresholding: 3
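The last filter compares each remaining feature with the binary 3-year outcome. Assuming it removes features whose absolute correlation with the target falls below a cutoff (0.1 here, hypothetical), a sketch using the point-biserial correlation, which is the Pearson correlation with a 0/1 target:

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; with a 0/1 target this is the point-biserial correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def filter_by_target_correlation(features, target, cutoff=0.1):
    """Keep features whose |correlation with the target| reaches the cutoff."""
    scores = {name: abs(pearson(values, target)) for name, values in features.items()}
    kept = [name for name, s in scores.items() if s >= cutoff]
    removed = [name for name, s in scores.items() if s < cutoff]
    return kept, removed


target = [0, 0, 1, 1, 0, 1]  # toy labels: 1 = died within 3 years
train = {
    "feature_a": [1.0, 1.5, 3.0, 3.2, 1.1, 2.9],  # tracks the target
    "feature_b": [5.0, 4.0, 5.0, 4.0, 4.5, 4.5],  # uncorrelated with it
}
print(filter_by_target_correlation(train, target))  # (['feature_a'], ['feature_b'])
```

Because this filter looks at the outcome, it should be computed on the training subjects only; computing it on the full cohort would leak test information into feature selection. The issue does not show which subjects the script uses here.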

Final features of the model:
['age', 'BMI', 'tabac_PA', 'histo', 'dose_tot', 'etalement', 'couv_PTV', 'BED_10', 'INTENSITY-BASED_IntensitySkewness', 'INTENSITY-BASED_IntensityKurtosis', 'INTENSITY-BASED_AreaUnderCurveCIVH', 'INTENSITY-HISTOGRAM_IntensityHistogramMean', 'INTENSITY-HISTOGRAM_IntensityHistogramVariance', 'NGTDM_Complexity', 'NGTDM_Strength']

Model performance after feature selection based on correlation with target (3-year deadline) with 15 features
ROC AUC Score:  0.9333333333333333
Brier score: 0.0563129837679206
Average precision: 0.9375
Average Recall: 1.0
Accuracy Score:  0.9375
AUC-PR score: 0.96875
Confusion matrix:
TN: 0
FP: 1
FN: 0
TP: 15

 ------------- Model for prediction of survival for metastasis patients -------------
Feature data shape: (78, 145)
Target data shape: (78, 2)
Number of subjects which died: 31

Number of subject for training: 67
Number of subject for testing: 11

Number of subjects that died within 3 year: 21

Number of subjects that died within 3 year (train): 17
Number of subjects that died within 3 year (test): 4

Model performance on the deadline of 3 year with 145 features
ROC AUC Score: 0.8571428571428572
Brier score: 0.12062535732896305
Average precision: 1.0
Average Recall: 0.5
Accuracy Score: 0.8181818181818182
AUC-PR score: 0.8409090909090909
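Unlike precision, recall and accuracy, the ROC AUC and the Brier score are normally computed from predicted probabilities rather than hard labels. That would explain how the 15-feature primitive model reaches a ROC AUC of 0.93 while predicting the positive class for every test subject (TN = 0, FN = 0). A sketch of both metrics in their plain binary-classification form; if the script uses survival-specific variants (for example a time-dependent Brier score), the computation differs.

```python
def roc_auc(y_true, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true, probs):
    """Mean squared difference between predicted probability and observed 0/1 outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, probs)) / len(y_true)


# Toy labels and probabilities (made-up values)
y_true = [0, 0, 1, 1]
probs = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y_true, probs))      # 0.75
print(brier_score(y_true, probs))  # approximately 0.158
```

With so few test events, each ROC AUC rests on a small number of pairs. In the initial primitive model, 2 deaths and 14 survivors give 28 pairs, and 0.857 corresponds to a score of 24 out of 28; the initial metastasis value of 0.857 likewise corresponds to 24 of the 4 × 7 = 28 pairs.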

 --- Feature selection ---
Initial number of features: 28
Number of subject for training: 67
Number of subject for testing: 11

Model performance with 3 year deadline with 28 features
ROC AUC Score:  0.5
Brier score: 0.15394804821923735
Average precision: 0.9
Average Recall: 0.9
Accuracy Score:  0.8181818181818182
AUC-PR score: 0.9454545454545454

Number of features after variance thresholding: 24
Number of features removed by variance thresholding: 4

Model performance after variance thresholding with 3 year deadline and 24 features
ROC AUC Score:  0.5
Brier score: 0.15394804821923735
Average precision: 0.9
Average Recall: 0.9
Accuracy Score:  0.8181818181818182
AUC-PR score: 0.9454545454545454

Number of features after correlation thresholding: 19
Number of features removed by correlation thresholding: 5

Model performance after feature selection based on correlation (3-year deadline) with 19 features
ROC AUC Score:  0.5
Brier score: 0.09713635330326395
Average precision: 0.9090909090909091
Average Recall: 1.0
Accuracy Score:  0.9090909090909091
AUC-PR score: 0.9545454545454546

Number of features after correlation with target thresholding: 16
Number of features removed by correlation with target thresholding: 3

Final features of the model:
['BMI', 'score_charlson', 'OMS', 'histo', 'T', 'dose_tot', 'etalement', 'vol_GTV', 'couv_PTV', 'INTENSITY-BASED_MeanIntensity', 'INTENSITY-BASED_IntensitySkewness', 'INTENSITY-BASED_IntensityKurtosis', 'INTENSITY-BASED_AreaUnderCurveCIVH', 'INTENSITY-HISTOGRAM_IntensityHistogramMean', 'INTENSITY-HISTOGRAM_IntensityHistogramVariance', 'NGTDM_Strength']

Model performance after feature selection based on correlation with target (3-year deadline) with 16 features
ROC AUC Score:  0.4
Brier score: 0.11719389164990067
Average precision: 0.9
Average Recall: 0.9
Accuracy Score:  0.8181818181818182
AUC-PR score: 0.9454545454545454
Confusion matrix:
TN: 0
FP: 1
FN: 1
TP: 9

 ------------- Model for prediction of survival for all patients -------------
Feature data shape: (163, 146)
Target data shape: (163, 2)
Number of subjects which died: 47

Number of subject for training: 136
Number of subject for testing: 27

Number of subjects that died within 3 year: 33

Number of subjects that died within 3 year (train): 27
Number of subjects that died within 3 year (test): 6

Model performance on the deadline of 3 year with 146 features
ROC AUC Score: 0.873015873015873
Brier score: 0.135983463216857
Average precision: 0.6666666666666666
Average Recall: 0.3333333333333333
Accuracy Score: 0.8148148148148148
AUC-PR score: 0.5740740740740741

 --- Feature selection ---
Initial number of features: 29
Number of subject for training: 136
Number of subject for testing: 27

Model performance with 3 year deadline with 29 features
ROC AUC Score:  0.54
Brier score: 0.07281945477152565
Average precision: 0.9259259259259259
Average Recall: 1.0
Accuracy Score:  0.9259259259259259
AUC-PR score: 0.962962962962963

Number of features after variance thresholding: 22
Number of features removed by variance thresholding: 7

Model performance after variance thresholding with 3 year deadline and 22 features
ROC AUC Score:  0.66
Brier score: 0.0722955884260542
Average precision: 0.9259259259259259
Average Recall: 1.0
Accuracy Score:  0.9259259259259259
AUC-PR score: 0.962962962962963

Number of features after correlation thresholding: 17
Number of features removed by correlation thresholding: 5

Model performance after feature selection based on correlation (3-year deadline) with 17 features
ROC AUC Score:  0.72
Brier score: 0.06265992975503183
Average precision: 0.9259259259259259
Average Recall: 1.0
Accuracy Score:  0.9259259259259259
AUC-PR score: 0.962962962962963

Number of features after correlation with target thresholding: 15
Number of features removed by correlation with target thresholding: 2

Final features of the model:
['BMI', 'score_charlson', 'tabac_PA', 'dose_tot', 'etalement', 'vol_GTV', 'couv_PTV', 'BED_10', 'INTENSITY-BASED_MeanIntensity', 'INTENSITY-BASED_IntensitySkewness', 'INTENSITY-BASED_IntensityKurtosis', 'INTENSITY-BASED_AreaUnderCurveCIVH', 'INTENSITY-HISTOGRAM_IntensityHistogramMean', 'INTENSITY-HISTOGRAM_IntensityHistogramVariance', 'NGTDM_Strength']
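The three final feature sets can be compared with simple set operations. The lists below are copied from the output in this issue:

```python
# Final features as printed by the script for each model
primitive = {
    'age', 'BMI', 'tabac_PA', 'histo', 'dose_tot', 'etalement', 'couv_PTV', 'BED_10',
    'INTENSITY-BASED_IntensitySkewness', 'INTENSITY-BASED_IntensityKurtosis',
    'INTENSITY-BASED_AreaUnderCurveCIVH', 'INTENSITY-HISTOGRAM_IntensityHistogramMean',
    'INTENSITY-HISTOGRAM_IntensityHistogramVariance', 'NGTDM_Complexity', 'NGTDM_Strength',
}
metastasis = {
    'BMI', 'score_charlson', 'OMS', 'histo', 'T', 'dose_tot', 'etalement', 'vol_GTV',
    'couv_PTV', 'INTENSITY-BASED_MeanIntensity', 'INTENSITY-BASED_IntensitySkewness',
    'INTENSITY-BASED_IntensityKurtosis', 'INTENSITY-BASED_AreaUnderCurveCIVH',
    'INTENSITY-HISTOGRAM_IntensityHistogramMean',
    'INTENSITY-HISTOGRAM_IntensityHistogramVariance', 'NGTDM_Strength',
}
all_patients = {
    'BMI', 'score_charlson', 'tabac_PA', 'dose_tot', 'etalement', 'vol_GTV', 'couv_PTV',
    'BED_10', 'INTENSITY-BASED_MeanIntensity', 'INTENSITY-BASED_IntensitySkewness',
    'INTENSITY-BASED_IntensityKurtosis', 'INTENSITY-BASED_AreaUnderCurveCIVH',
    'INTENSITY-HISTOGRAM_IntensityHistogramMean',
    'INTENSITY-HISTOGRAM_IntensityHistogramVariance', 'NGTDM_Strength',
}

common = primitive & metastasis & all_patients
print(len(common), sorted(common))
print("primitive but not metastasis:", sorted(primitive - metastasis))
print("metastasis but not primitive:", sorted(metastasis - primitive))
```

On these lists, 10 features appear in all three models: BMI, dose_tot, etalement, couv_PTV and six radiomics features. age, tabac_PA, BED_10 and NGTDM_Complexity are selected for primitive patients but not for metastasis patients, while score_charlson, OMS, T, vol_GTV and INTENSITY-BASED_MeanIntensity are selected for metastasis patients but not for primitive patients.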

Model performance after feature selection based on correlation with target (3-year deadline) with 15 features
ROC AUC Score:  0.72
Brier score: 0.0677152051168577
Average precision: 0.9259259259259259
Average Recall: 1.0
Accuracy Score:  0.9259259259259259
AUC-PR score: 0.962962962962963
Confusion matrix:
TN: 0
FP: 2
FN: 0
TP: 25
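The "Average precision", "Average Recall" and "Accuracy Score" of the three final models can be recomputed from their confusion matrices as plain precision and recall of the positive class. The sketch below does so; the results match the logged values. In all three matrices TN = 0, so the final models never predict the negative class on the test set. Note also that TP + FN (15, 10 and 25 positives) differs from the number of 3-year deaths in the corresponding test sets (2, 4 and 6), so in this stage the positive class does not appear to be death within 3 years.

```python
def metrics_from_counts(tn, fp, fn, tp):
    """Precision, recall and accuracy of the positive class from a 2x2 confusion matrix."""
    precision = tp / (tp + fp) if tp + fp else 0.0  # 0.0 when nothing is predicted positive
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / (tn + fp + fn + tp)
    return precision, recall, accuracy


# Confusion matrices printed for the final models in this issue
final_models = {
    "primitive (15 features)": dict(tn=0, fp=1, fn=0, tp=15),
    "metastasis (16 features)": dict(tn=0, fp=1, fn=1, tp=9),
    "all patients (15 features)": dict(tn=0, fp=2, fn=0, tp=25),
}
for name, counts in final_models.items():
    p, r, a = metrics_from_counts(**counts)
    print(f"{name}: precision={p:.4f} recall={r:.4f} accuracy={a:.4f}")
```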

Here is my opinion on the question:

plbenveniste commented 3 days ago