Open plbenveniste opened 1 week ago
I created the file evaluation/test_sct_models.py
to evaluate the predictions of the 3 lesion segmentation models in SCT.
It computes the Dice score, lesion PPV, lesion sensitivity and lesion F1 score.
It is currently running on the test set using:
python evaluation/test_sct_models.py --msd-data-path ~/net/ms-lesion-agnostic/msd_data/dataset_2024-07-24_seed42_lesionOnly.json --output-path ~/net/ms-lesion-agnostic/evaluating_existing_models/evaluation_output
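For reference, here is a minimal sketch of how the four metrics listed above could be computed. It is not the code in evaluation/test_sct_models.py: the function names are hypothetical, masks are represented as sets of 2D (row, col) voxel coordinates, lesions are taken to be 4-connected components, and a lesion is assumed to count as detected as soon as it overlaps the other mask by at least one voxel. The actual script may use different definitions (3D connectivity, an overlap threshold, etc.).

```python
from collections import deque


def dice(pred, gt):
    """Voxel-wise Dice between two sets of voxel coordinates."""
    if not pred and not gt:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2 * len(pred & gt) / (len(pred) + len(gt))


def components(voxels):
    """Split a set of (row, col) voxels into 4-connected components."""
    remaining = set(voxels)
    comps = []
    while remaining:
        seed = remaining.pop()
        comp, queue = {seed}, deque([seed])
        while queue:
            r, c = queue.popleft()
            for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if nb in remaining:
                    remaining.remove(nb)
                    comp.add(nb)
                    queue.append(nb)
        comps.append(comp)
    return comps


def lesion_metrics(pred, gt):
    """Lesion-wise PPV, sensitivity and F1 (assumed: any overlap = detection)."""
    pred_lesions = components(pred)
    gt_lesions = components(gt)
    # Predicted lesions that touch the ground truth are true positives for PPV.
    tp_pred = sum(1 for p in pred_lesions if p & gt)
    # Ground-truth lesions that touch the prediction are detected for sensitivity.
    tp_gt = sum(1 for g in gt_lesions if g & pred)
    ppv = tp_pred / len(pred_lesions) if pred_lesions else 0.0
    sens = tp_gt / len(gt_lesions) if gt_lesions else 0.0
    f1 = 2 * ppv * sens / (ppv + sens) if ppv + sens else 0.0
    return ppv, sens, f1


# Toy example: two ground-truth lesions, two predicted lesions, one overlap.
gt = {(0, 0), (0, 1), (3, 3)}
pred = {(0, 1), (0, 2), (5, 5)}
print(dice(pred, gt))            # 2 * 1 / (3 + 3)
print(lesion_metrics(pred, gt))  # (0.5, 0.5, 0.5)
```

In this toy case one of the two predicted lesions overlaps the ground truth and one of the two ground-truth lesions is detected, so PPV, sensitivity and F1 are all 0.5.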
Because the initial code was taking too long to compute (around 90 h), I decided to split it into 3 files:
python evaluation/test_sct_deepseg_lesion.py --msd-data-path ~/net/ms-lesion-agnostic/msd_data/dataset_2024-07-24_seed42_lesionOnly.json --output-path ~/net/ms-lesion-agnostic/evaluating_existing_models/evaluation_sct_deepseg_lesion
python evaluation/test_sct_deepseg_psir_stir.py --msd-data-path ~/net/ms-lesion-agnostic/msd_data/dataset_2024-07-24_seed42_lesionOnly.json --output-path ~/net/ms-lesion-agnostic/evaluating_existing_models/evaluation_sct_deepseg_psir-stir
python evaluation/test_sct_deepseg_mp2rage.py --msd-data-path ~/net/ms-lesion-agnostic/msd_data/dataset_2024-07-24_seed42_lesionOnly.json --output-path ~/net/ms-lesion-agnostic/evaluating_existing_models/evaluation_sct_deepseg_mp2rage
I then plotted the desired curves for each of the three models using:
python evaluation/plot_performance.py --pred-dir-path ~/net/ms-lesion-agnostic/evaluating_existing_models/evaluation_sct_deepseg_lesion/ --data-json-path ~/net/ms-lesion-agnostic/msd_data/dataset_2024-07-24_seed42_lesionOnly.json --split test
Output:
Dice score per contrast (mean ± std)
PSIR (n=60): 0.0068 ± 0.0098
STIR (n=11): 0.3676 ± 0.2831
T2star (n=83): 0.5117 ± 0.2076
T2w (n=358): 0.3206 ± 0.2679
UNIT1 (n=57): 0.0070 ± 0.0084
python evaluation/plot_performance.py --pred-dir-path ~/net/ms-lesion-agnostic/evaluating_existing_models/evaluation_sct_deepseg_mp2rage/ --data-json-path ~/net/ms-lesion-agnostic/msd_data/dataset_2024-07-24_seed42_lesionOnly.json --split test
Output:
Dice score per contrast (mean ± std)
PSIR (n=60): 0.2135 ± 0.1760
STIR (n=11): 0.0110 ± 0.0126
T2star (n=83): 0.0074 ± 0.0223
T2w (n=358): 0.0067 ± 0.0127
UNIT1 (n=57): 0.4549 ± 0.1944
python evaluation/plot_performance.py --pred-dir-path ~/net/ms-lesion-agnostic/evaluating_existing_models/evaluation_sct_deepseg_psir-stir/ --data-json-path ~/net/ms-lesion-agnostic/msd_data/dataset_2024-07-24_seed42_lesionOnly.json --split test
Output:
Dice score per contrast (mean ± std)
PSIR (n=60): 0.5701 ± 0.2660
STIR (n=11): 0.5984 ± 0.2237
T2star (n=83): 0.1312 ± 0.1538
T2w (n=358): 0.2213 ± 0.2134
UNIT1 (n=57): 0.0023 ± 0.0016
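The per-contrast summaries above can be reproduced from a list of per-image Dice scores along these lines. This is only a sketch, not the code in evaluation/plot_performance.py: the function name and the input format are hypothetical, the scores in the example are made up for illustration, and the population standard deviation (as in NumPy's default) is assumed since the issue does not say which one is used.

```python
from collections import defaultdict
from statistics import mean, pstdev


def summarize_by_contrast(records):
    """Group (contrast, dice) pairs by contrast and format mean ± std lines."""
    groups = defaultdict(list)
    for contrast, score in records:
        groups[contrast].append(score)
    lines = []
    for contrast in sorted(groups):
        scores = groups[contrast]
        # pstdev = population standard deviation (assumption, see lead-in)
        lines.append(
            f"{contrast} (n={len(scores)}): "
            f"{mean(scores):.4f} ± {pstdev(scores):.4f}"
        )
    return lines


# Illustrative values only, not taken from the evaluation.
records = [("T2w", 0.2), ("T2w", 0.4), ("PSIR", 0.5)]
print("Dice score per contrast (mean ± std)")
for line in summarize_by_contrast(records):
    print(line)
```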
I then evaluated the SCT models for segmenting spinal lesions on the external testing set (ms-basel-2018 and ms-basel-2020).
I ran the following command:
python evaluation/test_sct_lesion_external_dataset.py --input-folder ~/net/ms-lesion-agnostic/data --output-path ~/net/ms-lesion-agnostic/evaluating_existing_models/external_evaluation/
Output:
Dice score per contrast (mean ± std)
PD (n=31): 0.0046 ± 0.0114
T1w (n=22): 0.0673 ± 0.2120
T2w (n=24): 0.3272 ± 0.3372
I ran the following command:
python evaluation/test_sct_mp2rage_external_dataset.py --input-folder ~/net/ms-lesion-agnostic/data --output-path ~/net/ms-lesion-agnostic/evaluating_existing_models/external_evaluation/
Output:
Dice score per contrast (mean ± std)
PD (n=31): 0.0034 ± 0.0118
T1w (n=22): 0.0559 ± 0.2116
T2w (n=24): 0.2864 ± 0.4308
I ran the following command:
python evaluation/test_sct_psir-stir_external_dataset.py --input-folder ~/net/ms-lesion-agnostic/data --output-path ~/net/ms-lesion-agnostic/evaluating_existing_models/external_evaluation/
Output:
Dice score per contrast (mean ± std)
PD (n=31): 0.0036 ± 0.0119
T1w (n=22): 0.2774 ± 0.4529
T2w (n=24): 0.2510 ± 0.3996
This issue reports the work done to evaluate the existing models.
The existing models are the following: