Benchmark on spinal cord segmentation

uzaymacar commented 2 years ago

Compare sct_deepseg_sc and your SC segmentation model.

To-do:

[x] Open PR with a script performing the analysis below.
[x] Decide on an evaluation / benchmarking plan.
[x] Visualization of SC predictions (auto-segmentations from the newly trained ivadomed model and from sct_deepseg).

uzaymacar commented 2 years ago

In this project, we have access to 3 segmentations for SC: (i) auto-segmentations via sct_deepseg_sc, (ii) manually corrected SC segmentations, and (iii) auto-segmentations via the SC segmentation model in this repository.

The game plan for benchmarking is then to compare (i) and (iii) against (ii), i.e. taking the manually corrected SC segmentations as the ground truth (GT).

uzaymacar commented 2 years ago

This should of course be conducted exclusively on the test set.

uzaymacar commented 2 years ago

With the random_seed key set to 42, the config file for SC segmentation yields the following six test subjects for the basel-mp2rage dataset:

sub-P007, sub-P010, sub-P013, sub-P017, sub-P024, sub-P025

By checking the accompanying .json files for the annotations of each of these subjects after the preprocessing step as shown below

Terminal output

```console
uzmac@romane:~/model_seg_ms_mp2rage/basel-mp2rage-preprocessed/data_processed_scseg/derivatives/labels$ cat sub-P007/anat/sub-P007_UNIT1_seg-manual.json
{
    "Author": "Julien Cohen-Adad",
    "Date": "2021-12-10 12:56:40"
}
uzmac@romane:~/model_seg_ms_mp2rage/basel-mp2rage-preprocessed/data_processed_scseg/derivatives/labels$ cat sub-P010/anat/sub-P010_UNIT1_seg-manual.json
{
    "Author": "Generated with sct_deepseg_sc",
    "Date": "2021-12-18 07:53:14"
}
uzmac@romane:~/model_seg_ms_mp2rage/basel-mp2rage-preprocessed/data_processed_scseg/derivatives/labels$ cat sub-P013/anat/sub-P013_UNIT1_seg-manual.json
{
    "Author": "Julien Cohen-Adad",
    "Date": "2021-12-10 13:07:24"
}
uzmac@romane:~/model_seg_ms_mp2rage/basel-mp2rage-preprocessed/data_processed_scseg/derivatives/labels$ cat sub-P017/anat/sub-P017_UNIT1_seg-manual.json
{
    "Author": "Julien Cohen-Adad",
    "Date": "2021-12-13 11:09:05"
}
uzmac@romane:~/model_seg_ms_mp2rage/basel-mp2rage-preprocessed/data_processed_scseg/derivatives/labels$ cat sub-P024/anat/sub-P024_UNIT1_seg-manual.json
{
    "Author": "Julien Cohen-Adad",
    "Date": "2021-12-10 13:04:28"
}
uzmac@romane:~/model_seg_ms_mp2rage/basel-mp2rage-preprocessed/data_processed_scseg/derivatives/labels$ cat sub-P025/anat/sub-P025_UNIT1_seg-manual.json
{
    "Author": "Generated with sct_deepseg_sc",
    "Date": "2021-12-18 07:50:27"
}
```

we see that whereas annotations for the test subjects sub-P010 and sub-P025 are generated directly by sct_deepseg_sc without any manual corrections, the rest of the test subjects (i.e. sub-P007, sub-P013, sub-P017, and sub-P024) are manually corrected. Therefore, we will conduct the benchmark on the latter four test subjects.
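
The same check can be scripted rather than done by hand with cat. Below is a minimal sketch, assuming the sidecar layout shown in the terminal output (`<subject>/anat/<subject>_UNIT1_seg-manual.json`) and that untouched labels always carry an "Author" value starting with "Generated with sct_deepseg_sc". The function name `manually_corrected_subjects` and its parameters are hypothetical and not part of this repository.

```python
import json
from pathlib import Path

# Assumption: labels that were never manually corrected carry this "Author"
# prefix in their JSON sidecar, as seen in the terminal output above.
AUTO_AUTHOR_PREFIX = "Generated with sct_deepseg_sc"


def manually_corrected_subjects(labels_dir, subjects, suffix="_UNIT1_seg-manual.json"):
    """Return the subjects whose sidecar does not name sct_deepseg_sc as author.

    Hypothetical helper: `labels_dir` is the derivatives/labels folder and
    `subjects` is a list of subject IDs such as "sub-P007".
    """
    labels_dir = Path(labels_dir)
    kept = []
    for sub in subjects:
        sidecar = labels_dir / sub / "anat" / f"{sub}{suffix}"
        with sidecar.open() as f:
            # A missing "Author" key raises KeyError instead of being guessed.
            author = json.load(f)["Author"]
        if not author.startswith(AUTO_AUTHOR_PREFIX):
            kept.append(sub)
    return kept
```

Called with the six test subjects listed earlier, this should keep only the subjects whose sidecar names a human author, in the order they were passed in.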

uzaymacar commented 2 years ago

Preliminary results from this analysis:

Subject:  sub-P013
Model Dice Score:  0.9512
SCT Dice Score:  0.9847
-------------------------------
Subject:  sub-P024
Model Dice Score:  0.9511
SCT Dice Score:  0.9764
-------------------------------
Subject:  sub-P007
Model Dice Score:  0.9551
SCT Dice Score:  0.9699
-------------------------------
Subject:  sub-P017
Model Dice Score:  0.9492
SCT Dice Score:  0.9917
-------------------------------
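
For reference, the Dice score reported above measures the overlap between a predicted mask and the GT mask. The following is a minimal sketch of that computation on binary arrays, not necessarily the implementation used in the linked PR. The function name `dice_score` and the convention of returning 1.0 for two empty masks are assumptions.

```python
import numpy as np


def dice_score(pred, gt):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) between two binary masks."""
    pred = np.asarray(pred).astype(bool)
    gt = np.asarray(gt).astype(bool)
    if pred.shape != gt.shape:
        raise ValueError(f"shape mismatch: {pred.shape} vs {gt.shape}")
    denom = pred.sum() + gt.sum()
    if denom == 0:
        # Assumption: two empty masks are treated as a perfect match.
        return 1.0
    return float(2.0 * np.logical_and(pred, gt).sum() / denom)
```

In practice the masks would first be loaded from the NIfTI files and binarized; that step is omitted here to keep the sketch self-contained.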

This indicates that the sct_deepseg_sc model trained on the t1 contrast ~~achieves better performance~~ achieves a higher Dice score than our ivadomed model trained on the mp2rage contrast. Perhaps this is not surprising, as the GT we use to measure this performance is built upon sct_deepseg_sc predictions via manual correction. Because these manual corrections were small relative to the total volume of the spinal cord, the SCT model achieves near-perfect scores. This seems like a biased evaluation and will be discussed in the upcoming meeting.
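
To make the bias argument concrete, here is a short worked derivation. The symbols are introduced only for illustration: $S$ is the sct_deepseg_sc prediction with $|S| = N$ voxels, and $M$ is the GT obtained from $S$ by removing $r$ voxels and adding $a$ voxels during manual correction.

```latex
\mathrm{Dice}(S, M) = \frac{2\,|S \cap M|}{|S| + |M|}
                    = \frac{2\,(N - r)}{2N - r + a},
\qquad
1 - \mathrm{Dice}(S, M) = \frac{a + r}{2N + a - r}.
```

When the corrections are few ($a + r \ll N$), the Dice deficit is roughly $(a + r) / 2N$. As a hypothetical example, if 1% of the voxels were removed and an equal number added elsewhere ($a = r = 0.01N$), the score would be $1 - r/N = 0.99$, regardless of how well the corrected mask reflects the true anatomy.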

uzaymacar commented 2 years ago

The linked PR addresses all three tasks mentioned in this issue.

ivadomed / model_seg_ms_mp2rage

Benchmark on spinal cord segmentation #28