Closed uzaymacar closed 2 years ago
In this project, we have access to 3 segmentations for SC: (i) auto-segmentations via `sct_deepseg_sc`, (ii) manually corrected SC segmentations, and (iii) auto-segmentations via the SC segmentation model in this repository.
The game plan for benchmarking is then to compare (i) and (iii) against (ii), i.e. taking the manually corrected SC segmentations as the gold label.
The comparison should of course be conducted exclusively on the test set.
The config file for SC segmentation, with the `random_seed` key set to 42, yields the following six test subjects for the `basel-mp2rage` dataset:
sub-P007, sub-P010, sub-P013, sub-P017, sub-P024, sub-P025
Checking the accompanying `.json` files for the annotations of each of these subjects after the preprocessing step, we see that the annotations for the test subjects `sub-P010` and `sub-P025` are generated directly by `sct_deepseg_sc` without any manual corrections, whereas those for the rest of the test subjects (i.e. `sub-P007`, `sub-P013`, `sub-P017`, and `sub-P024`) are manually corrected. Therefore, we will conduct the benchmark on the latter four test subjects.
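A minimal sketch of how this filtering could be automated. The sidecar schema is an assumption: the `"Author"` key and its values are hypothetical and are not taken from the actual `basel-mp2rage` `.json` files, so the key name would need to be adapted to whatever the real sidecars record.

```python
import json

# Name of the automatic tool as it would appear in a sidecar (assumption).
AUTO_TOOL = "sct_deepseg_sc"


def load_sidecar(path):
    """Read one annotation sidecar (.json) into a dict."""
    with open(path) as f:
        return json.load(f)


def is_manually_corrected(sidecar):
    """Return True if the sidecar suggests a human touched the annotation.

    ASSUMPTION: a hypothetical "Author" key holds either the tool name
    (purely automatic) or a rater name (manually corrected). A missing or
    empty key is treated as "not manually corrected" to stay conservative.
    """
    author = str(sidecar.get("Author", ""))
    return bool(author) and AUTO_TOOL not in author


def select_benchmark_subjects(sidecars):
    """Keep only subjects whose annotation was manually corrected.

    `sidecars` maps a subject ID to its parsed sidecar dict.
    """
    return sorted(sub for sub, meta in sidecars.items()
                  if is_manually_corrected(meta))
```

With sidecars parsed via `load_sidecar`, `select_benchmark_subjects` would return only the subjects to keep for the benchmark.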
Preliminary results from this analysis:
Subject: sub-P013
Model Dice Score: 0.9512
SCT Dice Score: 0.9847
-------------------------------
Subject: sub-P024
Model Dice Score: 0.9511
SCT Dice Score: 0.9764
-------------------------------
Subject: sub-P007
Model Dice Score: 0.9551
SCT Dice Score: 0.9699
-------------------------------
Subject: sub-P017
Model Dice Score: 0.9492
SCT Dice Score: 0.9917
-------------------------------
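For reference, a minimal sketch of the Dice score between two binary masks, assuming the standard definition 2|A∩B| / (|A| + |B|). The function name and the convention for two empty masks are illustrative choices; the scores above may have been computed with a different implementation.

```python
import numpy as np


def dice_score(pred, gt):
    """Dice overlap between two binary masks of identical shape.

    Inputs are cast to bool, so any non-zero voxel counts as foreground
    (soft predictions should be thresholded beforehand).
    """
    pred = np.asarray(pred).astype(bool)
    gt = np.asarray(gt).astype(bool)
    denom = pred.sum() + gt.sum()
    if denom == 0:
        # Both masks empty: treated as perfect agreement (a convention).
        return 1.0
    return float(2.0 * np.logical_and(pred, gt).sum() / denom)
```

Applied per subject, this would be called once with the model prediction and once with the `sct_deepseg_sc` prediction, each against the manually corrected mask.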
This indicates that the `sct_deepseg_sc` model trained on the `t1` contrast achieves a higher Dice score than our `ivadomed` model trained on the `mp2rage` contrast. Perhaps this is not surprising, as the GT we use to measure this performance is "built upon" `sct_deepseg_sc` predictions (via manual correction). As these manual corrections were small compared to the large volume of the spinal cord, the SCT model achieves near-perfect scores. This seems like a biased evaluation and will be discussed in the upcoming meeting.
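The suspected bias can be illustrated on purely synthetic data. In the sketch below, all shapes and masks are made up for illustration and are unrelated to the real subjects: the "ground truth" is an automatic mask with a small patch edited out, and it is compared against both that automatic mask and an independent prediction shifted by one voxel.

```python
import numpy as np


def dice_score(pred, gt):
    """Standard Dice overlap between two boolean masks."""
    inter = np.logical_and(pred, gt).sum()
    return float(2.0 * inter / (pred.sum() + gt.sum()))


shape = (20, 20, 100)  # synthetic volume, not a real image size

# Stand-in for the automatic segmentation: a 10 x 10 x 100 block.
auto = np.zeros(shape, dtype=bool)
auto[5:15, 5:15, :] = True

# Stand-in for the manual correction: the same mask minus a small patch.
gt = auto.copy()
gt[5:15, 5, 0:10] = False

# Stand-in for an independently trained model: same block, shifted by
# one voxel along the first axis.
independent = np.zeros(shape, dtype=bool)
independent[6:16, 5:15, :] = True

dice_auto = dice_score(auto, gt)
dice_independent = dice_score(independent, gt)
print(f"auto vs corrected GT:        {dice_auto:.4f}")
print(f"independent vs corrected GT: {dice_independent:.4f}")
```

Because the corrected mask shares almost every voxel with the automatic one, the automatic mask scores close to 1 by construction, while an equally plausible independent prediction is penalized for a one-voxel offset.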
The linked PR addresses all three tasks mentioned in this issue.
Compare `sct_deepseg_sc` and your SC segmentation model. To-do:
`ivadomed` model and from `sct_deepseg`.