Check #12 for more information on the subjects.
Further investigations by @uzaymacar show an inter-rater average Dice score of 0.581 +/- 0.212 (+/- corresponds to the STD of the Dice across subjects). This calculation was done on this version of the dataset: 04b78bb3619b22e2e560b3211807a0b1e54a70cb
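For reference, here is a minimal sketch of how such an inter-rater Dice computation could look (this is not the exact script used; the rater suffixes `lesion-manual`/`lesion-manual2` and the subject list are assumptions for illustration):

```python
# Sketch of an inter-rater Dice computation across subjects (illustrative only).
import numpy as np
import nibabel as nib

def dice(mask_a, mask_b):
    """Dice coefficient between two binary masks."""
    a, b = mask_a.astype(bool), mask_b.astype(bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(a, b).sum() / denom

# Hypothetical file layout: one lesion mask per rater per subject.
subjects = ["sub-P001", "sub-P010", "sub-P013"]
scores = []
for sub in subjects:
    rater1 = nib.load(f"{sub}_UNIT1_lesion-manual.nii.gz").get_fdata()
    rater2 = nib.load(f"{sub}_UNIT1_lesion-manual2.nii.gz").get_fdata()
    scores.append(dice(rater1, rater2))

print(f"Inter-rater Dice: {np.mean(scores):.3f} +/- {np.std(scores):.3f} (STD across subjects)")
```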
It might be wise to consider manually revising the raters' segmentations to minimize the discrepancy (which hurts the model's capabilities).
To illustrate and emphasize the low agreement / large variability among raters for lesion annotations in the spinal cord as reported in the previous comment, we can look at sub-P013:
As of today, we started working on generating manually corrected lesion segmentations (not to be confused with #14, in which we performed the same task on SC segmentations instead). The manually corrected lesion segmentations will be saved with the `lesion-manual3` suffix, and could start from one rater's existing segmentation (`lesion-manual` suffix), but ideally we will start from the majority vote (see the sketch below). I'll re-do the seg for sub 1-17 and @uzaymacar will do the rest.
Strategy:
- work in `~/duke/temp/uzay/basel-mp2rage-preprocessed/data_processed/`
- save the corrected segmentation as, e.g., `sub-P001_UNIT1_lesion-manual3.nii.gz`
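As a possible starting point for the majority-vote initialization mentioned above, here is a sketch of a per-voxel vote across rater masks. The rater filenames are assumptions; only the `lesion-manual3` output suffix follows the naming above:

```python
# Sketch of a per-voxel majority vote across rater lesion masks, to be used as a
# starting point for manual correction (input filenames are illustrative assumptions).
import numpy as np
import nibabel as nib

def majority_vote(mask_paths):
    """Return a binary mask where a voxel is 1 if at least half of the raters marked it."""
    imgs = [nib.load(p) for p in mask_paths]
    data = np.stack([img.get_fdata() > 0 for img in imgs], axis=0)
    vote = data.sum(axis=0) >= (len(mask_paths) / 2.0)  # ties count as lesion here
    return nib.Nifti1Image(vote.astype(np.uint8), imgs[0].affine)

# Example (hypothetical rater suffixes):
raters = ["sub-P001_UNIT1_lesion-manual.nii.gz", "sub-P001_UNIT1_lesion-manual2.nii.gz"]
nib.save(majority_vote(raters), "sub-P001_UNIT1_lesion-manual3.nii.gz")
```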
@mchen1110 is working on it
Completed in #56
In this image (from P010):
One rater (red) segmented lesions above the medulla oblongata, while the other (blue) did not. This creates inconsistencies when computing inter-rater variability, and when evaluating the performance of the trained model.
A suggestion for now would be to ignore lesions that are not in the spinal cord.
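One simple way to implement this suggestion would be to mask each rater's lesion segmentation by the SC segmentation before computing agreement metrics. Below is a sketch; the SC suffix `_seg-manual` and the filenames are assumptions, and the voxel-wise intersection is a simplification of removing entire lesions that lie outside the cord:

```python
# Sketch: restrict lesion masks to the spinal cord before computing agreement metrics.
# The "_seg-manual" SC suffix and filenames below are assumptions for illustration.
import numpy as np
import nibabel as nib

lesion_img = nib.load("sub-P010_UNIT1_lesion-manual.nii.gz")  # rater lesion mask (assumed name)
sc_img = nib.load("sub-P010_UNIT1_seg-manual.nii.gz")         # SC segmentation (assumed suffix)

lesion = lesion_img.get_fdata() > 0
sc = sc_img.get_fdata() > 0

# Keep only lesion voxels inside the spinal cord mask
# (e.g. this would drop the red rater's annotation above the medulla oblongata).
lesion_in_sc = np.logical_and(lesion, sc).astype(np.uint8)
nib.save(nib.Nifti1Image(lesion_in_sc, lesion_img.affine),
         "sub-P010_UNIT1_lesion-manual_in-sc.nii.gz")
```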