Dataset QC manual verification and selection for training final MONAI segmentation model

sct-pipeline / fmri-segmentation

Repository for the project on automatic spinal cord segmentation based on fMRI EPI data

MIT License


Dataset QC manual verification and selection for training final MONAI segmentation model #21

Status: closed by rohanbanerjee 11 months ago

rohanbanerjee commented 11 months ago

I am opening this issue to discuss which datasets to include in training, based on the quality of their ground truths. The list of all the datasets can be found here

Current status:

Shared the SCT QC reports for all the datasets with @MerveKaptan
@MerveKaptan is going through the QCs

MerveKaptan commented 11 months ago

Dear all,

Based on the QCs that @rohanbanerjee shared with me, I suggest these datasets to be included in the model:

Geneva
Leipzig rest
NWMotor
NWMotorWeber
NWTactile
NWThermal
Zurich Cervical

I am unsure about Stanford rest and Stanford rest Martucci. For Stanford rest, the concern is data quality (the masks themselves are fine). Please see an example below:

[Figure: Stanford rest example, without mask and with mask]

I think Stanford rest Martucci's masks are too big and extend beyond the cord. Please see an example below:
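To keep the selection reproducible, the QC verdicts above could be recorded next to the training config. A minimal sketch, assuming a plain Python dictionary; the keys are the dataset labels used in this thread, not necessarily the actual folder names, and `select_datasets` is a hypothetical helper:

```python
# Hypothetical record of the QC verdicts from the comment above.
# Keys are the labels used in this thread, not actual dataset folder names.
QC_VERDICTS = {
    "Geneva": "include",
    "Leipzig rest": "include",
    "NWMotor": "include",
    "NWMotorWeber": "include",
    "NWTactile": "include",
    "NWThermal": "include",
    "Zurich Cervical": "include",
    "Stanford rest": "ambiguous",          # data quality concern, masks fine
    "Stanford rest Martucci": "ambiguous", # masks extend beyond the cord
}


def select_datasets(verdicts, allow_ambiguous=False):
    """Return the sorted names of datasets accepted for training."""
    accepted = {"include", "ambiguous"} if allow_ambiguous else {"include"}
    return sorted(name for name, verdict in verdicts.items() if verdict in accepted)


print(select_datasets(QC_VERDICTS))
```

Keeping "ambiguous" as its own verdict, rather than folding it into include or exclude, makes it easy to train with and without those sets and compare.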

jcohenadad commented 11 months ago

Regarding the Stanford rest example (with mask):

Looks OK, slightly undersegmented to my taste; the mask on the last slice is missing.

I think Stanford rest Martucci's masks are too big and extend beyond the cord. Please see an example below:

Waaaay oversegmented.

Next time, pls show the GIF; it's easier to assess.
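Besides visual QC, over- and undersegmentation can be flagged automatically from the per-slice cross-sectional area of each mask. A minimal sketch, assuming the mask is already loaded as a NumPy array shaped (X, Y, Z) with Z as the axial slice axis, and that the in-plane voxel size (dx, dy) in mm is known; the function names and the area bounds are hypothetical, not validated thresholds:

```python
import numpy as np


def per_slice_area_mm2(mask, pixdim_xy):
    """Mask area (mm^2) on each axial slice; mask is (X, Y, Z), Z = slice axis."""
    dx, dy = pixdim_xy
    voxels_per_slice = (mask > 0).sum(axis=(0, 1))
    return voxels_per_slice * dx * dy


def flag_slices(mask, pixdim_xy, min_area_mm2, max_area_mm2):
    """Return slice indices that are empty, suspiciously small, or suspiciously big.

    The bounds are user-chosen (hypothetical) and must be tuned per dataset.
    """
    area = per_slice_area_mm2(mask, pixdim_xy)
    return {
        "empty": np.flatnonzero(area == 0).tolist(),
        "too_small": np.flatnonzero((area > 0) & (area < min_area_mm2)).tolist(),
        "too_big": np.flatnonzero(area > max_area_mm2).tolist(),
    }


# Toy volume: slice 0 empty, slice 1 tiny, slice 2 plausible, slice 3 oversegmented
mask = np.zeros((10, 10, 4), dtype=np.uint8)
mask[4:6, 4:6, 1] = 1  # 4 voxels
mask[3:6, 3:6, 2] = 1  # 9 voxels
mask[1:9, 1:9, 3] = 1  # 64 voxels
print(flag_slices(mask, (1.0, 1.0), min_area_mm2=5.0, max_area_mm2=50.0))
# prints {'empty': [0], 'too_small': [1], 'too_big': [3]}
```

Such a check only narrows down which subjects to look at; the GIF/QC report is still needed to decide whether a flagged slice is actually wrong.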

rohanbanerjee commented 11 months ago

Thank you @MerveKaptan for the verification! So, as it stands, we only have to remove the kcl_rest set from the multi-site dataset we were already using.

Another thing to note: there are cases in which we could use parts of a segmentation in the training set. For example:

Case 1: the ground truth is missing on one slice, or only a few slices in the whole image are of poor quality.
Case 2: on some slices the spinal cord (SC) is not clearly visible, yet the ground truth is still drawn.

In case 1, these slices can be rectified and used for training. In case 2, the ground truth on these slices can either be removed, or a new ground truth can be drawn for these specific images.

Pro: we get more variability in the training set. Con: this can introduce biases.
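One way to use partially trustworthy ground truths without redrawing them is to ignore the unreliable slices in the loss. A minimal sketch, assuming prediction and label are NumPy arrays shaped (X, Y, Z) and the list of invalid slices comes from manual QC; this is a plain soft Dice written for illustration, not MONAI's `DiceLoss`, and the helper names are hypothetical:

```python
import numpy as np


def slice_weights(num_slices, invalid_slices):
    """1.0 for slices with trusted ground truth, 0.0 for slices to ignore."""
    w = np.ones(num_slices, dtype=np.float64)
    w[np.asarray(list(invalid_slices), dtype=int)] = 0.0
    return w


def masked_soft_dice(pred, target, weights, eps=1e-6):
    """Soft Dice over (X, Y, Z) volumes, counting only slices with weight 1.

    pred holds probabilities in [0, 1]; target is binary; weights has shape (Z,).
    """
    w = weights[np.newaxis, np.newaxis, :]  # broadcast per-slice weights over X, Y
    intersection = np.sum(w * pred * target)
    denom = np.sum(w * pred) + np.sum(w * target)
    return (2.0 * intersection + eps) / (denom + eps)


# Toy case: slice 2 has an unreliable label region, and the prediction covers it all
target = np.zeros((4, 4, 3))
target[1:3, 1:3, :] = 1.0
pred = target.copy()
pred[:, :, 2] = 1.0
print(masked_soft_dice(pred, target, slice_weights(3, [])))   # about 0.667
print(masked_soft_dice(pred, target, slice_weights(3, [2])))  # about 1.0
```

The weighting keeps the image in the training set (more variability) while the questionable slices neither reward nor penalize the model; the bias concern above still applies to which slices get marked invalid.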

MerveKaptan commented 11 months ago

Looks OK, slightly undersegmented to my taste; the mask on the last slice is missing.

The first and last slices are not used in further analyses; this is why they are not segmented!

jcohenadad commented 11 months ago

The first and last slices are not used in further analyses; this is why they are not segmented!

Right, but the DL model doesn't know this: it sees the cord on those slices with an empty label, which contributes to the model's under-performance. @rohanbanerjee, pls make sure to address this.
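One possible way to address this, short of segmenting the first and last slices, is to crop each training pair to the slices that actually carry a label, so unsegmented end slices are never presented to the model as "no cord". A minimal sketch, assuming image and mask are NumPy arrays shaped (X, Y, Z) with Z as the slice axis (e.g. after loading the NIfTI files); the function names are hypothetical:

```python
import numpy as np


def labeled_z_range(mask):
    """Return (first, last) indices of axial slices containing any label, or None."""
    has_label = (mask > 0).any(axis=(0, 1))
    idx = np.flatnonzero(has_label)
    if idx.size == 0:
        return None
    return int(idx[0]), int(idx[-1])


def crop_to_labeled_slices(image, mask):
    """Crop image and mask along Z to the span of slices that have ground truth."""
    z_range = labeled_z_range(mask)
    if z_range is None:
        raise ValueError("mask is empty; nothing to crop to")
    first, last = z_range
    return image[:, :, first:last + 1], mask[:, :, first:last + 1]


# Toy volume with 5 slices where only slices 1 to 3 were segmented
image = np.random.rand(4, 4, 5)
mask = np.zeros((4, 4, 5), dtype=np.uint8)
mask[1:3, 1:3, 1:4] = 1
img_c, msk_c = crop_to_labeled_slices(image, mask)
print(labeled_z_range(mask), img_c.shape, msk_c.shape)
```

Cropping only removes unlabeled slices at the ends of the stack; an unlabeled slice in the middle is kept and would need the per-slice handling discussed earlier. The cleaner fix remains segmenting those slices, since at inference time the model will still see the full volume.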

rohanbanerjee commented 11 months ago

Closing this issue since its purpose was to select the datasets. It can be reopened if necessary.