Closed rohanbanerjee closed 11 months ago
Dear all,
Based on the QCs that @rohanbanerjee shared with me, I suggest including these datasets in the model:
I am ambivalent about Stanford rest and Stanford rest Martucci. For Stanford rest, the concern is data quality (although the masks are fine). Please see an example below, without and with the mask:
I think Stanford rest Martucci's masks are too big and extend beyond the cord. Please see an example below:
with mask
looks ok-- slightly undersegmented to my taste-- last slice mask is missing
> I think Stanford rest Martucci's masks are too big and extend beyond the cord. Please see an example below:
waaaay oversegmented--
next time pls show the GIF-- easier to assess
Thank you @MerveKaptan for the verification! So as it stands, we just have to remove the kcl_rest set from the multi-site dataset we were already using.
Another thing to note here is that there are cases in which we could use parts of the segmentation in the training set. The following cases are examples of what I mean:
In case 1, these slices can be rectified and used for training. In case 2, the ground truth from these slices can either be removed or a new ground truth can be drawn for these specific images.
Pro of what I mentioned: we have more variability in the training set. Con: this can introduce biases.
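For the "remove" option in case 2, a minimal sketch of what dropping the flagged slices could look like, assuming the image and mask are NumPy arrays of the same shape with slices along the last axis. The function name `drop_flagged_slices` and the `flagged` index list are hypothetical, not part of the existing pipeline:

```python
import numpy as np

def drop_flagged_slices(image, mask, flagged, slice_axis=-1):
    """Return copies of image and mask without the flagged slice indices.

    Assumption: image and mask share the same shape, and `flagged` holds
    the indices of slices whose ground truth was judged unusable in QC.
    """
    flagged = sorted(set(flagged))
    image_kept = np.delete(image, flagged, axis=slice_axis)
    mask_kept = np.delete(mask, flagged, axis=slice_axis)
    return image_kept, mask_kept

# Toy example: a 2x2 volume with 5 slices, where slices 0 and 3 are flagged.
image = np.arange(2 * 2 * 5).reshape(2, 2, 5)
mask = np.ones_like(image)
image_kept, mask_kept = drop_flagged_slices(image, mask, flagged=[0, 3])
```

Note that removing slices from the middle of a volume breaks spatial contiguity, which matters for a 3D model but less for a 2D slice-wise one, so this is only one possible way to handle it.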
> looks ok-- slightly undersegmented to my taste-- last slice mask is missing
The first and last slices are not used in the further analyses - this is why they are not segmented!
Right, but the DL model doesn't know this, so it contributes to the under-performance of the model. @rohanbanerjee pls make sure to address this
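One possible way to address this, sketched under the assumption that the image and mask are NumPy arrays of the same shape with slices along the last axis: crop both volumes to the range of slices that actually contain a mask, so the model never sees unsegmented edge slices labelled as background. The function name `trim_unsegmented_edge_slices` is hypothetical:

```python
import numpy as np

def trim_unsegmented_edge_slices(image, mask, slice_axis=-1):
    """Crop image and mask to the first..last slices that contain mask voxels.

    Assumption: image and mask have the same shape. Empty slices between
    the first and last labelled slice are kept; only the edges are trimmed.
    """
    axis = slice_axis % mask.ndim
    # For each slice, check whether any voxel is labelled.
    other_axes = tuple(ax for ax in range(mask.ndim) if ax != axis)
    has_label = np.any(mask > 0, axis=other_axes)
    labelled = np.flatnonzero(has_label)
    if labelled.size == 0:
        raise ValueError("mask contains no labelled voxels")
    first, last = labelled[0], labelled[-1]
    index = [slice(None)] * mask.ndim
    index[axis] = slice(first, last + 1)
    return image[tuple(index)], mask[tuple(index)]

# Toy example: 5 slices, with the first and last left unsegmented.
image = np.arange(4 * 4 * 5).reshape(4, 4, 5)
mask = np.zeros((4, 4, 5), dtype=np.uint8)
mask[1:3, 1:3, 1:4] = 1
image_trim, mask_trim = trim_unsegmented_edge_slices(image, mask)
```

The same cropping would need to be applied consistently at inference time, or the edge slices handled separately.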
Closing this issue since the purpose of this issue was to select the datasets. Can be re-opened if necessary.
I am opening this issue to have the discussion around selection of the datasets to be included for training based on the quality of the ground truths. The list of all the datasets can be found here
Current status: