sct-pipeline / contrast-agnostic-softseg-spinalcord

Contrast-agnostic spinal cord segmentation project with softseg
MIT License

Consider binarizing the ground truth after averaging across contrasts, before training a softseg model #99

Open jcohenadad opened 8 months ago

jcohenadad commented 8 months ago

Context

I reflected on the excellent discussion we had yesterday, and I came to the conclusion that keeping a soft mask after averaging across all contrasts is quite problematic, for several reasons:

  1. The ‘blurry’ aspect of the cord is more pronounced where all contrasts overlap than at the very bottom of the cord, which is only covered by the T1w and T2w scans, creating a non-homogeneous ground truth that could hamper the performance of the model;
  2. That blurry aspect is quite problematic in that it partly reflects the mis-registration across contrasts, which is absolutely not something we want to encode in our ground truth;
  3. Because of 2., we cannot make a strong argument that our contrast-agnostic soft model encodes partial volume (which is a big ‘selling point’).
  4. When performing active learning by creating ground truth on new datasets, we've been struggling to mimic the softness of our ground truth (see notably: https://github.com/sct-pipeline/contrast-agnostic-softseg-spinalcord/issues/84). In fact, based on the points above, we've been trying to mimic a softness that does not represent meaningful information (i.e., partial volume), but rather methodological flaws (i.e., mis-registrations, inhomogeneous coverage).
  5. That "extra" softness in our ground truth is incorporated into our model, which then gives predictions that we are unsure how to properly threshold (https://github.com/sct-pipeline/contrast-agnostic-softseg-spinalcord/issues/98).

Solution

For all these reasons, I am wondering if binarizing the ground truth (with a 0.5 threshold) after averaging would solve many of our problems.
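For concreteness, here is a minimal sketch of that operation (not the project's actual pipeline code; file names are placeholders), assuming the per-contrast soft segmentations are already registered to a common space:

```python
# Minimal sketch: average registered per-contrast soft segmentations, then
# binarize at 0.5. File names below are placeholders, not spine-generic paths.
import nibabel as nib
import numpy as np

soft_paths = [
    "sub-01_T1w_softseg.nii.gz",
    "sub-01_T2w_softseg.nii.gz",
    "sub-01_T2star_softseg.nii.gz",
    "sub-01_MTon_softseg.nii.gz",
    "sub-01_MToff_softseg.nii.gz",
    "sub-01_DWI_softseg.nii.gz",
]

imgs = [nib.load(p) for p in soft_paths]
mean_soft = np.mean([img.get_fdata() for img in imgs], axis=0)

# Binarize the averaged mask to drop the "extra" softness coming from
# mis-registration and inhomogeneous coverage across contrasts.
gt_bin = (mean_soft >= 0.5).astype(np.uint8)

nib.save(nib.Nifti1Image(gt_bin, imgs[0].affine), "sub-01_softseg_bin.nii.gz")
```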

Related to:

sandrinebedard commented 8 months ago

I agree with these points. It would simplify the next steps. I think binarizing the GT at 0.5 makes sense, as it will represent the average of the 6 contrasts without encoding the registration errors.

When we trained with nnUNet, we actually did binarize the GT, and nnUNet gave sharper boundaries / adapted better to the shape of the cord than the MONAI model (e.g. for spinal cord compressions). Maybe this will help with this too!

naga-karthik commented 8 months ago

Thank you for discussing these important points and summarizing them here. For points 2-5, I have nothing else to add, as I agree with all of them, and they quite succinctly describe the issues we've been facing so far! Regarding point 1:

> very bottom of the cord, which is only covered by the T1w and T2w scans, creating a non-homogeneous ground truth that could hamper the performance of the model;

This might have already been the case. The current version of the dataset is dominated by images/contrasts with a smaller FoV (i.e. T2*, MTon, MToff, DWI), and the model has only seen blurry/oversegmented GT (due to possible mis-registration errors) during training. As a result, it tends to output predictions with (relatively) more voxels outside the SC for these contrasts compared to T1w and T2w.

As for the solutions, I have one question:

  1. For spine-generic (or, in general, for datasets where we have multiple contrasts for a given subject), we can go with binarizing the soft GT, but what about datasets with a single contrast (e.g. MP2RAGE)? -- Should we directly use the (corrected) hard GT then?
    • One important distinction we made in the paper is this -- hard GT != averaged soft GT binarized at 0.5 (which makes sense). Now, if we want to extend contrast-agnostic to, say, MP2RAGE, should we mix the soft binarized GT from spine-generic and the hard GT of MP2RAGE during (re)training?
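To make the "hard GT != averaged soft GT binarized at 0.5" distinction concrete, here is a toy 1-D illustration with made-up values (an assumption for illustration, not real data):

```python
# Toy 1-D illustration: binarizing the averaged mask at 0.5 acts like a
# majority vote across contrasts, which in general does not reproduce any
# single contrast's hard GT.
import numpy as np

# Hypothetical binary cord masks from 3 contrasts along one image line
t2w_hard    = np.array([0, 1, 1, 1, 1, 0])
t1w_hard    = np.array([0, 0, 1, 1, 1, 1])
t2star_hard = np.array([0, 0, 1, 1, 0, 0])

avg_soft = np.mean([t2w_hard, t1w_hard, t2star_hard], axis=0)
# avg_soft -> [0.   0.33 1.   1.   0.67 0.33]

avg_bin = (avg_soft >= 0.5).astype(int)
# avg_bin  -> [0 0 1 1 1 0], which differs from each individual hard mask
```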

> Adapting existing ground truths on other datasets (to enrich the contrast-agnostic model) would be much easier

This is definitely true. This also means much quicker analyses on the lifelong learning aspect of the model.

jcohenadad commented 8 months ago

Training Code: https://github.com/sct-pipeline/contrast-agnostic-softseg-spinalcord/blob/nk/improve-training-procedure/monai/run_inference_single_image.py

Transforms: https://github.com/sct-pipeline/contrast-agnostic-softseg-spinalcord/blob/nk/improve-training-procedure/monai/transforms.py

Slides with results: https://docs.google.com/presentation/d/1N3DIO6LO8R8QnnQ9pASxBVOGYRpQVmUae6Ufc0GZgxE/edit#slide=id.g2b1eb018824_0_0

naga-karthik commented 2 months ago

Just to have it documented, here's the comparison between: (i) the soft output of the model trained directly on soft masks (pred_soft), and (ii) the soft output of the model trained on the binarized soft averaged GT (pred_soft_bin). Note that both are SoftSeg models, but they differ in which GT masks the data augmentation is applied to -- for (i) the transformations are applied directly to the soft averaged masks, and for (ii) they are applied to the binarized soft averaged masks.
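For reference, a minimal sketch of what that difference can look like in a MONAI transform pipeline. This is illustrative only (it is not the repo's transforms.py): the keys, transform choices, and parameters are assumptions, and AsDiscreted's threshold argument assumes a recent MONAI version.

```python
# Illustrative sketch: the only difference between (i) and (ii) is whether the
# label is binarized at 0.5 before the random augmentations are applied.
from monai.transforms import (
    AsDiscreted, Compose, EnsureChannelFirstd, LoadImaged, RandAffined
)

# (i) augmentations applied directly to the soft averaged GT
transforms_soft = Compose([
    LoadImaged(keys=["image", "label"]),
    EnsureChannelFirstd(keys=["image", "label"]),
    RandAffined(keys=["image", "label"], prob=0.5, rotate_range=0.1,
                mode=("bilinear", "bilinear")),
])

# (ii) the soft averaged GT is binarized first, then augmented
transforms_bin = Compose([
    LoadImaged(keys=["image", "label"]),
    EnsureChannelFirstd(keys=["image", "label"]),
    AsDiscreted(keys="label", threshold=0.5),  # binarize the averaged soft GT
    RandAffined(keys=["image", "label"], prob=0.5, rotate_range=0.1,
                mode=("bilinear", "bilinear")),
])
```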

The gif below was discussed in one of our meetings; it shows that the model trained with the binarized soft averaged GT is better at estimating the partial volume at the boundary for the T2star image. Note how the size of the ring of soft values decreases for the pred_soft_bin image.

model output comparison: soft GT vs binarized soft GT inputs ![ezgif com-animated-gif-maker](https://github.com/user-attachments/assets/07a33543-41b3-483a-9dd4-f64c59d9c5db)
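As a rough way to quantify what the gif shows, one could count the voxels with intermediate values in each prediction, as a crude proxy for the width of the soft boundary ring (file names below are hypothetical):

```python
# Rough sketch: count voxels with intermediate values in each prediction.
import nibabel as nib
import numpy as np

for fname in ["pred_soft.nii.gz", "pred_soft_bin.nii.gz"]:
    pred = nib.load(fname).get_fdata()
    n_intermediate = int(np.count_nonzero((pred > 0.1) & (pred < 0.9)))
    print(f"{fname}: {n_intermediate} voxels with values in (0.1, 0.9)")
```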