sct-pipeline / fmri-segmentation

Repository for the project on automatic spinal cord segmentation based on fMRI EPI data

A way to assess normalization quality #32

Open · MerveKaptan opened this issue 7 months ago

MerveKaptan commented 7 months ago

Dear all (@jcohenadad @kennethaweberii ),

Currently, we are using the Dice coefficient (between the manual and automatic masks) as a metric to assess the performance of the newly trained model.
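(For reference, this is what I mean by Dice — a minimal NumPy sketch; the arrays and variable names are just placeholders:)

```python
import numpy as np

def dice_coefficient(manual_mask: np.ndarray, auto_mask: np.ndarray) -> float:
    """Dice overlap between two binary cord masks (1 = cord, 0 = background)."""
    manual = manual_mask.astype(bool)
    auto = auto_mask.astype(bool)
    intersection = np.logical_and(manual, auto).sum()
    total = manual.sum() + auto.sum()
    return 2.0 * intersection / total if total > 0 else float("nan")
```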

I had a meeting with @rohanbanerjee and we were discussing possible metrics to assess the quality of EPI to template normalization/registration (as this is one of the most important outcomes of automated segmentation from the user perspective, right?).

Therefore, the question is: how to assess and quantify the quality of template normalization?

I encountered this recent preprint.

Here they use a multi-step normalization method and compare it with single-step normalization (details are not important for my question), using the following approach:

> To evaluate the registration performance, the binary cord segmentation mask in the EPI space was registered to PAM50 space using each method, and the result was compared to the binary template cord mask. Two indices were defined by calculating: i) the number of voxels in the template mask that are missed in the registered mask, normalized by the total number of template cord voxels, and ii) the overlap between the registered and the template cord masks, normalized by the total number of voxels in both masks.

What do you think? Is this a valid approach to evaluate the performance of registration?
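For concreteness, here is how I read those two indices — a rough sketch with NumPy/nibabel, assuming both masks are binary and already resampled to the PAM50 grid (file names are hypothetical):

```python
import numpy as np
import nibabel as nib

# Hypothetical file names: the EPI-space cord mask warped to PAM50 space,
# and the PAM50 cord mask, assumed to be on the same grid.
registered = nib.load("epi_cordseg_reg2PAM50.nii.gz").get_fdata() > 0.5
template = nib.load("PAM50_cord.nii.gz").get_fdata() > 0.5

# i) template cord voxels missed by the registered mask,
#    normalized by the total number of template cord voxels
missed_fraction = np.logical_and(template, ~registered).sum() / template.sum()

# ii) overlap between the registered and template masks,
#     normalized by the total number of voxels in both masks
#     (taking the wording literally, this is Dice without the factor of 2)
overlap_fraction = np.logical_and(template, registered).sum() / (template.sum() + registered.sum())

print(f"missed: {missed_fraction:.3f}, overlap: {overlap_fraction:.3f}")
```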

jcohenadad commented 7 months ago

I'm not sure I fully understand the context of this issue. The current repository is about cord segmentation on EPI scans. Why are we talking about registration to the template here?

MerveKaptan commented 7 months ago

> I'm not sure I fully understand the context of this issue. The current repository is about cord segmentation on EPI scans. Why are we talking about registration to the template here?

Hi Julien,

I posted this here (instead of sending an email) because it is relevant to this project and to the publication that will come out of it. Sorry that it was not clear.

I thought that one of the main reasons we need to manually segment the cord is to be able to properly register the EPI data to the PAM50 template (from the perspective of an SCT user analyzing fMRI data).

Then, for someone interested in spinal fMRI analysis, wouldn't it help to see that normalization using the automated segmentation performs comparably to normalization using the manual segmentation?

If so, wouldn't it be important to demonstrate this in the paper?

jcohenadad commented 7 months ago

> I thought that one of the main reasons we need to manually segment the cord is to be able to properly register the EPI data to the PAM50 template (from the perspective of an SCT user analyzing fMRI data).

indeed, it is

> Then, for someone interested in spinal fMRI analysis, wouldn't it help to see that normalization using the automated segmentation performs comparably to normalization using the manual segmentation? If so, wouldn't it be important to demonstrate this in the paper?

Demonstrating improvements in registration is quite tricky. One thing we could do, though, is run the registration pipeline and, for each site, show the group-average EPI in PAM50 space (one figure, with one sub-panel per axial view).
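For illustration, something along these lines could produce that figure (a rough nibabel/matplotlib sketch; file names and slice indices are hypothetical placeholders):

```python
import numpy as np
import nibabel as nib
import matplotlib.pyplot as plt

# Hypothetical inputs: per-subject mean EPIs from one site, already warped to PAM50 space
epi_files = ["sub-01_mean_epi2PAM50.nii.gz", "sub-02_mean_epi2PAM50.nii.gz"]  # etc.
stack = np.stack([nib.load(f).get_fdata() for f in epi_files], axis=-1)
group_mean = stack.mean(axis=-1)

# One sub-panel per axial slice (slice indices here are arbitrary placeholders)
slices = list(range(800, 860, 10))
fig, axes = plt.subplots(1, len(slices), figsize=(12, 3))
for ax, z in zip(axes, slices):
    ax.imshow(group_mean[:, :, z].T, cmap="gray", origin="lower")
    ax.set_title(f"z={z}", fontsize=8)
    ax.axis("off")
fig.savefig("site01_groupmean_axial.png", dpi=200, bbox_inches="tight")
```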

MerveKaptan commented 7 months ago

> Demonstrating improvements in registration is quite tricky. One thing we could do, though, is run the registration pipeline and, for each site, show the group-average EPI in PAM50 space (one figure, with one sub-panel per axial view).

Thank you, Julien!

Yes, definitely! I assume that registration quality would be tricky to judge from the average alone (hopefully not!). Maybe we can accompany it with some sort of heat map (like a standard-deviation map) to show consistency across subjects.
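Something like this could generate that map (a rough sketch; file names are hypothetical, and it assumes all subjects' EPIs are already warped to PAM50 space):

```python
import numpy as np
import nibabel as nib

# Same hypothetical stack as for the group mean: subject EPIs warped to PAM50 space
epi_files = ["sub-01_mean_epi2PAM50.nii.gz", "sub-02_mean_epi2PAM50.nii.gz"]  # etc.
imgs = [nib.load(f) for f in epi_files]
stack = np.stack([img.get_fdata() for img in imgs], axis=-1)

# Voxel-wise standard deviation across subjects, saved as a heat map in PAM50 space
std_map = stack.std(axis=-1)
nib.save(nib.Nifti1Image(std_map, imgs[0].affine), "site01_groupstd.nii.gz")
```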

What did you think about the metric that was employed in the recent preprint?

jcohenadad commented 6 months ago

> Yes, definitely! I assume that registration quality would be tricky to judge from the average alone (hopefully not!). Maybe we can accompany it with some sort of heat map (like a standard-deviation map) to show consistency across subjects.

I wouldn't go that far-- I think that a group average per site is enough. If reviewers want more meat, we will produce more meat.

> What did you think about the metric that was employed in the recent preprint?

What preprint are you referring to?

MerveKaptan commented 6 months ago

> I wouldn't go that far-- I think that a group average per site is enough. If reviewers want more meat, we will produce more meat.

Good to know, thank you!

> What preprint are you referring to?

The one I posted above! https://www.researchsquare.com/article/rs-3889284/v1