Open · uzaymacar opened this issue 3 years ago
It might be interesting to quantify each rater's (expert radiologist) performance with respect to the consensus GT (senior expert radiologist + fusion via majority voting). The idea is inspired by Fig. 4 in this paper (also mentioned in our Slack channel). The metric could be the Dice score (I previously looked at mean differences, but that isn't particularly helpful). This will be useful in two ways:

* In the near future (< 20 days 😅), if we ever decide to utilize all of the available GTs from experts in a model (which, IMO, we should!), these measures will act as a valid baseline!

---

I'm in favour of this too!! We could actually use this ivadomed tool, e.g. `[["_lesion-manual-rater1", "_lesion-manual-rater2"]]`, to quantify each rater's (expert radiologist) performance with respect to the consensus GT (senior expert radiologist + fusion via majority voting). Will take care of this by the end of the week.
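For concreteness, the multi-rater suffixes mentioned above would live in the loader section of the ivadomed config. Here is a minimal sketch, written as a Python dict mirroring the JSON config; the `loader_parameters` / `target_suffix` key names reflect my reading of recent ivadomed configs and should be checked against the installed version — only the list-of-lists example itself comes from the comment above:

```python
# Hypothetical fragment of an ivadomed config, shown as a Python dict for
# illustration (the real config is a JSON file). Key names are assumptions
# to verify against the ivadomed version in use; the list-of-lists suffix
# example is the one quoted in the comment above.
config_fragment = {
    "loader_parameters": {
        # Each inner list groups the suffixes of the different raters'
        # manual segmentations for one structure, so the loader can fuse
        # them into a single (consensus-like) ground truth.
        "target_suffix": [["_lesion-manual-rater1", "_lesion-manual-rater2"]],
    }
}
```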
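Independently of ivadomed, a minimal sketch of the proposed measurement itself: build a consensus mask by majority voting across raters and compute each rater's Dice against it. The file paths and helper names below are illustrative (not part of ivadomed's API), and the sketch assumes each rater's lesion mask is a binary NIfTI file:

```python
import numpy as np
import nibabel as nib

def dice(a, b):
    """Dice score between two binary masks; defined as 1.0 if both are empty."""
    intersection = np.logical_and(a, b).sum()
    total = a.sum() + b.sum()
    return 1.0 if total == 0 else 2.0 * intersection / total

# Illustrative paths: one binary lesion mask per rater for the same subject.
rater_paths = [
    "derivatives/labels/sub-01/anat/sub-01_lesion-manual-rater1.nii.gz",
    "derivatives/labels/sub-01/anat/sub-01_lesion-manual-rater2.nii.gz",
]
masks = [np.asarray(nib.load(p).dataobj) > 0.5 for p in rater_paths]

# Consensus GT via majority voting: a voxel is foreground when more than
# half of the raters labelled it.
votes = np.sum(masks, axis=0)
consensus = votes > (len(masks) / 2)

# Each rater's agreement with the consensus, usable as a per-rater baseline.
for path, mask in zip(rater_paths, masks):
    print(f"{path}: Dice vs consensus = {dice(mask, consensus):.3f}")
```

Note that with only two raters a strict majority reduces to the intersection of the masks, so in practice the consensus would more likely be the senior radiologist's fused GT described above.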