Open · uzaymacar opened this issue 3 years ago
It might be interesting to quantify each rater's (expert radiologist) performance with respect to the consensus GT (senior expert radiologist + fusion via majority voting). The idea is inspired by Fig. 4 in this paper (also mentioned in our Slack channel). The metric could be the Dice score (I previously looked at mean differences, but that isn't particularly helpful). This will be useful in two ways:

* In the near future (< 20 days 😅), if we ever decide to utilize all of the available GTs from experts in a model (which, IMO, we should!), these measures will act as a valid baseline!

---

I'm in favour of this too!! We could actually use this ivadomed tool, e.g. `[["_lesion-manual-rater1", "_lesion-manual-rater2"]]`, to quantify each rater's (expert radiologist) performance with respect to the consensus GT (senior expert radiologist + fusion via majority voting). Will take care of this by the end of the week.
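For concreteness, the multi-rater suffixes mentioned above would live in the loader section of the ivadomed config. Here is a minimal sketch, written as a Python dict mirroring the JSON config; the `loader_parameters` / `target_suffix` key names reflect my reading of recent ivadomed configs and should be checked against the installed version — only the list-of-lists example itself comes from the comment above:

```python
# Hypothetical fragment of an ivadomed config, shown as a Python dict for
# illustration (the real config is a JSON file). Key names are assumptions
# to verify against the ivadomed version in use; the list-of-lists suffix
# example is the one quoted in the comment above.
config_fragment = {
    "loader_parameters": {
        # Each inner list groups the suffixes of the different raters'
        # manual segmentations for one structure, so the loader can fuse
        # them into a single (consensus-like) ground truth.
        "target_suffix": [["_lesion-manual-rater1", "_lesion-manual-rater2"]],
    }
}
```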
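Independently of ivadomed, a minimal sketch of the proposed measurement itself: build a consensus mask by majority voting across raters and compute each rater's Dice against it. The file paths and helper names below are illustrative (not part of ivadomed's API), and the sketch assumes each rater's lesion mask is a binary NIfTI file:

```python
import numpy as np
import nibabel as nib

def dice(a, b):
    """Dice score between two binary masks; defined as 1.0 if both are empty."""
    intersection = np.logical_and(a, b).sum()
    total = a.sum() + b.sum()
    return 1.0 if total == 0 else 2.0 * intersection / total

# Illustrative paths: one binary lesion mask per rater for the same subject.
rater_paths = [
    "derivatives/labels/sub-01/anat/sub-01_lesion-manual-rater1.nii.gz",
    "derivatives/labels/sub-01/anat/sub-01_lesion-manual-rater2.nii.gz",
]
masks = [np.asarray(nib.load(p).dataobj) > 0.5 for p in rater_paths]

# Consensus GT via majority voting: a voxel is foreground when more than
# half of the raters labelled it.
votes = np.sum(masks, axis=0)
consensus = votes > (len(masks) / 2)

# Each rater's agreement with the consensus, usable as a per-rater baseline.
for path, mask in zip(rater_paths, masks):
    print(f"{path}: Dice vs consensus = {dice(mask, consensus):.3f}")
```

Note that with only two raters a strict majority reduces to the intersection of the masks, so in practice the consensus would more likely be the senior radiologist's fused GT described above.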