1dmesh closed this issue 3 months ago
Taking mAcc and aAcc as examples: when the class distribution is extremely unbalanced, the two can differ significantly. mAcc simply averages the per-class accuracies, so every class counts equally, while aAcc pools all classes together and computes one overall accuracy.
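To make the distinction concrete, here is a minimal NumPy sketch on toy, hypothetical data (98 background pixels, 2 foreground pixels); it illustrates the definitions above, not MMSegmentation's actual implementation:

```python
import numpy as np

# Toy labels with extreme imbalance: 98 pixels of class 0, 2 pixels of class 1.
label = np.array([0] * 98 + [1] * 2)
# Hypothetical model that predicts class 0 everywhere.
pred = np.zeros(100, dtype=int)

# aAcc: pool all pixels regardless of class.
a_acc = (pred == label).mean()  # 98 correct out of 100 -> 0.98

# mAcc: accuracy per class, then the unweighted mean over classes.
accs = [(pred[label == c] == c).mean() for c in (0, 1)]  # [1.0, 0.0]
m_acc = float(np.mean(accs))  # (1.0 + 0.0) / 2 -> 0.5

print(a_acc, m_acc)
```

The degenerate all-background predictor looks excellent under aAcc (0.98) but mediocre under mAcc (0.5), which is exactly why mAcc is the more informative number for imbalanced classes.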
Thank you for your kind reply! With this help, I was able to google more and get a better understanding.
I have a very beginner question about what the eval metrics actually mean. I can't quite seem to find anything in the docs (if I missed it, I'm sorry! I looked for quite a while before asking, I promise).
Currently, I am gathering these metrics on validation and test: `[aAcc, mAcc, mIoU, mDice, mFscore, mPrecision, mRecall]`. I can use Google to figure out what these mean individually per image vs. ground truth... but what do they mean in the context of the val/test eval as a whole?

If we take `aAcc` as an example, calculated here: does it take the sum of all class intersections with the gt, divided by the sum of the area of the gt? I think my comprehension stops here. I have googled and tried to understand these few lines, but for some reason something is not clicking.
Final question: to get the mIoU from the individual IoUs, is this just averaging the IoU over every image in the test/val set? Or do we first take the per-class average against the gt, and then average over the samples?
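For concreteness, here is a sketch of one common convention for dataset-level mIoU: per-class intersections and unions are accumulated over the whole split first, then divided, then averaged over classes (rather than averaging per-image IoUs). The data is toy and hypothetical, and this is not necessarily the library's exact code:

```python
import numpy as np

num_classes = 2
inter = np.zeros(num_classes)  # running per-class intersection areas
union = np.zeros(num_classes)  # running per-class union areas

# Two tiny flattened "images": (pred, label) pairs, hypothetical values.
batches = [
    (np.array([0, 0, 1, 1]), np.array([0, 1, 1, 1])),
    (np.array([1, 0, 0, 0]), np.array([1, 1, 0, 0])),
]

# Accumulate over the whole dataset before dividing.
for pred, label in batches:
    for c in range(num_classes):
        inter[c] += np.sum((pred == c) & (label == c))
        union[c] += np.sum((pred == c) | (label == c))

iou_per_class = inter / union       # one IoU per class over the full split
miou = iou_per_class.mean()         # unweighted mean over classes
print(iou_per_class, miou)
```

Note that this dataset-level accumulation generally gives a different number than computing an IoU per image and averaging those, since large images (or images where a class is absent) are weighted differently.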
I appreciate your time and help, thank you!