Potential Issue with AverageMeter() in Metric Calculation

Hi,

I believe there might be an issue with the implementation of AverageMeter() for managing evaluation metrics.

https://github.com/lseventeen/FR-UNet/blob/master/utils/metrics.py

Issue:

From my understanding, the current implementation calculates metrics such as accuracy (Acc), sensitivity (Sen), F1 score, etc., for each image individually, and then averages these values across all images. Please correct me if my understanding is wrong.

Concern:

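In practice, however, I believe the correct approach is to accumulate the true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN) across the entire dataset and then calculate the final metrics (e.g., accuracy, sensitivity) from these aggregated values. The two approaches generally give different results for ratio metrics such as sensitivity and F1, because per-image averaging weights every image equally regardless of how many foreground pixels it contains.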
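To illustrate the gap, here is a small sketch with made-up numbers. The per-image counts below are hypothetical and are not taken from the repository or from any real dataset; they only show how the mean of per-image sensitivities can differ from the sensitivity computed on pooled counts.

```python
# Hypothetical per-image confusion counts (TP, FP, TN, FN).
# These values are invented for illustration only.
counts = [
    (90, 10, 890, 10),  # image with many foreground pixels
    (1, 5, 990, 4),     # image with very few foreground pixels
]


def sensitivity(tp, fp, tn, fn):
    """Sensitivity (recall) = TP / (TP + FN)."""
    return tp / (tp + fn)


# Approach 1: compute the metric per image, then average the metric values.
per_image_mean = sum(sensitivity(*c) for c in counts) / len(counts)

# Approach 2: sum the counts over all images, then compute the metric once.
tp, fp, tn, fn = (sum(col) for col in zip(*counts))
pooled = sensitivity(tp, fp, tn, fn)

print(f"mean of per-image sensitivity: {per_image_mean:.3f}")  # 0.550
print(f"sensitivity from pooled counts: {pooled:.3f}")         # 0.867
```

Here the second image contributes only 5 positive pixels but pulls the per-image average down as much as the first image pulls it up. Note that accuracy would be identical under both approaches in this example, because both images have the same number of pixels.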

Suggestion:

I recommend revisiting the implementation so that the evaluation aggregates TP, FP, TN, and FN over the entire dataset before computing the metrics. This weights every pixel equally and keeps images with very few foreground pixels from skewing the reported averages.
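As a rough idea of what I mean, here is a minimal sketch of an accumulator. `ConfusionAccumulator` and its methods are hypothetical names I made up for this issue; this is not the repository's `AverageMeter` API. It also assumes that predictions and labels are already binarized and flattened, and that a metric with a zero denominator should be reported as 0.0.

```python
from dataclasses import dataclass


@dataclass
class ConfusionAccumulator:
    """Hypothetical helper (not part of FR-UNet): sums counts over images."""
    tp: int = 0
    fp: int = 0
    tn: int = 0
    fn: int = 0

    def update(self, pred, target):
        """Add one image's counts; pred/target are flat iterables of 0/1."""
        for p, t in zip(pred, target):
            if p and t:
                self.tp += 1
            elif p and not t:
                self.fp += 1
            elif not p and t:
                self.fn += 1
            else:
                self.tn += 1

    @staticmethod
    def _ratio(num, den):
        # Assumption: report 0.0 when the denominator is zero.
        return num / den if den else 0.0

    def compute(self):
        """Compute metrics once, from the counts pooled over all images."""
        total = self.tp + self.fp + self.tn + self.fn
        return {
            "Acc": self._ratio(self.tp + self.tn, total),
            "Sen": self._ratio(self.tp, self.tp + self.fn),
            "Spe": self._ratio(self.tn, self.tn + self.fp),
            "Pre": self._ratio(self.tp, self.tp + self.fp),
            "F1": self._ratio(2 * self.tp, 2 * self.tp + self.fp + self.fn),
        }


# Usage: call update() once per image, compute() once at the end.
meter = ConfusionAccumulator()
meter.update(pred=[1, 1, 0, 0], target=[1, 0, 0, 0])
meter.update(pred=[0, 0, 1, 0], target=[1, 0, 1, 0])
metrics = meter.compute()
```

In a real evaluation loop the per-pixel Python loop would be replaced by tensor operations, but the structure would stay the same: accumulate counts per batch and compute the ratios only once at the end.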

Thank you for your attention to this issue. If my understanding is incorrect, anyone is welcome to correct me.

Best regards,

[Yuma]
