lseventeen / FR-UNet

[JBHI2022] Full-Resolution Network and Dual-Threshold Iteration for Retinal Vessel and Coronary Angiograph Segmentation
MIT License

Potential Issue with AverageMeter() in Metric Calculation #19

Open thoth000 opened 1 month ago

thoth000 commented 1 month ago

Hi,

I believe there might be an issue with the implementation of AverageMeter() for managing evaluation metrics.

https://github.com/lseventeen/FR-UNet/blob/master/utils/metrics.py

Issue:

From my understanding, the current implementation calculates metrics such as accuracy (Acc), sensitivity (Sen), F1 score, etc., for each image individually, and then averages these values across all images. Please correct me if my understanding is wrong.

Concern:

In practice, however, I believe the metrics should be computed from the true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN) accumulated across the entire dataset. Ratio metrics such as sensitivity and F1 are not linear in these counts, so the mean of per-image values generally differs from the value computed on the pooled counts.
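To make the difference concrete, here is a small sketch with made-up counts for two images. The numbers and the `sensitivity` helper are assumptions for illustration only and are not taken from FR-UNet or any dataset.

```python
def sensitivity(tp, fn):
    """Sensitivity (recall) = TP / (TP + FN)."""
    return tp / (tp + fn)


# Hypothetical per-image counts, chosen only to illustrate the gap.
images = [
    {"tp": 90, "fn": 10},  # many positive pixels, mostly found
    {"tp": 1, "fn": 9},    # few positive pixels, mostly missed
]

# Per-image averaging: compute the metric for each image, then take the mean.
per_image_mean = sum(sensitivity(im["tp"], im["fn"]) for im in images) / len(images)

# Dataset-level aggregation: sum the counts first, then compute the metric once.
pooled = sensitivity(
    sum(im["tp"] for im in images),
    sum(im["fn"] for im in images),
)

print(f"per-image mean: {per_image_mean:.3f}")  # 0.500
print(f"pooled:         {pooled:.3f}")          # 0.827
```

The two values diverge because the second image contributes half of the per-image mean but only 10 of the 110 positive pixels in the pooled count. For accuracy the two approaches coincide whenever every image has the same number of pixels, since the denominator is then constant; the gap shows up in sensitivity, precision, and F1.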

Suggestion:

I recommend revisiting the implementation so that TP, FP, TN, and FN are aggregated over the entire dataset before the metrics are computed. That way every pixel contributes equally to the reported score, regardless of which image it belongs to.

Thank you for your attention to this issue. If my understanding is incorrect, anyone is welcome to correct me.

Best regards,
Yuma

thoth000 commented 1 month ago

As I understand it, this implementation computes metrics such as accuracy and F1 score for each image and then averages those per-image values across all images through the add() method.

Here is the relevant part of the code:

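import numpy as np

class AverageMeter(object):
    # Excerpt: only add() is shown; self.sum and self.count are set up elsewhere.
    def add(self, val, weight):
        self.val = val  # most recent per-image metric value
        self.sum = np.add(self.sum, np.multiply(val, weight))  # weighted running sum
        self.count = self.count + weight  # total weight seen so far
        self.avg = self.sum / self.count  # weighted mean of the per-image values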

Is this implementation correct for calculating overall performance, or should the metrics be computed after accumulating TP, FP, TN, and FN across the entire dataset?
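For comparison, one possible shape for the aggregated alternative is sketched below. `ConfusionMeter` and its method names are hypothetical and not part of FR-UNet; the sketch assumes binary masks (already thresholded predictions and ground truth) passed in as arrays of the same shape.

```python
import numpy as np


class ConfusionMeter:
    """Hypothetical helper: accumulates TP/FP/TN/FN over a whole dataset."""

    def __init__(self):
        self.tp = self.fp = self.tn = self.fn = 0

    def update(self, pred, target):
        # Assumes pred and target are binary masks of identical shape.
        pred = np.asarray(pred).astype(bool)
        target = np.asarray(target).astype(bool)
        self.tp += int(np.sum(pred & target))
        self.fp += int(np.sum(pred & ~target))
        self.tn += int(np.sum(~pred & ~target))
        self.fn += int(np.sum(~pred & target))

    @staticmethod
    def _safe_div(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den > 0 else 0.0

    def compute(self):
        # Metrics are derived once, from the pooled counts.
        tp, fp, tn, fn = self.tp, self.fp, self.tn, self.fn
        return {
            "Acc": self._safe_div(tp + tn, tp + fp + tn + fn),
            "Sen": self._safe_div(tp, tp + fn),
            "Spe": self._safe_div(tn, tn + fp),
            "Pre": self._safe_div(tp, tp + fp),
            "F1": self._safe_div(2 * tp, 2 * tp + fp + fn),
        }


# Usage: call update() once per image, then compute() at the end.
meter = ConfusionMeter()
meter.update([1, 1, 0, 0], [1, 0, 0, 1])
meter.update([1, 0], [1, 1])
print(meter.compute())
```

Unlike the weighted running mean in add(), this keeps only four integer counters, so the final scores do not depend on how the pixels are split across images.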