valeoai / RADIal


Average Precision Calculation does not calculate area under curve #46

Open JFT90 opened 1 year ago

JFT90 commented 1 year ago

In the metric code, the AP value seems to be calculated incorrectly. https://github.com/valeoai/RADIal/blob/main/FFTRadNet/utils/metrics.py#L199

AP should be calculated as the area under the precision-recall curve instead of just taking the mean over precisions: https://en.wikipedia.org/w/index.php?title=Information_retrieval&oldid=793358396#Average_precision
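Concretely, that definition boils down to the following (a minimal sketch, assuming the (precision, recall) operating points are ordered by ascending recall):

import numpy as np

def area_under_pr_curve(recalls, precisions):
    """AP as the step-wise area under the precision-recall curve:
    sum over k of (R_k - R_{k-1}) * P_k, with R_0 = 0."""
    recalls = np.concatenate(([0.0], np.asarray(recalls, dtype=float)))
    precisions = np.asarray(precisions, dtype=float)
    return float(np.sum((recalls[1:] - recalls[:-1]) * precisions))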

ArthurOuaknine commented 1 year ago

Hello. In practice, the area under the precision-recall curve is approximated by computing the average of the precisions at discretized values of recall (see your own Wikipedia link, section "Average precision", equation 4). In the object detection framework, these recall thresholds are equivalent to Intersection over Union (IoU) thresholds.

The common practice in object detection (PASCAL VOC, COCO, and so on) is to define the average precision either over a set of IoU thresholds, applying the same equation as explained in the Wikipedia page (e.g. $\text{AP}^{0.1 : 0.9}$), or at a single IoU threshold (e.g. $\text{AP}^{0.5}$).

In our case, we consider only a single IoU threshold (0.5) and compute the AP as the average precision of the kept predictions. There is no error; this is a basic metric in the PASCAL VOC benchmark. Feel free to propose your own evaluation metric if you prefer.
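To make it concrete, the computation boils down to something like this (a simplified sketch, not the exact metrics.py code):

import numpy as np

def single_threshold_ap(precisions):
    """Simplified sketch: detections are matched at a single IoU threshold (0.5),
    one precision value is obtained per confidence threshold, and the reported AP
    is the plain mean of those precision values."""
    return float(np.mean(precisions))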

For further details, please have a look at this link. Hope it helps. Arthur

JFT90 commented 1 year ago

Hi, thanks for your quick reply. I don't think your code does what is stated in either your link (see: https://pyimagesearch.com/2022/05/02/mean-average-precision-map-using-the-coco-evaluator/#H3Calc) or equation 4 on Wikipedia.

If you want to take the mean, you first need to calculate interpolated precisions over equally spaced recall levels $r \in R$:

$$\text{AP} = \frac{1}{|R|} \sum_{r \in R} p_{\text{interp}}(r), \qquad p_{\text{interp}}(r) = \max_{\tilde{r} \ge r} p(\tilde{r})$$

I think in your code the interpolated precision is not calculated (second equation).

For example, check out lines 135-138 in the nuScenes devkit for interpolation of the precision on equally spaced recall levels: https://github.com/nutonomy/nuscenes-devkit/blob/9cba5925dc51b5c855252ee031eee5d6422f556b/python-sdk/nuscenes/eval/detection/algo.py#L135 Or the PIXOR implementation: https://github.com/philip-huang/PIXOR/blob/2f64e84f72d915f8b5b80d4615bc9a0c881b5dab/srcs/postprocess.py#L159
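For illustration, the nuScenes-style computation essentially resamples the precision onto equally spaced recall levels before averaging (a rough sketch, not the devkit code verbatim):

import numpy as np

def interpolated_ap(recalls, precisions, n_levels=101):
    """Rough sketch of AP via precision interpolated on equally spaced recall
    levels. recalls must be sorted in ascending order, with precisions in the
    corresponding order; precision is taken as 0 beyond the maximum recall."""
    recall_levels = np.linspace(0.0, 1.0, n_levels)
    interp_prec = np.interp(recall_levels, recalls, precisions, right=0.0)
    return float(np.mean(interp_prec))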

I don't want to suggest an additional evaluation metric but want to correct the existing one.

Best regards!

Edit: To make it clearer: you are only "integrating" over a limited recall range. The recalls for the confidence thresholds from 0.1 to 0.9 (for your pretrained model) are: [0.7301587301587301, 0.7753968253968254, 0.8015873015873016, 0.8158730158730159, 0.8285714285714286, 0.8420634920634921, 0.8523809523809524, 0.8690476190476191, 0.8809523809523809]

ArthurOuaknine commented 1 year ago

OK, I see. It should be related to this loop: https://github.com/valeoai/RADIal/blob/ce58c8d114b0067e5b2eb4bca674ccd509444909/FFTRadNet/utils/metrics.py#L113 The current thresholds are [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]. The code should be for threshold in np.arange(0, 1.1, 0.1): to also include the thresholds 0.0 and 1.0. I did not implement this code, so I don't know why this choice was made, but it looks like we are missing two threshold values. @jrebut any idea?
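i.e. something along these lines (just a sketch of the change; the loop body in metrics.py stays as it is):

import numpy as np

# Sketch only: extend the confidence thresholds so that 0.0 and 1.0 are included.
# np.arange(0, 1.1, 0.1) yields [0.0, 0.1, ..., 1.0].
for threshold in np.arange(0, 1.1, 0.1):
    # ... existing per-threshold matching and precision/recall computation ...
    pass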

Thanks for pointing this out.

JFT90 commented 1 year ago

I expect that only adding more confidence thresholds will still not correctly sample the precision-recall curve. Maybe add this code to correctly compute the AP from recalls and precisions (similar to the PIXOR implementation):

def get_envelope(precisions):
    """Compute the precision envelope.
    Args:
      precisions:
    Returns:
    """
    for i in range(precisions.size - 1, 0, -1):
        precisions[i - 1] = np.maximum(precisions[i - 1], precisions[i])
    return precisions

def get_ap(recalls, precisions):
    """Calculate average precision.
    Args:
      recalls: sorted recall list - ascending order
      precisions: precision list in order corresponding to sorted recall list
    Returns (float): average precision.
    """

    # first append sentinel values at the end
    recalls = np.concatenate(([0.0], recalls, [1.0]))
    precisions = np.concatenate(([0.0], precisions, [0.0]))

    precisions = get_envelope(precisions)

    # to calculate area under PR curve, look for points where X axis (recall) changes value
    i = np.where(recalls[1:] != recalls[:-1])[0]

    # and sum (\Delta recall) * prec
    ap = np.sum((recalls[i + 1] - recalls[i]) * precisions[i + 1])
    return ap

Change the loop to ensure correct sorting: for threshold in np.linspace(0.001,0.95,19)[::-1]:

and instead of np.mean(perfs['precision']):

ap = get_ap(perfs['recall'], perfs['precision'])
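Putting the pieces together, the evaluation would then look roughly like this (a sketch only; the per-threshold matching step and the perfs dictionary are assumed to follow the existing metrics.py structure):

import numpy as np

perfs = {'recall': [], 'precision': []}
# Thresholds run from high to low, so the appended recalls end up in ascending
# order, which is what get_ap expects.
for threshold in np.linspace(0.001, 0.95, 19)[::-1]:
    # ... existing matching at IoU 0.5 and precision/recall computation,
    # appending to perfs['recall'] and perfs['precision'] ...
    pass

ap = get_ap(np.array(perfs['recall']), np.array(perfs['precision']))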

ArthurOuaknine commented 1 year ago

Thanks for your valuable comment, it is appreciated. I will leave this issue open and let @jrebut do the integration and verify the performance.

JFT90 commented 1 year ago

You are welcome. I think the issue is currently set to closed, so it might be good to reopen it.