I checked your evaluation method and found that the results based on the mean of the per-image FPRs are better than the results based on the flattened scores and labels of the entire dataset. Why do you compute the FPR image-wise rather than over the entire dataset?
I do not think we had a particular reason to aggregate across images. Both strategies seem valid given a moment's thought. There may have been memory difficulties in aggregating across the whole dataset.
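To make the difference between the two strategies concrete, here is a minimal sketch in NumPy. It is not the repository's evaluation code. It assumes the metric is the FPR at a fixed TPR (the thread only says "fpr"), that higher scores mean "positive", and the function names and toy data are hypothetical. One plausible reason the image-wise numbers look better is that the threshold is chosen separately for each image, so it can adapt to that image's score range, whereas the flattened version must use a single threshold for the whole dataset.

```python
import numpy as np


def fpr_at_tpr(scores, labels, tpr=0.95):
    """FPR at the threshold that recalls at least `tpr` of the positives.

    Assumes higher scores indicate the positive class.
    """
    scores = np.asarray(scores, dtype=float).ravel()
    labels = np.asarray(labels).ravel().astype(bool)
    pos = np.sort(scores[labels])[::-1]  # positive scores, descending
    neg = scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one positive and one negative")
    k = int(np.ceil(tpr * pos.size))  # number of positives that must be recalled
    threshold = pos[k - 1]
    return float(np.mean(neg >= threshold))


def imagewise_fpr(score_maps, label_maps, tpr=0.95):
    """Mean of per-image FPRs: one threshold per image."""
    return float(np.mean([fpr_at_tpr(s, l, tpr)
                          for s, l in zip(score_maps, label_maps)]))


def dataset_fpr(score_maps, label_maps, tpr=0.95):
    """FPR over all pixels flattened together: one global threshold."""
    scores = np.concatenate([np.ravel(s) for s in score_maps])
    labels = np.concatenate([np.ravel(l) for l in label_maps])
    return fpr_at_tpr(scores, labels, tpr)


# Toy data (hypothetical): two "images" whose score ranges differ.
score_maps = [np.array([0.9, 0.8, 0.5, 0.6]), np.array([0.3, 0.2, 0.0, 0.1])]
label_maps = [np.array([1, 1, 0, 0]), np.array([1, 1, 0, 0])]

# Each image is perfectly separable on its own, so the image-wise FPR is 0.0.
print(imagewise_fpr(score_maps, label_maps, tpr=1.0))
# A single global threshold (0.2) lets the first image's negatives through: 0.5.
print(dataset_fpr(score_maps, label_maps, tpr=1.0))
```

Note that `dataset_fpr` has to hold every pixel of every image in memory at once, which is consistent with the memory concern mentioned above; the image-wise version only ever needs one image at a time.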
Hi,
Congrats on the great paper.
Thanks.