zhmiao / OpenLongTailRecognition-OLTR

PyTorch implementation for "Large-Scale Long-Tailed Recognition in an Open World" (CVPR 2019 ORAL)
BSD 3-Clause "New" or "Revised" License

Possible error in F_measure calculation on open set #52

Closed. tuobay closed this issue 4 years ago.

tuobay commented 4 years ago

https://github.com/zhmiao/OpenLongTailRecognition-OLTR/blob/master/utils.py#L88

It seems it should be changed from

```python
false_pos += 1 if preds[i] != labels[i] and labels[i] != -1 and preds[i] != -1 else 0
```

to:

```python
false_pos += 1 if preds[i] != labels[i] and ((labels[i] != -1 and preds[i] != -1) or labels[i] == -1) else 0
```

zhmiao commented 4 years ago

Hello @tuobay, thanks for asking. The F-measure we use follows this paper: https://arxiv.org/pdf/1511.06233.pdf , where a false positive is defined as an "incorrect classification on the validation set". I believe the validation set in that paper consists of seen classes only, so for false positives the label should not equal -1. Does that make sense?
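
To make that definition concrete, here is a minimal sketch of the counting (simplified, not the utils.py code verbatim; it assumes label -1 marks open-set samples):

```python
def f_measure_sketch(preds, labels):
    # Minimal sketch of the open-set F-measure described above (assumes
    # label -1 marks open-set samples; not the utils.py code verbatim).
    true_pos = false_pos = false_neg = 0
    for p, l in zip(preds, labels):
        if p == l and l != -1:
            true_pos += 1    # seen-class sample classified correctly
        if p != l and l != -1 and p != -1:
            false_pos += 1   # seen-class sample classified as a wrong seen class
        if p != l and l == -1:
            false_neg += 1   # open-set sample accepted as some seen class
    if true_pos == 0:
        return 0.0           # avoid division by zero in degenerate cases
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)
```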

saurabhsharma1993 commented 4 years ago

@zhmiao a related question to the above: I tried varying the open-set threshold as in Fig. 8(b) of the paper, but I'm getting different results. Specifically, the F-measure increases monotonically, rather than decreasing, as I raise the open-set threshold from 0 to 1. Any ideas why?

saurabhsharma1993 commented 4 years ago

This is what I get for ImageNet-LT, using the Stage-2 model (it's not the latest):

```
F_measure (with threshold 0.00): 0.3993
F_measure (with threshold 0.10): 0.4532
F_measure (with threshold 0.20): 0.5793
F_measure (with threshold 0.30): 0.6842
F_measure (with threshold 0.40): 0.7646
F_measure (with threshold 0.50): 0.8261
F_measure (with threshold 0.60): 0.8696
F_measure (with threshold 0.70): 0.9095
F_measure (with threshold 0.80): 0.9348
F_measure (with threshold 0.90): 0.9475
```
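
For context, this is the thresholding I'm assuming when computing the numbers above (a sketch of my setup, not necessarily the repository's evaluation code: a sample whose maximum softmax probability falls below the threshold is rejected as open set):

```python
import torch

def threshold_preds(logits, threshold):
    # Reject a sample as open set (-1) when its maximum softmax
    # probability falls below the threshold; otherwise keep the argmax.
    probs = torch.softmax(logits, dim=1)
    max_probs, preds = probs.max(dim=1)
    preds[max_probs < threshold] = -1
    return preds

# Sweep thresholds 0.0 .. 0.9 as in the numbers above:
# for t in [i / 10 for i in range(10)]:
#     print(t, f_measure(threshold_preds(logits, t), labels))
```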

tuobay commented 4 years ago

I think TP + FP + TN + FN should sum to the number of all test samples.

So the case (label = -1 and pred = some seen class) should be put into FP.
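
A toy example of the accounting I mean (hypothetical labels/preds, just for illustration):

```python
labels = [0,  1, -1, -1]   # two seen-class samples, two open-set samples
preds  = [0,  2,  3, -1]
# (0, 0)   -> TP: seen sample, correct
# (1, 2)   -> FP: seen sample, wrong seen class
# (-1, 3)  -> the current FP line skips this case, but every sample should
#             fall into exactly one of TP/FP/TN/FN, so I would count it as FP
# (-1, -1) -> TN: open-set sample correctly rejected
```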

saurabhsharma1993 commented 4 years ago

@zhmiao @liuziwei7 awaiting your response

zhmiao commented 4 years ago

Hello @tuobay @ssfootball04. Thank you very much for the discussion. We think the problem is indeed the false-positive calculation. The new false-positive line is:

```python
false_pos += 1 if preds[i] != labels[i] and labels[i] != -1 else 0
```

We removed the `preds[i] != -1` condition, since it does not make sense: according to the paper we cited, "false positives are incorrect classifications on the validation set", and a seen-class sample predicted as -1 is still an incorrect classification. So we think the current calculation is correct. After removing the condition, the F-measure numbers are normal. We believe the previously reported F-measure numbers are slightly higher than the actual numbers for all baselines; we will update them as soon as possible. I have pushed the new code already, so please check it out. Thanks again.
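
For clarity, the three false-positive conditions discussed in this thread differ as follows (a side-by-side sketch, where -1 is the open-set label):

```python
def fp_variants(p, l):
    # Compare, for one (pred, label) pair, the three false-positive
    # conditions from this thread; -1 is the open-set label.
    fp_original = p != l and l != -1 and p != -1                 # wrong seen class only
    fp_proposed = p != l and ((l != -1 and p != -1) or l == -1)  # also: open sample accepted as seen
    fp_updated  = p != l and l != -1                             # also: seen sample rejected as open (-1)
    return fp_original, fp_proposed, fp_updated
```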

zhmiao commented 4 years ago

Since it has been a while, I will close this issue for now. Please feel free to reopen it if any questions arise. Thanks.