The calculation of open-set F-measure

drcege commented 5 years ago

Hi, I wonder if true positive, false positive and false negative are counted correctly. https://github.com/zhmiao/OpenLongTailRecognition-OLTR/blob/4a1f4009921b1c99029bfda151915058ff086a51/utils.py#L86-L89 Here are some examples according to the above code: (pairs of prediction and label)

class_a, class_a (TP)
class_b, class_a (FP)
-1, class_a (?)
class_a, -1 (FN)
-1, -1 (?)

I'm confused about

why the 2nd example is counted as FP rather than FN? (FP means the label is negative, but the prediction is positive, so what is positive here)
why the 3nd example is not counted as FN?
is the last example TN or TP?

zhmiao commented 5 years ago

Hello @drcege , thanks for asking. We followed this paper: https://arxiv.org/abs/1511.06233 for the F-measure calculation. It was modified from the original equation for better openset evaluation.

jchhuang commented 5 years ago

@zhmiao @drcege I also feel confused on this issue, I suggest authors @zhmiao can explain it clearly rather than re-direct readers to refer other literature since this criterion is so important for this manuscript.

jchhuang commented 5 years ago

@zhmiao I further seek related information from https://arxiv.org/abs/1511.06233，however， I got nothing explain, may you pay some attention to explain this issue？

jchhuang commented 5 years ago

@drcege hi, have you understand this issue, I have the same confusing feeling as you, may you share you newest understanding？

pedrormjunior commented 2 years ago

Searching for "open-set f-measure" I just found this thread. Here are my thoughts on this issue. Many works on open-set recognition use f-measure but the authors do not specify how the metric is calculated. Sometimes it seems to me that the conclusions presented in some works on open-set recognition are just non-reliable due to the metric employed. How to be sure that the employed metric is measuring a better open-set behavior? It also seems that some authors do not mind if the metric is really capturing a better behavior of the classifier. I do not know details of the work associated to this repository, anyhow, I agree with the concerns raised by @drcege. If any of you are interested in evaluation metrics specially proposed for open-set recognition, take a look in a paper in which we talk about this issue [1]. There in Section 4.1 we propose the open-set f-measure and the normalized accuracy, both to be employed on open-set problems.

[1] Mendes Júnior, P. R. and de Souza, R. M. and Werneck, R. de O. and Stein, B. V. and Pazinato, D. V. and de Almeida, W. R. and Penatti, O. A. B. and Torres, R. da S. and Rocha, A. de R. (2017). "Nearest Neighbors Distance Ratio Open-Set Classifier". Machine Learning, 106, 359–386. https://doi.org/10.1007/s10994-016-5610-8.

zhmiao / OpenLongTailRecognition-OLTR

The calculation of open-set F-measure #13