Evaluation results in the openimages v4 paper

openimages / dataset

The Open Images dataset

https://storage.googleapis.com/openimages/web/index.html

Apache License 2.0


Evaluation results in the openimages v4 paper #81

yoosan closed this issue 4 years ago

yoosan commented 5 years ago

Hi, the team recently submitted a paper to arXiv describing the Open Images V4 dataset: "The Open Images Dataset V4: Unified image classification, object detection, and visual relationship detection at scale". We are focusing on the multi-label image classification subtask. When we checked the human-verified test set of V4, we found that only 4698 of the 7186 trainable classes have positive examples; the other 2488 have none. If every class without positives contributes an AP of 0, the maximum achievable mAP would be 4698 / 7186 ≈ 0.654. Yet Fig. 27 of the paper reports a baseline of ~0.705 mAP. Is there anything wrong?
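The bound quoted above can be checked with a few lines of arithmetic. This is only a sketch of the reasoning in the comment: it assumes that classes without positive test examples are scored as AP = 0 and still counted in the mean, which is the poster's assumption and not something the paper states. The variable names are illustrative.

```python
# Counts quoted in the comment above.
trainable_classes = 7186
classes_with_positives = 4698

# Classes that have no positive example in the human-verified test set.
classes_without_positives = trainable_classes - classes_with_positives

# Assumption: a perfect classifier scores AP = 1.0 on every class with
# positives, and every other class contributes AP = 0 to the mean.
upper_bound = classes_with_positives / trainable_classes

print(classes_without_positives)   # 2488
print(round(upper_bound, 3))       # 0.654
```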

nalldrin commented 5 years ago

Hi Yoosan,

The explanation is that the mAP is calculated only over classes that are in the trainable set and have ground truth (either positive or negative) in the test set. This should correspond to 4728 labels, assuming I haven't made a mistake somewhere.
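The class selection described here could be sketched as follows. This is a hypothetical illustration, not the evaluation code used for the paper: the function name `evaluable_classes`, the toy class names, and the dictionary layout (class name mapped to a list of verified labels, 1 for positive and 0 for negative) are all assumptions made for the example.

```python
def evaluable_classes(trainable, test_labels):
    """Return the trainable classes that have at least one verified test label.

    test_labels maps a class name to its list of human-verified labels
    (1 = positive, 0 = negative). Classes that are missing or have an
    empty list are excluded from the mean.
    """
    return [c for c in trainable if test_labels.get(c)]


# Toy data: "zebra" has no verified labels, "yak" is absent from the test
# set, and "bird" has labels but is not trainable.
trainable = ["cat", "dog", "zebra", "yak"]
test_labels = {"cat": [1, 0, 1], "dog": [0, 0], "zebra": [], "bird": [1]}

selected = evaluable_classes(trainable, test_labels)
print(selected)  # ['cat', 'dog']
```

With this kind of filter the denominator of the mean is the number of selected classes (4728 in the reply above) and not the full 7186, so the 0.654 ceiling no longer applies.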

yoosan commented 5 years ago

Thanks nalldrin, but we have another question. If a class has only positive annotations (no negatives), then according to Eq. (4) in the paper "Learning From Noisy Large-Scale Datasets With Minimal Supervision", its average precision will always be 1.0. Is that a suitable evaluation metric?
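The degenerate case raised here is easy to reproduce. The sketch below uses a common definition of average precision (the mean of the precision measured at the rank of each positive, over verified examples only); it is assumed to match the spirit of Eq. (4) in the cited paper but has not been checked against it term by term. The function name and the toy scores are illustrative.

```python
def average_precision(scores, labels):
    """Mean of precision@rank taken at the rank of each positive example.

    Only verified examples are passed in; labels are 1 (positive) or
    0 (negative).
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    if not precisions:
        raise ValueError("AP is undefined for a class with no positives")
    return sum(precisions) / len(precisions)


# Only positives: every prefix of the ranking has precision 1, so the
# result is 1.0 regardless of the scores.
only_positives = average_precision([0.1, 0.9, 0.5], [1, 1, 1])

# With a verified negative ranked first, the score drops below 1.
with_negative = average_precision([0.9, 0.8, 0.3], [0, 1, 1])

print(only_positives)  # 1.0
print(with_negative)
```

A class of this kind therefore adds a constant 1.0 to the sum no matter what the model predicts, which is why it carries no information about model quality.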

nalldrin commented 4 years ago

To close out this thread: you are right that such labels do not really add any value, but it is a rare case, so in practice I think it is not a big issue when looking at overall mAP.