microsoft / Oscar

Oscar and VinVL
MIT License
1.04k stars 251 forks

Question about class names in label.tsv for the captioning task #134

Closed clrrrr closed 3 years ago

clrrrr commented 3 years ago

Hi Oscar Team, great work!

I'm trying to generate the feature.tsv and label.tsv files for my own images for the image captioning task, and here's my question.

As mentioned in other issues, you use the Faster R-CNN model from https://github.com/peteanderson80/bottom-up-attention , which is pretrained on the Visual Genome dataset, to produce feature.tsv and label.tsv.

But when I checked your pre-extracted object tags in the (train/val/)test.labels.tsv file (downloaded from https://biglmdiag.blob.core.windows.net/oscar/datasets/coco_caption.zip ), I found that some of the class names were not included in the 1600 classes used by the bottom-up repo (I checked bottom-up-attention/data/genome/1600-400-20/objects_vocab.txt).

I'm new to this topic and wondering what's wrong. I'd appreciate it if you could help!
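For reference, the vocabulary check described above can be scripted. This is a minimal sketch, assuming the Oscar label.tsv format of one `<image_id>\t<JSON list of objects>` pair per line, where each object dict carries a "class" key (the file paths and sample data here are illustrative, not from the repo):

```python
import json

def tags_not_in_vocab(label_lines, vocab):
    """Return object-tag class names from label.tsv lines that are
    absent from the detector vocabulary (e.g. objects_vocab.txt)."""
    vocab = set(vocab)
    missing = set()
    for line in label_lines:
        # Each line: "<image_id>\t<JSON list of {'class': ..., ...}>"
        image_id, objects_json = line.rstrip("\n").split("\t", 1)
        for obj in json.loads(objects_json):
            if obj["class"] not in vocab:
                missing.add(obj["class"])
    return sorted(missing)

# Toy demo with made-up data; in practice, read the real files, e.g.:
#   vocab = open("objects_vocab.txt").read().splitlines()
#   lines = open("test.labels.tsv")
vocab = ["man", "dog", "tree"]
lines = ['img1\t[{"class": "man"}, {"class": "surfboard"}]']
print(tags_not_in_vocab(lines, vocab))  # -> ['surfboard']
```

Any names printed by such a script are tags that the VG 1600-class detector could not have produced, which is exactly the symptom reported here.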

xiaoweihu commented 3 years ago

Hi, the class names you saw that are not in the VG 1600 classes come from Open Images. In fact, the object tags in the released coco_caption data were generated by a model trained on Open Images, which is also the model used in the Oscar paper (denoted as oid_tags).

It was verified that the performance of using the tags from the two models (the Open Images-trained one and the released one) is quite close. I think it's fine to use the released model to generate tags for your own images.

clrrrr commented 3 years ago

> Hi, the class names you saw that are not in the VG 1600 classes come from Open Images. In fact, the object tags in the released coco_caption data were generated by a model trained on Open Images, which is also the model used in the Oscar paper (denoted as oid_tags).
>
> It was verified that the performance of using the tags from the two models (the Open Images-trained one and the released one) is quite close. I think it's fine to use the released model to generate tags for your own images.

Got it, thanks!