microsoft / Oscar

Oscar and VinVL
MIT License

How to generate object tags and image features for Open Images (inputs for nocaps)? #60

joeyy5588 opened this issue 3 years ago

joeyy5588 commented 3 years ago

Thanks for your amazing work!

I've checked the description in DOWNLOAD.md, but I can't find the feature files for novel object captioning (nocaps). Would it be possible for you to release the data, or the detector pretrained on Open Images, for nocaps?

I'd be grateful if you could let me know how to obtain the object tags and image features for nocaps. I'd also appreciate any details on reproducing the nocaps results.

Thanks in advance!

xiyinmsu commented 3 years ago

For nocaps, the image features are extracted with the model trained in the BUTD paper. The object detector trained on Open Images is described here: https://storage.googleapis.com/openimages/challenge_2019/challenge_results/objdet_5thplace.pdf

ChenYutongTHU commented 3 years ago

Hello, would you mind sharing more details about the tag selection strategy used with the Open Images detector? VIVO only briefly mentions that the maximum number of tags is 30 for fine-tuning and 15 for pre-training, and that the tags are composed of tags produced by the OI detector and ground-truth tags (?). Thanks a bunch!
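For readers trying to reproduce this, here is a minimal sketch of one plausible reading of the VIVO description above. It is not the authors' method, which this thread does not specify. Assumptions: ground-truth tags come first, OI detector tags are added in descending confidence order, detections below a confidence threshold are dropped, case-insensitive duplicates are kept once, and the list is truncated to 30 tags (fine-tuning) or 15 tags (pre-training). The function `select_tags`, the `min_score` threshold of 0.2, and the ordering rules are all hypothetical.

```python
from typing import Iterable, List, Tuple

# Tag limits as quoted from VIVO in the comment above.
MAX_TAGS = {"finetune": 30, "pretrain": 15}


def select_tags(
    gt_tags: Iterable[str],
    detections: Iterable[Tuple[str, float]],
    stage: str = "finetune",
    min_score: float = 0.2,  # hypothetical threshold, not from VIVO
) -> List[str]:
    """Hypothetical tag selection: merge ground-truth and detector tags, then cap.

    Assumptions (not confirmed by the Oscar/VIVO authors):
    - ground-truth tags are kept first, in their given order;
    - detector tags follow in descending confidence order;
    - detections scoring below `min_score` are dropped;
    - duplicates are compared case-insensitively and kept once.
    """
    limit = MAX_TAGS[stage]
    selected: List[str] = []
    seen = set()

    def add(tag: str) -> None:
        # Normalize for de-duplication but keep the original spelling.
        key = tag.strip().lower()
        if key and key not in seen:
            seen.add(key)
            selected.append(tag.strip())

    for tag in gt_tags:
        add(tag)

    # Highest-confidence detections first.
    for label, score in sorted(detections, key=lambda d: d[1], reverse=True):
        if score >= min_score:
            add(label)

    # Truncate to the stage-specific maximum length.
    return selected[:limit]


if __name__ == "__main__":
    tags = select_tags(
        gt_tags=["Dog", "Person"],
        detections=[("dog", 0.95), ("Frisbee", 0.81), ("Tree", 0.10), ("Grass", 0.55)],
        stage="pretrain",
    )
    print(tags)  # ['Dog', 'Person', 'Frisbee', 'Grass']
```

Putting ground-truth tags first means that, under truncation, the noisier detector tags are the ones dropped. Whether VIVO actually prioritizes this way is exactly the open question in this thread.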