How to map the VinVL Features of Conceptual Caption to the original dataset?

microsoft / Oscar

Oscar and VinVL

MIT License

How to map the VinVL Features of Conceptual Caption to the original dataset? #111

Closed zmykevin closed 2 years ago

zmykevin commented 3 years ago

Hi, thank you for sharing the VinVL features. I have recently been experimenting with vision-language pre-training on Conceptual Captions (CC), but I could not find the mapping between your image IDs for the CC image features and the original CC dataset, which makes it hard to find the paired caption for each VinVL feature. Could you tell me how to map the image IDs of the VinVL features back to the original dataset? Thanks!

pzzhang commented 3 years ago

Unfortunately, we did not save the correspondence between the original download link and our customized image ids.

One thing that may help you is the caption file we use for these images: https://biglmdiag.blob.core.windows.net/vinvl/image_features/googlecc_X152C4_frcnnbig2_exp168model_0060000model.roi_heads.nm_filter_2_model.roi_heads.score_thresh_0.2/model_0060000/0/annotations/dataset_cc.json

You can match these captions against the captions in the original CC release to recover the original download links of these images.

Or you can just use our released VinVL model (https://github.com/microsoft/scene_graph_benchmark) to extract visual features from your own downloaded copy of Conceptual Captions.
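
For the caption-matching route, here is a minimal sketch of how the lookup might work. It rests on assumptions the thread does not confirm: that `dataset_cc.json` loads as a list of records with `image_id` and `caption` keys, and that the original CC release is a tab-separated file with the caption in the first column and the URL in the second. The function names are hypothetical; adjust the keys and columns to whatever the files actually contain.

```python
import csv
import json
from collections import defaultdict


def normalize(caption):
    """Lowercase and collapse whitespace so trivial formatting differences do not block a match."""
    return " ".join(caption.lower().split())


def build_caption_index(cc_rows):
    """Map each normalized caption to every URL paired with it in the original CC data."""
    index = defaultdict(list)
    for caption, url in cc_rows:
        index[normalize(caption)].append(url)
    return index


def map_ids_to_urls(annotations, index):
    """Return {image_id: [candidate URLs]}; an empty list means no caption matched."""
    return {
        ann["image_id"]: index.get(normalize(ann["caption"]), [])
        for ann in annotations
    }


def load_cc_tsv(path):
    """Read (caption, url) pairs. ASSUMPTION: caption in column 0, URL in column 1, no header."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        return [(row[0], row[1]) for row in reader if len(row) >= 2]


def load_annotations(path):
    """Read the caption file. ASSUMPTION: a JSON list of {"image_id": ..., "caption": ...}."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Captions in CC are not guaranteed to be unique, so the sketch returns a list of candidate URLs per image ID rather than a single one. IDs with more than one candidate are ambiguous and IDs with an empty list have no match; both cases would need to be resolved some other way or dropped.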