microsoft / Oscar

Oscar and VinVL
MIT License

How to combine COCO pre-training datasets #86

Closed yikaiw closed 3 years ago

yikaiw commented 3 years ago

I'm going to run pre-training experiments. As described in https://github.com/microsoft/Oscar/blob/master/VinVL_DOWNLOAD.md#pre-exacted-image-features, there are three COCO datasets: coco2014train/val, coco2014test, and coco2015test. Each of them comes with five files: features.lineidx, features.tsv, imageid2idx.json, predictions.lineidx, and predictions.tsv.

So, do I need to combine these files? For example, should I concatenate the three features.tsv files into one large features.tsv file?
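For reference, if concatenation does turn out to be necessary, a minimal sketch of what it might look like is below. It assumes that each `.lineidx` file stores one byte offset per row of the matching `.tsv` file, so the index has to be regenerated for the merged file rather than concatenated. The function name `concat_tsv` and its parameters are hypothetical and not part of the Oscar repository.

```python
def concat_tsv(tsv_paths, out_tsv, out_lineidx):
    """Concatenate TSV files and rebuild a byte-offset line index.

    Assumption: the .lineidx format is one decimal byte offset per line,
    pointing at the start of the corresponding row in the .tsv file.
    """
    offsets = []
    with open(out_tsv, "wb") as out:
        for path in tsv_paths:
            with open(path, "rb") as src:
                for line in src:
                    # Make sure every row ends with a newline so that rows
                    # from consecutive files never run together.
                    if not line.endswith(b"\n"):
                        line += b"\n"
                    offsets.append(out.tell())
                    out.write(line)
    with open(out_lineidx, "w") as idx:
        for off in offsets:
            idx.write(f"{off}\n")
    return offsets
```

Under the same assumption, imageid2idx.json would also need merging: if it maps an image id to a row index, the indices from the second and third datasets would have to be shifted by the number of rows that precede them in the merged file.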

ahnjaewoo commented 3 years ago

Did you end up running the pre-training experiments? I found that the COCO pre-training datasets (image features and labels) do not map cleanly onto the pre-training corpus! :(