I'm going to conduct pre-training experiments. As described in https://github.com/microsoft/Oscar/blob/master/VinVL_DOWNLOAD.md#pre-exacted-image-features, there are three COCO datasets: coco2014train/val, coco2014test, and coco2015test. Each of the three has five files: features.lineidx, features.tsv, imageid2idx.json, predictions.lineidx, and predictions.tsv.
So, do I need to combine these files? For example, should I merge the three features.tsv files into one large features.tsv file?
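In case merging does turn out to be necessary, here is a minimal sketch of how it might be done. It assumes that each `.lineidx` file simply stores the byte offset of every row in the matching `.tsv`, one offset per line; I have not confirmed that against the Oscar code. `concat_tsv` and `read_row` are hypothetical helper names, not part of the repo.

```python
def concat_tsv(tsv_paths, out_tsv, out_lineidx):
    """Concatenate TSV files and write a fresh byte-offset index.

    Assumption: the index format is one decimal byte offset per row.
    Returns the total number of bytes written to out_tsv.
    """
    offset = 0
    with open(out_tsv, "wb") as fout, open(out_lineidx, "w") as fidx:
        for path in tsv_paths:
            with open(path, "rb") as fin:
                for row in fin:
                    # Make sure the last row of each input ends with a newline,
                    # so rows from different files never run together.
                    if not row.endswith(b"\n"):
                        row += b"\n"
                    fidx.write(f"{offset}\n")
                    fout.write(row)
                    offset += len(row)
    return offset


def read_row(tsv_path, lineidx_path, i):
    """Read row i of the merged TSV through the offset index (sanity check)."""
    with open(lineidx_path) as fidx:
        offsets = [int(line) for line in fidx]
    with open(tsv_path, "rb") as f:
        f.seek(offsets[i])
        return f.readline().decode("utf-8").rstrip("\n").split("\t")
```

Calling `concat_tsv` with the three features.tsv paths would produce one merged features.tsv plus a new features.lineidx, and the same could be repeated for predictions.tsv. Row positions shift after merging, so if imageid2idx.json maps image ids to row indices (my guess from the name), it would have to be rebuilt as well.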
So did you conduct pre-training experiments?
I found that the COCO pre-training datasets (image features and labels) are not well mapped to the pre-training corpus! :(