Open lostnighter opened 2 years ago
Hi lostnighter,
I had the same problem when using OSCAR to fine-tune on image captioning with a custom dataset. I used this function to generate the '.lineidx' file.
I guess that in your case you have a 'coco_flickr30k_googlecc_gqa_sbu_oi.tsv' file. If that is true, you should try the function above, with parameters:
` filein, idxout = 'coco_flickr30k_googlecc_gqa_sbu_oi.tsv', 'coco_flickr30k_googlecc_gqa_sbu_oi.lineidx' `
Let me know if it works!
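The function itself is not reproduced in this thread. For readers who need a starting point, here is a minimal sketch in Python. It assumes that a '.lineidx' file simply lists, one per line, the byte offset at which each row of the matching '.tsv' file starts. The name `generate_lineidx` is hypothetical and not taken from the OSCAR code base, so compare the output against a known-good '.lineidx' file before relying on it.

```python
import os


def generate_lineidx(filein, idxout):
    """Write the starting byte offset of every line in `filein` to `idxout`.

    Assumption: a '.lineidx' file holds one decimal byte offset per row of
    the corresponding '.tsv' file, in row order.
    """
    idxout_tmp = idxout + ".tmp"
    # Read in binary mode so offsets are exact byte positions,
    # independent of text encoding or newline translation.
    with open(filein, "rb") as tsvin, open(idxout_tmp, "w") as idx:
        offset = 0
        for line in tsvin:
            idx.write(f"{offset}\n")
            offset += len(line)
    # Move the finished index into place only after it is fully written.
    os.replace(idxout_tmp, idxout)
```

With the parameters suggested above, the call would be `generate_lineidx(filein, idxout)`. Writing to a temporary file first means an interrupted run does not leave a truncated index behind.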
Hi jontooy, I downloaded this file via azcopy as follows: ` path/to/azcopy copy https://biglmdiag.blob.core.windows.net/vinvl/pretrain_corpus/coco_flickr30k_googlecc_gqa_sbu_oi.lineidx ./ --recursive `
This URL is not given anywhere. I just tried it out.
Hi! This file is needed for pretraining on the large corpus, but it cannot be found. Could you share this file?
Thanks!