Incomplete File, imageid2idx.json and Image Features

Ceralu commented 3 years ago

I am trying to follow the README file in the Oscar MODEL_ZOO.md and I am encountering a few issues.

For finetuning on the tasks of VQA, GQA and NLVR2, I am getting either `RuntimeError: storage has wrong size: expected -8853154286956286511 got 61620` or `RuntimeError: unexpected EOF, expected 137993 more bytes. The file might be corrupted.` when it's trying to read the image features. The first error happens for finetuning on VQA (large model), GQA and NLVR2 with the following:
```
img_features = torch.load(os.path.join(args.data_dir, feat_file_name))
RuntimeError: unexpected EOF, expected 78990 more bytes. The file might be corrupted.
```

The second error happens with finetuning on VQA with the base model:

    self.img_features = torch.load(os.path.join(args.data_dir, '{}_img_frcnn_feats.pt'.format(name)))
    RuntimeError: unexpected EOF, expected 137993 more bytes. The file might be corrupted.

I have downloaded the dataset files with `wget --continue --tries=0`. When I try to extract the archive, it seems most files are okay except the feature files.
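
Both errors are what `torch.load` reports when a feature file is truncated or otherwise damaged, so one way to confirm this before launching a fine-tuning run is to compare each downloaded file against a known-good copy. The following is only a minimal sketch: `file_fingerprint` and `looks_complete` are hypothetical helper names, and the reference size and SHA-256 are assumptions that would have to come from a copy known to be intact, since the repository does not appear to publish checksums.

```python
import hashlib
import os


def file_fingerprint(path, chunk_size=1 << 20):
    """Return (size_in_bytes, sha256_hex) for the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so multi-gigabyte feature files do not fill memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return os.path.getsize(path), digest.hexdigest()


def looks_complete(path, expected_size, expected_sha256=None):
    """True if the file matches the reference size (and hash, if one is given).

    `expected_size` and `expected_sha256` are assumed to come from a
    known-good copy of the same file.
    """
    size, sha256 = file_fingerprint(path)
    if size != expected_size:
        return False
    return expected_sha256 is None or sha256 == expected_sha256
```

A size mismatch alone is usually enough to spot an interrupted download; the hash additionally catches corruption that leaves the length unchanged.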

For the Image Text Retrieval task, I am wondering where I can find the `imageid2idx.json` file? It doesn't seem to be in the `coco_ir` directory.

When I'm trying to finetune an Oscar base model on Image Captioning on COCO with the cross-entropy loss, there seems to be an error with the `run_captioning.py` file, which calls `dataset = build_dataset(...)`. Upon further inspection, it seems that the `__getitem__` function call is not working properly when it tries to call `self.get_image_features(img_idx)`. That function is calling `feat_info = json.loads(self.feat_tsv.seek(img_idx)[1])`, but `[1]` is out of bounds.
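
To see what the offending row actually contains, the indexing can be wrapped in a small check. This is only a sketch under an assumption: that `self.feat_tsv.seek(img_idx)` returns the row as a list of tab-separated columns with the JSON payload in column 1. `parse_feature_row` is a hypothetical helper, not part of the Oscar code.

```python
import json


def parse_feature_row(row, img_idx):
    """Decode the JSON payload expected in column 1 of a TSV row.

    Assumes `row` is a list of column strings. Raises a descriptive
    error instead of a bare IndexError when the row is too short.
    """
    if len(row) < 2:
        raise ValueError(
            f"row {img_idx} has {len(row)} column(s), expected at least 2: {row!r}"
        )
    return json.loads(row[1])
```

Inside `get_image_features` this would be used as `feat_info = parse_feature_row(self.feat_tsv.seek(img_idx), img_idx)`. A row with a single column would suggest that the feature TSV is truncated or that its index file does not match it.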

byougert commented 3 years ago

Hi, I am encountering the same issue you mentioned when I run the image-text retrieval task on Oscar (not VinVL). There is a fatal error that says "No such file: imageid2idx.json". I can find every file except imageid2idx.json, so I wonder whether you managed to get it?

Ceralu commented 3 years ago

Hey, I still have the same problem. I wasn't able to find the imageid2idx.json for that task.

For the other two tasks, I was able to solve them by downloading the data properly with AzCopy instead of wget.

byougert commented 3 years ago

Wow. In fact, I have succeeded in running the image-text retrieval task on VinVL (Oscar+), where the file "imageid2idx.json" is placed in Pre-extracted Image Features. However, I can't find it for Oscar (not VinVL). It seems that imageid2idx.json and the image features have not been released for Oscar (not VinVL).

microsoft / Oscar

Incomplete File, imageid2idx.json and Image Features #108