j-min / CLIP-Caption-Reward

PyTorch code for "Fine-grained Image Captioning with CLIP Reward" (Findings of NAACL 2022)
https://arxiv.org/abs/2205.13115

Phase 1 validation throws: shape '[4, -1, 512]' is invalid for input of size 64000 #6

Open vbursztyn opened 2 years ago

vbursztyn commented 2 years ago

Hello,

I have successfully generated all features (both text and visual) for the COCO dataset. However, when running MLE training, the code throws the following error once validation starts at 96% of the first epoch:

File "/home/soaresbu/clip-captioning/captioning/utils/clipscore.py", line 177, in forward
    refclip_s = self.calc_refclip_s(
  File "/home/soaresbu/clip-captioning/captioning/utils/clipscore.py", line 124, in calc_refclip_s
    ref_text_feat = ref_text_feat.view(B, -1, dim)
RuntimeError: shape '[4, -1, 512]' is invalid for input of size 64000

Any idea what could be wrong here? Am I missing a step when generating the CLIP-S features with python scripts/clipscore_prepro_feats.py?

j-min commented 2 years ago

Have you edited any of the existing codebase? ref_text_feat should have shape (B, K, dim), where B is the batch size, K is the number of references per image (usually 5 for COCO), and dim is 512 for ViT-B/32.
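For context, the failing line regroups a flattened (B * K, dim) tensor into (B, K, dim), which only works when the total element count is divisible by B * dim. A minimal sketch of that constraint, using NumPy and the exact numbers from the error message (64000 elements, dim=512):

```python
import numpy as np

# Illustrative only: ref_text_feat arrives flattened and is regrouped
# into (B, K, dim). The reshape succeeds only when the total element
# count is divisible by B * dim.
dim = 512
flat = np.zeros(64000)  # 64000 elements, as in the error message

# B = 5 divides evenly: 64000 / (5 * 512) = 25, so K is inferred as 25
ok = flat.reshape(5, -1, dim)
print(ok.shape)  # (5, 25, 512)

# B = 4 does not: 64000 / (4 * 512) = 31.25, so the reshape fails,
# matching the RuntimeError raised by ref_text_feat.view(B, -1, dim)
try:
    flat.reshape(4, -1, dim)
except ValueError as e:
    print("reshape failed:", e)
```

So the error indicates that the number of flattened reference vectors (64000 / 512 = 125) is not a multiple of the batch size B=4, which points at an irregular number of references somewhere in the batch rather than at the reshape itself.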

vbursztyn commented 2 years ago

I have not. Validation in Phase 1 breaks because B = 4 instead of 5, and in Phase 2 it breaks because B = 3 instead of 4. I'm still confused by this. I've tried overriding these values in calc_refclip_s, but that causes other parts of the code to break. How is this validation batch size computed? If it depends on the available infrastructure, I'm running on a V100 node with 8 GPUs.
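One possible culprit worth ruling out: not every COCO image carries exactly 5 captions (a few carry 6 or 7), and if the preprocessing assumes a fixed K per image, any irregular image can make the flattened reference tensor non-divisible at validation time. A hypothetical sanity check, where the annotations list stands in for the entries of a COCO captions JSON file:

```python
from collections import Counter

# Hypothetical records standing in for COCO caption annotations;
# each annotation entry references one image_id.
annotations = [
    {"image_id": 1}, {"image_id": 1}, {"image_id": 1},
    {"image_id": 1}, {"image_id": 1},                   # 5 references
    {"image_id": 2}, {"image_id": 2}, {"image_id": 2},
    {"image_id": 2}, {"image_id": 2}, {"image_id": 2},  # 6 references
]

# Count captions per image and flag any image whose count differs
# from the assumed K = 5.
counts = Counter(a["image_id"] for a in annotations)
irregular = {img: n for img, n in counts.items() if n != 5}
print(irregular)  # images whose reference count differs from 5
```

Running the same kind of count over the actual annotation file used by scripts/clipscore_prepro_feats.py would show whether the reference features were generated with a uniform K.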

j-min commented 2 years ago

I see. Could you please share the full stack trace? I could not reproduce the error.