mlfoundations / open_clip

An open source implementation of CLIP.

coca model get dimension mismatch error when dimension is 77 #840

Closed Yufang-Liu closed 3 months ago

Yufang-Liu commented 3 months ago

The following code raises RuntimeError: The size of tensor a (77) must match the size of tensor b (78) at non-singleton dimension 2. When I change 77 to 76, the error goes away. Any suggestions?

import torch
import open_clip

model, _, image_preprocess = open_clip.create_model_and_transforms(model_name="coca_ViT-B-32", pretrained="laion2B-s13B-b90k", device="cuda")
model = model.eval()

# dummy token ids of the default context length (77)
input_text = torch.rand(4, 77, device='cuda').long()

text_feats = model.encode_text(input_text)
print(text_feats.shape)

amitakamath commented 3 months ago

I'm seeing this error as well. Changing max_seq_len didn't seem to help either.

gpucce commented 3 months ago

Hi, 76 is the largest possible value: the CLIP tokenizer has a maximum context of 77 tokens, and CoCa reserves one of them for the token it passes to the contrastive side. The model appends that token internally, so the text input needs to be one token shorter.
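
As an illustration, here is a minimal sketch of a call that respects that limit. The tokenizer usage and the [:, :76] truncation (which for short captions only drops a trailing padding position) are my assumptions, not something from the original report:

import torch
import open_clip

model, _, _ = open_clip.create_model_and_transforms(
    model_name="coca_ViT-B-32", pretrained="laion2B-s13B-b90k", device="cuda")
model = model.eval()

# Tokenize to the usual 77-position context, then keep only the first 76
# positions so the model can append its internal contrastive token
# without exceeding the 77-token maximum.
tokenizer = open_clip.get_tokenizer("coca_ViT-B-32")
input_text = tokenizer(["a photo of a cat"] * 4)[:, :76].to("cuda")

with torch.no_grad():
    text_feats = model.encode_text(input_text)
print(text_feats.shape)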

If setting the length to 76 does not help, you could try updating transformers.