openai / CLIP

CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image
MIT License
24.55k stars 3.2k forks

finetuning rn50x64 model, encode_image error #414

Open forminju opened 8 months ago

forminju commented 8 months ago

I ran into a problem when fine-tuning the RN50x64 model. Calling encode_image fails with this error:

    File "/home/student/minju/CLIP/lib/python3.8/site-packages/clip/model.py", line 71, in forward
        x = x + self.positional_embedding[:, None, :].to(x.dtype)  # (HW+1)NC
    RuntimeError: The size of tensor a (50) must match the size of tensor b (197) at non-singleton dimension 0

Does anyone know how to fix this?
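The two sizes in the error message can be checked with a little arithmetic. The sketch below is only a plausible reading of the traceback, not a confirmed diagnosis. It assumes that the ResNet-style visual backbone downsamples the image by a factor of 32 overall, and that the attention-pooling layer adds one extra token to the flattened feature map (the "(HW+1)" in the code comment). The helper name `attnpool_tokens` is hypothetical and is not part of the `clip` package.

```python
# Hypothetical helper (not part of the clip package): estimate how many
# tokens the attention-pooling layer sees for a given input resolution.
# Assumption: the ResNet backbone reduces height and width by 32x overall,
# and one extra (mean) token is prepended, giving H*W + 1 tokens.

def attnpool_tokens(input_resolution: int, stride: int = 32) -> int:
    """Return H*W + 1 for a square image of the given side length."""
    side = input_resolution // stride  # spatial size of the final feature map
    return side * side + 1             # flattened grid plus one extra token


if __name__ == "__main__":
    for resolution in (224, 448):
        print(resolution, attnpool_tokens(resolution))
```

Under these assumptions, a 224x224 input yields 50 tokens and a 448x448 input yields 197, which are exactly the two sizes in the RuntimeError. If that reading is right, the images reaching the model were probably resized to 224 pixels, while the RN50x64 positional embedding was built for 448. One thing to try is using the `preprocess` transform returned by `clip.load` for this model, or resizing inputs to the value of `model.visual.input_resolution` in the fine-tuning data pipeline.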