openai / CLIP

CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
MIT License

RN50x64 model fails to encode image with size mismatch #238

Open Wayne-Mai opened 2 years ago

Wayne-Mai commented 2 years ago

Hi, I am using the "RN50x64" CLIP model to encode images. Specifically, I call image_z = self.clip_model.encode_image(pred_rgb) on pred_rgb of shape [8, 3, 224, 224], but it fails with:

 File "~/anaconda3/envs/text/lib/python3.9/site-packages/clip/model.py", line 69, in forward
    x = x + self.positional_embedding[:, None, :].to(x.dtype)  # (HW+1)NC
RuntimeError: The size of tensor a (50) must match the size of tensor b (197) at non-singleton dimension 0
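The two numbers in the error are consistent with a resolution mismatch: RN50x64 in the openai/clip package is trained at 448x448 input (the other ResNet variants use 224x224), and its ResNet visual backbone downsamples by a factor of 32 before the attention pool. A minimal sketch of the token-count arithmetic, assuming that downsample factor, shows where 50 and 197 come from:

```python
def attnpool_tokens(resolution: int, downsample: int = 32) -> int:
    """Number of tokens entering CLIP's ResNet attention pool.

    The final feature map is (resolution // downsample) on a side,
    flattened to side*side spatial tokens, plus 1 mean-pooled token.
    """
    side = resolution // downsample
    return side * side + 1

print(attnpool_tokens(224))  # 50  -> tokens produced by a 224x224 batch
print(attnpool_tokens(448))  # 197 -> size of RN50x64's positional embedding
```

So the positional embedding (197 entries) cannot broadcast against the 50 tokens coming from a 224x224 batch. The usual fix is to feed RN50x64 images at its native resolution, e.g. by resizing pred_rgb to 448x448 (with something like torch.nn.functional.interpolate) before calling encode_image; the expected size should also be readable from the loaded model (model.visual.input_resolution in this package, if I recall the attribute name correctly).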
robertodessi commented 2 years ago

Hi, I ran into a similar error. Were you able to fix it in the end?

forminju commented 10 months ago

Hi, same problem here. Did you manage to solve it?