Hi, I am using the "RN50x64" CLIP model to encode images. Specifically, I call
image_z = self.clip_model.encode_image(pred_rgb)
to encode pred_rgb of shape [8, 3, 224, 224], but it fails with:
File "~/anaconda3/envs/text/lib/python3.9/site-packages/clip/model.py", line 69, in forward
x = x + self.positional_embedding[:, None, :].to(x.dtype) # (HW+1)NC
RuntimeError: The size of tensor a (50) must match the size of tensor b (197) at non-singleton dimension 0
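For context, here is the token-count arithmetic that I believe explains the 50 vs 197 mismatch (a sketch, assuming the ResNet backbone downsamples by a factor of 32 and RN50x64 expects 448×448 input rather than 224×224; the function name is mine, not from the CLIP source):

```python
def attnpool_tokens(resolution: int, stride: int = 32) -> int:
    """Tokens entering CLIP's AttentionPool2d: one per spatial location
    after the stride-32 backbone, plus one pooled query token."""
    side = resolution // stride
    return side * side + 1

# A 224x224 input yields 7*7 + 1 = 50 tokens,
# while RN50x64's positional embedding has 14*14 + 1 = 197 entries
# (matching its native 448x448 resolution) -- hence the size mismatch.
print(attnpool_tokens(224))  # 50
print(attnpool_tokens(448))  # 197
```

If this is right, resizing pred_rgb to the resolution reported by self.clip_model.visual.input_resolution before calling encode_image should avoid the error.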