I am trying to run this code:

import open_clip
import torch
from PIL import Image
model, _, transform = open_clip.create_model_and_transforms(
    model_name="coca_ViT-L-14",
    pretrained="mscoco_finetuned_laion2B-s13B-b90k"
)
im = Image.open("cat.jpg").convert("RGB")
im = transform(im).unsqueeze(0)
with torch.no_grad(), torch.cuda.amp.autocast():
    generated = model.generate(im)
print(open_clip.decode(generated[0]).split("<end_of_text>")[0].replace("<start_of_text>", ""))
But I get the following error:
Traceback (most recent call last):
  File "demo.py", line 16, in <module>
    generated = model.generate(im)
  File "/home/aghosh/anaconda3/envs/2pcnetnew/lib/python3.8/site-packages/open_clip/coca_model.py", line 233, in generate
    output = self._generate_beamsearch(
  File "/home/aghosh/anaconda3/envs/2pcnetnew/lib/python3.8/site-packages/open_clip/coca_model.py", line 351, in _generate_beamsearch
    raise ValueError(
ValueError: Batch dimension of `input_ids` should be 0, but is 6.
I have already tried the solutions suggested here, including installing transformers==4.30.2 and changing the computation of batch_size, but it does not solve the issue.
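For what it's worth, I factored the caption post-processing into a small pure helper, and (assuming this open_clip version's generate() accepts a generation_type argument, which I have not verified against my installed version) a sampling-based call might avoid _generate_beamsearch entirely:

```python
def clean_caption(decoded: str) -> str:
    """Strip the special tokens that open_clip.decode leaves in the caption."""
    return decoded.split("<end_of_text>")[0].replace("<start_of_text>", "").strip()

# Hypothetical workaround sketch (untested): if this open_clip version's
# CoCa generate() supports non-beam decoding, sampling may skip the
# failing beam-search path:
#   generated = model.generate(im, generation_type="top_k")
#   print(clean_caption(open_clip.decode(generated[0])))
```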