mlfoundations / open_clip

An open source implementation of CLIP.

ValueError: Batch dimension of `input_ids` should be 0, but is 6. #816

Closed. ShenZheng2000 closed this issue 5 months ago.

ShenZheng2000 commented 5 months ago

I am trying to run this code:

import open_clip
import torch
from PIL import Image

# Load the CoCa captioning model fine-tuned on MSCOCO
model, _, transform = open_clip.create_model_and_transforms(
    model_name="coca_ViT-L-14",
    pretrained="mscoco_finetuned_laion2B-s13B-b90k",
)

im = Image.open("cat.jpg").convert("RGB")
im = transform(im).unsqueeze(0)  # add a batch dimension: (1, 3, H, W)

with torch.no_grad(), torch.cuda.amp.autocast():
    generated = model.generate(im)

# Strip the start/end special tokens from the decoded caption
print(open_clip.decode(generated[0]).split("<end_of_text>")[0].replace("<start_of_text>", ""))

But I get the following error:

Traceback (most recent call last):
  File "demo.py", line 16, in <module>
    generated = model.generate(im)
  File "/home/aghosh/anaconda3/envs/2pcnetnew/lib/python3.8/site-packages/open_clip/coca_model.py", line 233, in generate
    output = self._generate_beamsearch(
  File "/home/aghosh/anaconda3/envs/2pcnetnew/lib/python3.8/site-packages/open_clip/coca_model.py", line 351, in _generate_beamsearch
    raise ValueError(
ValueError: Batch dimension of `input_ids` should be 0, but is 6.

I have already tried the solutions suggested here, including installing transformers==4.30.2 and changing the computation of batch_size, but they do not solve the issue.
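
For what it's worth, the raise seems to come from a shape-consistency check in `_generate_beamsearch`. The snippet below is only my paraphrase of that check (an assumption, not the exact open_clip source); it shows why the message reads "should be 0, but is 6" when one image is expanded to the default 6 beams but the internally derived batch_size ends up as 0:

import torch

def check_beam_batch(input_ids: torch.Tensor, batch_size: int, num_beams: int) -> None:
    # input_ids is expected to hold batch_size * num_beams rows (one row per beam)
    batch_beam_size = input_ids.shape[0]
    if num_beams * batch_size != batch_beam_size:
        raise ValueError(
            f"Batch dimension of `input_ids` should be {num_beams * batch_size}, "
            f"but is {batch_beam_size}."
        )

# One image expanded to 6 beams gives 6 rows; if the derived batch_size comes
# out as 0 (which the error message implies), the expected dimension is 0 and
# the check fails with exactly the reported message.
try:
    check_beam_batch(torch.zeros(6, 1, dtype=torch.long), batch_size=0, num_beams=6)
except ValueError as e:
    print(e)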

rwightman commented 5 months ago

@ShenZheng2000 install a more recent version of transformers; it works fine with the latest.
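
For anyone hitting the same error: a minimal way to follow this suggestion, assuming any reasonably recent transformers release works (the thread does not pin an exact version), is to upgrade with pip install -U transformers and then confirm what is installed before re-running the script above:

# Quick sanity check after upgrading; assumes both packages expose __version__
import transformers
import open_clip

print("transformers:", transformers.__version__)
print("open_clip:", open_clip.__version__)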