salesforce / LAVIS

LAVIS - A One-stop Library for Language-Vision Intelligence
BSD 3-Clause "New" or "Revised" License

unable to run it on 16GB M1 in CPU MODE: RuntimeError: "slow_conv2d_cpu" not implemented for 'Half' #556

Open dataf3l opened 10 months ago

dataf3l commented 10 months ago

Is there a quantized version somewhere?

code:

import torch
import requests
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
# weights are loaded in half precision (float16), then the model is moved to the CPU
model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-opt-2.7b", torch_dtype=torch.float16, device_map="auto")

model.to("cpu")

img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')

question = "how many dogs are in the picture?"
inputs = processor(raw_image, question, return_tensors="pt").to("cpu", torch.float16)

out = model.generate(**inputs)
print(processor.decode(out[0], skip_special_tokens=True).strip())

error:

(venv) ➜  image-captioning-v2 python captionit2.py
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 2/2 [00:30<00:00, 15.46s/it]
Traceback (most recent call last):
  File "/Users/b/study/ml/image-captioning-v2/captionit2.py", line 17, in <module>
    out = model.generate(**inputs)
          ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/b/study/ml/image-captioning-v2/venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/b/study/ml/image-captioning-v2/venv/lib/python3.11/site-packages/transformers/models/blip_2/modeling_blip_2.py", line 1850, in generate
    image_embeds = self.vision_model(pixel_values, return_dict=True).last_hidden_state
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/b/study/ml/image-captioning-v2/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/b/study/ml/image-captioning-v2/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/b/study/ml/image-captioning-v2/venv/lib/python3.11/site-packages/transformers/models/blip_2/modeling_blip_2.py", line 548, in forward
    hidden_states = self.embeddings(pixel_values)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/b/study/ml/image-captioning-v2/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/b/study/ml/image-captioning-v2/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/b/study/ml/image-captioning-v2/venv/lib/python3.11/site-packages/transformers/models/blip_2/modeling_blip_2.py", line 112, in forward
    patch_embeds = self.patch_embedding(pixel_values.to(dtype=target_dtype))  # shape = [*, width, grid, grid]
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/b/study/ml/image-captioning-v2/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/b/study/ml/image-captioning-v2/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/b/study/ml/image-captioning-v2/venv/lib/python3.11/site-packages/torch/nn/modules/conv.py", line 460, in forward
    return self._conv_forward(input, self.weight, self.bias)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/b/study/ml/image-captioning-v2/venv/lib/python3.11/site-packages/torch/nn/modules/conv.py", line 456, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: "slow_conv2d_cpu" not implemented for 'Half'

I don't know what to do. I deleted all my movies to free up space to download the model, and it still didn't work :( sadness all around.
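
For anyone hitting the same error: the CPU conv2d kernel used by the vision tower has no float16 implementation, so a possible workaround is to load the model in plain float32 when running CPU-only. Below is a minimal sketch of that variant, assuming the ~15 GB of float32 weights fit in RAM (on Apple Silicon, moving the model to the "mps" device instead may be another option); `device_map="auto"` is dropped here since everything stays on the CPU.

```python
import torch
import requests
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
# Load in full precision: float16 conv2d is not implemented for the CPU backend,
# which is what triggers the "slow_conv2d_cpu" RuntimeError above.
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=torch.float32
)
model.to("cpu")

img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')

question = "how many dogs are in the picture?"
# Keep the inputs in float32 as well so their dtype matches the weights.
inputs = processor(raw_image, question, return_tensors="pt").to("cpu")

out = model.generate(**inputs)
print(processor.decode(out[0], skip_special_tokens=True).strip())
```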

morteza102030 commented 9 months ago

I have the same problem.