qwen-vl RuntimeError: mat1 and mat2 must have the same dtype, but got Float and Half

KiwiHana commented 5 months ago

https://github.com/intel-analytics/BigDL/tree/7a1a9edca7ebc1c02e155828f8475eb51cb8c06b/python/llm/example/GPU/PyTorch-Models/Model/qwen-vl

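bigdl 2.5.0b20240311, CPU memory: 32 GB, ADL + A770. With 32 GB of RAM the model cannot be loaded: the process is killed when loading reaches shard 9 of 10. After changing the loading code to the line below, the model loads, but inference fails with the following error.

model = AutoModelForCausalLM.from_pretrained(model_path, device_map="cpu", trust_remote_code=True, fp16=True)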

-------------------- Session 1 --------------------
 Please input a picture: /home/test/Qwen/BigDL/examples/apple.jpeg
 Please enter the text: what is this?
Traceback (most recent call last):
  File "/home/test/Qwen/BigDL/python/llm/example/GPU/PyTorch-Models/Model/qwen-vl/chat.py", line 92, in <module>
    response, history = model.chat(tokenizer, query = query, history = history)
  File "/root/.cache/huggingface/modules/transformers_modules/modeling_qwen.py", line 947, in chat
    outputs = self.generate(
  File "/root/.cache/huggingface/modules/transformers_modules/modeling_qwen.py", line 1066, in generate
    return super().generate(
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1588, in generate
    return self.sample(
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 2642, in sample
    outputs = self(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/modeling_qwen.py", line 856, in forward
    transformer_outputs = self.transformer(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/modeling_qwen.py", line 565, in forward
    images = self.visual.encode(images)
  File "/root/.cache/huggingface/modules/transformers_modules/visual.py", line 426, in encode
    return self(images)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/visual.py", line 410, in forward
    x = self.attn_pool(x)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/visual.py", line 148, in forward
    out = self.attn(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/activation.py", line 1241, in forward
    attn_output, attn_output_weights = F.multi_head_attention_forward(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/functional.py", line 5300, in multi_head_attention_forward
    q, k, v = _in_projection_packed(query, key, value, in_proj_weight, in_proj_bias)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/functional.py", line 4846, in _in_projection_packed
    return linear(q, w_q, b_q), linear(k, w_k, b_k), linear(v, w_v, b_v)
RuntimeError: mat1 and mat2 must have the same dtype, but got Float and Half

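The error message says that a matrix multiplication received operands of different dtypes (Float and Half here), which suggests that part of the model is in float32 while another part is in float16. A minimal sketch for spotting such a mix, assuming PyTorch is installed; `dtype_summary` is a hypothetical helper, and the two-layer toy model only stands in for the real Qwen-VL modules:

```python
from collections import Counter

import torch
from torch import nn


def dtype_summary(module: nn.Module) -> Counter:
    """Count the parameters of `module` by dtype to reveal a float32/float16 mix."""
    return Counter(str(p.dtype) for p in module.parameters())


# Toy stand-in: one layer left in float32, one converted to float16.
toy = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 4).half())
print(dtype_summary(toy))

# Casting the whole module to a single dtype removes the mix.
toy = toy.half()
print(dtype_summary(toy))
```

Going by the traceback, the visual encoder is reached as `self.visual` inside `self.transformer`, so running the same helper on `model.transformer.visual` would be the analogous check on the real model.
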
lalalapotter commented 5 months ago

I ran the example unmodified and could not reproduce the error. My environment has the following Python packages:

pip list:

```
Package                       Version
----------------------------- ------------------
absl-py                       2.1.0
accelerate                    0.21.0
aiohttp                       3.9.1
aiosignal                     1.3.1
annotated-types               0.6.0
antlr4-python3-runtime        4.9.3
async-timeout                 4.0.3
attrs                         23.2.0
audioread                     3.0.1
bigdl-core-xe-21              2.5.0b20240311
bigdl-core-xe-esimd-21        2.5.0b20240311
bigdl-llm                     2.5.0b20240311
certifi                       2023.11.17
cffi                          1.16.0
charset-normalizer            3.3.2
click                         8.1.7
contourpy                     1.2.0
cycler                        0.12.1
datasets                      2.16.1
decorator                     5.1.1
dill                          0.3.7
einops                        0.7.0
evaluate                      0.4.1
filelock                      3.13.1
fonttools                     4.49.0
frozenlist                    1.4.1
fsspec                        2023.10.0
grpcio                        1.62.1
huggingface-hub               0.17.3
idna                          3.6
importlib_metadata            7.0.2
importlib_resources           6.3.0
intel-extension-for-pytorch   2.1.10+xpu
intel-openmp                  2024.0.2
Jinja2                        3.1.2
jiwer                         3.0.3
joblib                        1.3.2
kiwisolver                    1.4.5
lazy_loader                   0.3
librosa                       0.10.1
llvmlite                      0.41.1
Markdown                      3.5.2
MarkupSafe                    2.1.3
matplotlib                    3.8.3
mpmath                        1.3.0
msgpack                       1.0.7
multidict                     6.0.4
multiprocess                  0.70.15
networkx                      3.2.1
numba                         0.58.1
numpy                         1.26.3
omegaconf                     2.3.0
packaging                     23.2
pandas                        2.1.4
pillow                        10.2.0
pip                           23.3.1
platformdirs                  4.1.0
pooch                         1.8.0
protobuf                      4.25.1
psutil                        5.9.7
py-cpuinfo                    9.0.0
pyarrow                       14.0.2
pyarrow-hotfix                0.6
pycparser                     2.21
pydantic                      2.5.3
pydantic_core                 2.14.6
pyparsing                     3.1.2
python-dateutil               2.8.2
pytz                          2023.3.post1
PyYAML                        6.0.1
rapidfuzz                     3.6.1
regex                         2023.12.25
requests                      2.31.0
responses                     0.18.0
safetensors                   0.4.1
scikit-learn                  1.3.2
scipy                         1.11.4
sentencepiece                 0.1.99
setuptools                    68.2.2
six                           1.16.0
soundfile                     0.12.1
soxr                          0.3.7
sympy                         1.12
tabulate                      0.9.0
tensorboard                   2.16.2
tensorboard-data-server       0.7.2
thefuzz                       0.20.0
threadpoolctl                 3.2.0
tiktoken                      0.5.2
tokenizers                    0.13.3
torch                         2.1.0a0+cxx11.abi
torchvision                   0.16.0a0+cxx11.abi
tqdm                          4.66.1
transformers                  4.31.0
transformers-stream-generator 0.0.4
typing_extensions             4.9.0
tzdata                        2023.4
urllib3                       2.1.0
Werkzeug                      3.0.1
wheel                         0.41.2
xxhash                        3.4.1
yarl                          1.9.4
zipp                          3.17.0
```

System settings:

```
System: ubuntu 22.04.3
CPU: i9 13900K
GPU: arc A770
Memory: 64GB(DDR5 5600)
Storage: 360GB
```

Let's sync the Python package lists offline; I will rerun the example once you reply.

GUOGUO-lab commented 5 months ago

Here is the pip list: @lalalapotter @KiwiHana

Package                       Version
----------------------------- --------------------
absl-py                       2.0.0
accelerate                    0.21.0
addict                        2.4.0
aiohttp                       3.9.1
aiosignal                     1.3.1
annotated-types               0.6.0
antlr4-python3-runtime        4.9.3
apturl                        0.5.2
async-timeout                 4.0.3
attrs                         23.2.0
audioread                     3.0.1
bcrypt                        3.2.0
bigdl-core-xe-21              2.5.0b20240311
bigdl-core-xe-esimd-21        2.5.0b20240311
bigdl-llm                     2.5.0b20240311
bitsandbytes                  0.43.0
blinker                       1.4
Brlapi                        0.8.3
cachetools                    5.3.2
certifi                       2020.6.20
cffi                          1.16.0
chardet                       4.0.0
charset-normalizer            3.3.2
click                         8.1.7
colorama                      0.4.4
command-not-found             0.3
contourpy                     1.2.0
cryptography                  3.4.8
cupshelpers                   1.0
cycler                        0.12.1
datasets                      2.16.1
dbus-python                   1.2.18
decorator                     5.1.1
defer                         1.0.6
defusedxml                    0.7.1
dill                          0.3.7
distro                        1.7.0
distro-info                   1.1build1
duplicity                     0.8.21
einops                        0.7.0
evaluate                      0.4.1
fasteners                     0.14.1
filelock                      3.13.1
fonttools                     4.45.0
frozenlist                    1.4.1
fsspec                        2023.10.0
future                        0.18.2
google-auth                   2.23.4
google-auth-oauthlib          1.1.0
grpcio                        1.59.3
httplib2                      0.20.2
huggingface-hub               0.21.4
idna                          3.3
importlib-metadata            4.6.4
intel-extension-for-pytorch   2.1.10+xpu
intel-openmp                  2024.0.2
iotop                         0.6
jeepney                       0.7.1
Jinja2                        3.1.2
jiwer                         3.0.3
joblib                        1.3.2
jstyleson                     0.0.2
keyring                       23.5.0
kiwisolver                    1.4.5
language-selector             0.1
launchpadlib                  1.10.16
lazr.restfulclient            0.14.4
lazr.uri                      1.0.6
lazy_loader                   0.3
librosa                       0.10.1
llvmlite                      0.41.1
lockfile                      0.12.2
louis                         3.20.0
macaroonbakery                1.3.1
Mako                          1.1.3
Markdown                      3.5.1
MarkupSafe                    2.1.3
matplotlib                    3.8.3
monotonic                     1.6
more-itertools                8.10.0
mpmath                        1.3.0
msgpack                       1.0.7
multidict                     6.0.4
multiprocess                  0.70.15
netifaces                     0.11.0
networkx                      3.2.1
numba                         0.58.1
numpy                         1.26.3
nvidia-cublas-cu12            12.1.3.1
nvidia-cuda-cupti-cu12        12.1.105
nvidia-cuda-nvrtc-cu12        12.1.105
nvidia-cuda-runtime-cu12      12.1.105
nvidia-cudnn-cu12             8.9.2.26
nvidia-cufft-cu12             11.0.2.54
nvidia-curand-cu12            10.3.2.106
nvidia-cusolver-cu12          11.4.5.107
nvidia-cusparse-cu12          12.1.0.106
nvidia-nccl-cu12              2.18.1
nvidia-nvjitlink-cu12         12.3.101
nvidia-nvtx-cu12              12.1.105
oauthlib                      3.2.0
olefile                       0.46
omegaconf                     2.3.0
onnx                          1.15.0
opencv-python                 4.8.1.78
openvino                      2023.2.0
openvino-dev                  2023.2.0
openvino-telemetry            2023.2.1
packaging                     23.2
pandas                        2.1.4
paramiko                      2.9.3
pexpect                       4.8.0
pillow                        10.2.0
pip                           23.3.1
platformdirs                  4.1.0
pooch                         1.8.0
protobuf                      4.25.1
psutil                        5.9.7
ptyprocess                    0.7.0
py-cpuinfo                    9.0.0
pyarrow                       14.0.2
pyarrow-hotfix                0.6
pyasn1                        0.5.1
pyasn1-modules                0.3.0
pycairo                       1.20.1
pycparser                     2.21
pycups                        2.0.1
pydantic                      2.5.3
pydantic_core                 2.14.6
PyGObject                     3.42.1
PyJWT                         2.3.0
pymacaroons                   0.13.0
PyNaCl                        1.5.0
pyparsing                     2.4.7
pyRFC3339                     1.1
python-apt                    2.4.0+ubuntu1
python-dateutil               2.8.2
python-debian                 0.1.43ubuntu1
pytz                          2022.1
pyxdg                         0.27
PyYAML                        5.4.1
rapidfuzz                     3.6.1
regex                         2023.12.25
reportlab                     3.6.8
requests                      2.31.0
requests-oauthlib             1.3.1
responses                     0.18.0
rsa                           4.9
safetensors                   0.4.2
scikit-learn                  1.3.2
scipy                         1.11.4
seaborn                       0.13.0
SecretStorage                 3.3.1
sentencepiece                 0.1.99
setuptools                    59.6.0
six                           1.16.0
soundfile                     0.12.1
soxr                          0.3.7
ssh-import-id                 5.11
sympy                         1.12
systemd-python                234
tabulate                      0.9.0
tensorboard                   2.16.2
tensorboard-data-server       0.7.2
texttable                     1.7.0
thefuzz                       0.20.0
thop                          0.1.1.post2209072238
threadpoolctl                 3.2.0
tiktoken                      0.5.2
tokenizers                    0.13.3
torch                         2.1.0a0+cxx11.abi
torchvision                   0.16.0a0+cxx11.abi
tqdm                          4.66.1
transformers                  4.31.0
transformers-stream-generator 0.0.4
triton                        2.1.0
typing_extensions             4.8.0
tzdata                        2023.3
ubuntu-advantage-tools        8001
ubuntu-drivers-common         0.0.0
ufw                           0.36.1
unattended-upgrades           0.1
urllib3                       1.26.5
usb-creator                   0.3.7
wadllib                       1.3.6
Werkzeug                      3.0.1
wheel                         0.41.2
xdg                           5
xkit                          0.0.0
xxhash                        3.4.1
yarl                          1.9.4
zipp                          3.17.0
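
Comparing two long `pip list` dumps by eye is error-prone. A minimal sketch that parses two such listings and reports packages whose versions differ; `parse_pip_list` and `diff_versions` are hypothetical helpers, and the sample rows are taken from the two lists in this thread:

```python
def parse_pip_list(text: str) -> dict[str, str]:
    """Parse `pip list` output into {normalized package name: version}."""
    packages = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, the "Package Version" header, the dashed rule,
        # and rows with extra columns (e.g. editable installs).
        if len(parts) != 2 or parts[0] == "Package" or set(parts[0]) == {"-"}:
            continue
        # Treat "_" and "-" as equivalent and ignore case.
        name = parts[0].lower().replace("_", "-")
        packages[name] = parts[1]
    return packages


def diff_versions(a: dict[str, str], b: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Return packages present in both environments whose versions differ."""
    common = sorted(a.keys() & b.keys())
    return {name: (a[name], b[name]) for name in common if a[name] != b[name]}


ENV_A = """
Package         Version
--------------- -------
huggingface-hub 0.17.3
pillow          10.2.0
safetensors     0.4.1
"""

ENV_B = """
Package         Version
--------------- -------
huggingface-hub 0.21.4
pillow          10.2.0
safetensors     0.4.2
"""

print(diff_versions(parse_pip_list(ENV_A), parse_pip_list(ENV_B)))
# {'huggingface-hub': ('0.17.3', '0.21.4'), 'safetensors': ('0.4.1', '0.4.2')}
```

In practice the two inputs would be the full outputs of `pip list` saved to files on each machine.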

KiwiHana commented 5 months ago

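After changing two lines of the code as shown below, the model runs with 32 GB of CPU memory. However, with the same image and text, the answers differ from those in the example, and the object-detection (bounding box) request raises an error.

model = AutoModelForCausalLM.from_pretrained(model_path, device_map="cpu", trust_remote_code=True, fp16=True)
model = optimize_model(model, low_bit='sym_int4', modules_to_not_convert=['c_fc', 'out_proj'])
model = model.half().to('xpu')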

2024-03-15 13:12:27,135 - INFO - intel_extension_for_pytorch auto imported
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [02:55<00:00, 17.57s/it]
2024-03-15 13:15:23,291 - INFO - Converting the current model to sym_int4 format......
-------------------- Session 1 --------------------
 Please input a picture: http://farm6.staticflickr.com/5268/5602445367_3504763978_z.jpg
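 Please enter the text: What is this?
---------- Response ----------
This photo appears to show the Earth taken from space, but it is actually an optical illusion: many teddy bears have been gathered together to form a circular pattern.

-------------------- Session 2 --------------------
 Please input a picture:
 Please enter the text: How old is this little girl?
---------- Response ----------
Judging from the image, this little girl appears to be of kindergarten or elementary-school age; her exact age cannot be determined.

-------------------- Session 3 --------------------
 Please input a picture:
 Please enter the text: Detect the teddy bears in the image and draw bounding boxes around them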
Traceback (most recent call last):
  File "/home/a770/kiwi/qwen-vl/chat.py", line 92, in <module>
    response, history = model.chat(tokenizer, query = query, history = history)
  File "/home/a770/.cache/huggingface/modules/transformers_modules/Qwen-VL-Chat/modeling_qwen.py", line 947, in chat
    outputs = self.generate(
  File "/home/a770/.cache/huggingface/modules/transformers_modules/Qwen-VL-Chat/modeling_qwen.py", line 1066, in generate
    return super().generate(
  File "/home/a770/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/a770/miniconda3/envs/qwen-vl/lib/python3.10/site-packages/transformers/generation/utils.py", line 1588, in generate
    return self.sample(
  File "/home/a770/miniconda3/envs/qwen-vl/lib/python3.10/site-packages/transformers/generation/utils.py", line 2642, in sample
    outputs = self(
  File "/home/a770/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/a770/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/a770/.cache/huggingface/modules/transformers_modules/Qwen-VL-Chat/modeling_qwen.py", line 856, in forward
    transformer_outputs = self.transformer(
  File "/home/a770/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/a770/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/a770/.cache/huggingface/modules/transformers_modules/Qwen-VL-Chat/modeling_qwen.py", line 565, in forward
    images = self.visual.encode(images)
  File "/home/a770/.cache/huggingface/modules/transformers_modules/Qwen-VL-Chat/visual.py", line 420, in encode
    image = Image.open(requests.get(image_path, stream=True).raw)
  File "/home/a770/miniconda3/envs/qwen-vl/lib/python3.10/site-packages/PIL/Image.py", line 3309, in open
    raise UnidentifiedImageError(msg)
PIL.UnidentifiedImageError: cannot identify image file <_io.BytesIO object at 0x7f0f16fca750>

KiwiHana commented 5 months ago

If I load the model with INT4 instead:

model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True, optimize_model=True,load_in_4bit=True).eval()

the model fails to load with the following error:

(qwen-vl) a770@RPLP-A770:~/kiwi/qwen-vl$ python chat.py
/home/a770/miniconda3/envs/qwen-vl/lib/python3.10/site-packages/bitsandbytes/cextension.py:31: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
  warn("The installed version of bitsandbytes was compiled without GPU support. "
/home/a770/miniconda3/envs/qwen-vl/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cadam32bit_grad_fp32
/home/a770/.local/lib/python3.10/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(
2024-03-15 13:22:09,567 - INFO - intel_extension_for_pytorch auto imported
Traceback (most recent call last):
  File "/home/a770/kiwi/qwen-vl/chat.py", line 41, in <module>
    model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True, optimize_model=True,load_in_4bit=True).eval()
  File "/home/a770/miniconda3/envs/qwen-vl/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 488, in from_pretrained
    return model_class.from_pretrained(
  File "/home/a770/miniconda3/envs/qwen-vl/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2297, in from_pretrained
    raise RuntimeError("No GPU found. A GPU is needed for quantization.")
RuntimeError: No GPU found. A GPU is needed for quantization.

lalalapotter commented 5 months ago

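For the Pillow error, you can try the following workaround, which reads the whole response body into memory before handing it to Pillow:

# replace original code: image = Image.open(requests.get(image_path, stream=True).raw)
# note: needs `import io` if the file does not already import it
image = Image.open(io.BytesIO(requests.get(image_path, stream=True).content))

Alternatively, check whether your versions of the following packages match these: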

pillow                        10.2.0
requests                      2.31.0
requests-oauthlib             1.3.1

I will look further into the abnormal replies.
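
`UnidentifiedImageError` means Pillow could not recognize the bytes it was given as an image. One possible cause (an assumption; the thread does not identify it) is that the downloaded body is empty or is not image data at all. A sketch for inspecting the bytes before passing them to Pillow; `sniff_image_format` is a hypothetical helper that only knows a few common signatures:

```python
from typing import Optional

# Leading "magic" bytes of a few common image formats.
_SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}


def sniff_image_format(data: bytes) -> Optional[str]:
    """Return a format name if `data` starts with a known image signature, else None."""
    for magic, name in _SIGNATURES.items():
        if data.startswith(magic):
            return name
    return None


print(sniff_image_format(b"\xff\xd8\xff\xe0" + b"\x00" * 16))  # jpeg
print(sniff_image_format(b"<html>404 Not Found</html>"))       # None
print(sniff_image_format(b""))                                 # None
```

With the workaround above, the bytes to check would be the `.content` of the `requests` response; a `None` result points at the download rather than at Pillow.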

lalalapotter commented 5 months ago

You can run the following to check whether a GPU is available:

import torch
import intel_extension_for_pytorch as ipex

print(torch.xpu.is_available())
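
The INT4 traceback earlier in the thread shows the error being raised from `transformers/modeling_utils.py`, with no bigdl-llm frames in the call stack. One possible reading (not confirmed in this thread) is that `chat.py` imported `AutoModelForCausalLM` from `transformers` rather than from `bigdl.llm.transformers`, so `load_in_4bit=True` went down the stock Hugging Face quantization path. A small sketch for checking where a class comes from; `defining_package` is a hypothetical helper:

```python
import json


def defining_package(cls: type) -> str:
    """Return the top-level package in which `cls` is defined."""
    return cls.__module__.split(".")[0]


# Demonstration with a standard-library class.
print(defining_package(json.JSONDecoder))  # json

# In chat.py the loader class could be checked the same way
# (assumes the class has already been imported there):
#   print(defining_package(AutoModelForCausalLM))
```

If the result for the loader class is `transformers`, the `optimize_model` and `load_in_4bit` arguments are not being handled by bigdl-llm.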

KiwiHana commented 5 months ago

I upgraded the relevant libraries as suggested, and the problem is resolved. Thanks!

intel-analytics / ipex-llm

qwen-vl RuntimeError: mat1 and mat2 must have the same dtype, but got Float and Half #10381