HumerousGorgon opened this issue 1 week ago
vLLM 0.6.2 still lacks the XPU implementation of enc_dec_model_runner and the cross-attention operator, so the Llama-3.2 11B Vision model is not yet supported on 0.6.2.
I thought it might be something like this; thank you for your response.
I did notice some conversation about a 'next release' of IPEX-LLM sometime in November. Do you have any more information on this?
We released vLLM 0.6.2 just two days ago. For the Llama-3.2 11B Vision model, we need to rely on XPU support in the main branch of vLLM. Currently, the main branch does not implement the enc_dec_model_runner or the cross-attention operator, so it will take a long time for Llama-3.2 11B Vision to be supported.
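To make the gap concrete, here is an illustrative sketch of the failure mode (the class and function below are simplified stand-ins, not the actual vLLM source): mllama's forward pass builds a cross-attention mask from encoder sequence lengths stored in the attention metadata, but the XPU backend's IpexAttnMetadata only carries decoder-side fields.

```python
# Illustrative sketch only -- simplified stand-ins, not the real vLLM classes.
from dataclasses import dataclass
import torch

@dataclass
class IpexAttnMetadata:
    # XPU backend metadata: decoder-only fields.
    seq_lens_tensor: torch.Tensor
    # No encoder_seq_lens_tensor: cross-attention metadata is missing.

@dataclass
class EncoderDecoderAttnMetadata:
    # What an encoder-decoder model like mllama expects to find.
    seq_lens_tensor: torch.Tensor
    encoder_seq_lens_tensor: torch.Tensor  # lengths of the encoder (vision) sequences

def build_cross_attention_mask(attn_metadata):
    # Mirrors the failing line in mllama.py: mask out requests with no
    # encoder tokens. On XPU the attribute does not exist.
    return (attn_metadata.encoder_seq_lens_tensor != 0).reshape(-1, 1)

meta = IpexAttnMetadata(seq_lens_tensor=torch.tensor([5]))
try:
    build_cross_attention_mask(meta)
except AttributeError as e:
    print(e)  # 'IpexAttnMetadata' object has no attribute 'encoder_seq_lens_tensor'
```

Until the XPU attention metadata carries these encoder fields and an enc_dec_model_runner exists for XPU, any encoder-decoder multimodal model will hit this path.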
Hello!
I see that vLLM was updated in the latest version of IPEX-LLM, so I decided to try using it with Llama-3.2-11B-Vision; however, I get errors each time:
```
2024-11-13 14:41:58,608 ERROR worker.py:422 -- Unhandled error (suppress with 'RAY_IGNORE_UNHANDLED_ERRORS=1'): ray::WrapperWithLoadBit.execute_method() (pid=2977, ip=192.168.86.58, actor_id=8c3afd038f22587b3ffad84501000000, repr=<ipex_llm.vllm.xpu.ipex_llm_wrapper.WrapperWithLoadBit object at 0x78242c31b290>)
  File "/usr/local/lib/python3.11/dist-packages/vllm/worker/worker_base.py", line 465, in execute_method
    raise e
  File "/usr/local/lib/python3.11/dist-packages/vllm/worker/worker_base.py", line 456, in execute_method
    return executor(*args, **kwargs)
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.11/dist-packages/vllm/worker/xpu_worker.py", line 128, in determine_num_available_blocks
    self.model_runner.profile_run()
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.11/dist-packages/vllm/worker/xpu_model_runner.py", line 538, in profile_run
    self.execute_model(model_input, kv_caches, intermediate_tensors)
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.11/dist-packages/vllm/worker/xpu_model_runner.py", line 643, in execute_model
    hidden_or_intermediate_states = model_executable(
  File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.11/dist-packages/vllm/model_executor/models/mllama.py", line 1075, in forward
    attn_metadata.encoder_seq_lens_tensor != 0).reshape(-1, 1).to(
AttributeError: 'IpexAttnMetadata' object has no attribute 'encoder_seq_lens_tensor'
```
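For reference, this is roughly how I'm loading the model. It's a simplified sketch using plain vLLM's offline API (the model path and engine arguments are approximate; my actual setup goes through the IPEX-LLM wrapper and Ray, but it fails at the same profile_run() step during engine startup):

```python
# Approximate reproduction sketch -- exact paths/flags may differ from my setup.
from vllm import LLM

llm = LLM(
    model="meta-llama/Llama-3.2-11B-Vision-Instruct",
    device="xpu",          # Intel GPU backend
    enforce_eager=True,
    max_model_len=4096,
)
# The AttributeError above is raised here, while profile_run() executes a
# dummy forward pass to size the KV cache.
```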
Obviously, I don't expect this to get patched immediately; I'm just looking to see whether I'm doing something wrong. I fully expect that a new version of IPEX-LLM is coming soon with full support for Llama vision models.
Thanks!