vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Bug]: INTEL GPU ARC 770 import vllm error #8565

Open adi-lb-phoenix opened 2 months ago

adi-lb-phoenix commented 2 months ago

Your current environment

The output of `python collect_env.py`:

```text
Your output of `python collect_env.py` here
```

Model Input Dumps

No response

🐛 Describe the bug

Working on an Intel Arc A770. vLLM fails to import after building it from `Dockerfile.xpu` (via podman) with the following commands:

```
podman build -f Dockerfile.xpu -t vllm-xpu-env .

podman run --device /dev/dri/ -it --rm vllm-xpu-env bash
root@b1500aef7c39:/workspace/vllm# ls
CMakeLists.txt       Dockerfile.ppc64le  benchmarks        dist                     requirements-cpu.txt       requirements-test.txt
CODE_OF_CONDUCT.md   Dockerfile.rocm     build             docs                     requirements-cuda.txt      requirements-tpu.txt
CONTRIBUTING.md      Dockerfile.tpu      cmake             examples                 requirements-dev.txt       requirements-xpu.txt
Dockerfile           Dockerfile.xpu      collect_env.py    format.sh                requirements-lint.txt      setup.py
Dockerfile.cpu       LICENSE             collect_env.py.1  pyproject.toml           requirements-neuron.txt    tests
Dockerfile.neuron    MANIFEST.in         collect_env.py.2  requirements-build.txt   requirements-openvino.txt  vllm
Dockerfile.openvino  README.md           csrc              requirements-common.txt  requirements-rocm.txt      vllm.egg-info
root@b1500aef7c39:/workspace/vllm# cd
root@b1500aef7c39:~# vllm
Traceback (most recent call last):
  File "/usr/local/bin/vllm", line 33, in <module>
    sys.exit(load_entry_point('vllm==0.6.1.post2+xpu', 'console_scripts', 'vllm')())
  File "/usr/local/bin/vllm", line 25, in importlib_load_entry_point
    return next(matches).load()
  File "/usr/lib/python3.10/importlib/metadata/__init__.py", line 171, in load
    module = import_module(match.group('module'))
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 992, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.1.post2+xpu-py3.10.egg/vllm/__init__.py", line 3, in <module>
    from vllm.engine.arg_utils import AsyncEngineArgs, EngineArgs
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.1.post2+xpu-py3.10.egg/vllm/engine/arg_utils.py", line 11, in <module>
    from vllm.config import (CacheConfig, ConfigFormat, DecodingConfig,
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.1.post2+xpu-py3.10.egg/vllm/config.py", line 12, in <module>
    from vllm.model_executor.layers.quantization import QUANTIZATION_METHODS
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.1.post2+xpu-py3.10.egg/vllm/model_executor/__init__.py", line 1, in <module>
    from vllm.model_executor.parameter import (BasevLLMParameter,
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.1.post2+xpu-py3.10.egg/vllm/model_executor/parameter.py", line 7, in <module>
    from vllm.distributed import get_tensor_model_parallel_rank
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.1.post2+xpu-py3.10.egg/vllm/distributed/__init__.py", line 1, in <module>
    from .communication_op import *
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.1.post2+xpu-py3.10.egg/vllm/distributed/communication_op.py", line 6, in <module>
    from .parallel_state import get_tp_group
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.1.post2+xpu-py3.10.egg/vllm/distributed/parallel_state.py", line 97, in <module>
    @torch.library.custom_op("vllm::inplace_all_reduce", mutates_args=["tensor"])
AttributeError: module 'torch.library' has no attribute 'custom_op'
root@b1500aef7c39:~# python
Python 3.10.12 (main, Jul 29 2024, 16:56:48) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> import vllm
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.1.post2+xpu-py3.10.egg/vllm/__init__.py", line 3, in <module>
    from vllm.engine.arg_utils import AsyncEngineArgs, EngineArgs
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.1.post2+xpu-py3.10.egg/vllm/engine/arg_utils.py", line 11, in <module>
    from vllm.config import (CacheConfig, ConfigFormat, DecodingConfig,
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.1.post2+xpu-py3.10.egg/vllm/config.py", line 12, in <module>
    from vllm.model_executor.layers.quantization import QUANTIZATION_METHODS
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.1.post2+xpu-py3.10.egg/vllm/model_executor/__init__.py", line 1, in <module>
    from vllm.model_executor.parameter import (BasevLLMParameter,
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.1.post2+xpu-py3.10.egg/vllm/model_executor/parameter.py", line 7, in <module>
    from vllm.distributed import get_tensor_model_parallel_rank
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.1.post2+xpu-py3.10.egg/vllm/distributed/__init__.py", line 1, in <module>
    from .communication_op import *
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.1.post2+xpu-py3.10.egg/vllm/distributed/communication_op.py", line 6, in <module>
    from .parallel_state import get_tp_group
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.1.post2+xpu-py3.10.egg/vllm/distributed/parallel_state.py", line 97, in <module>
    @torch.library.custom_op("vllm::inplace_all_reduce", mutates_args=["tensor"])
AttributeError: module 'torch.library' has no attribute 'custom_op'
>>> torch
<module 'torch' from '/usr/local/lib/python3.10/dist-packages/torch/__init__.py'>
>>> torch.library
<module 'torch.library' from '/usr/local/lib/python3.10/dist-packages/torch/library.py'>
>>> torch.library.custom_op
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'torch.library' has no attribute 'custom_op'
```
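For what it's worth, `torch.library.custom_op` only exists in recent PyTorch releases (it was introduced in the 2.4 series), so on an older torch build like the one shipped in the XPU image the attribute is simply absent, which is exactly what the interpreter session above shows. A quick sanity check, using only stock torch calls:

```python
import torch

# torch.library.custom_op was added in PyTorch 2.4; older builds lack it,
# which produces the AttributeError seen above.
print(torch.__version__)
print(hasattr(torch.library, "custom_op"))  # expect False on torch < 2.4
```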

vikyw89 commented 1 month ago

I hit the same bug, but running on AWS Inf2 (Neuron):

```
Traceback (most recent call last):
  File "/home/ubuntu/vllm/aws_neuron_venv_pytorch/bin/vllm", line 5, in <module>
    from vllm.scripts import main
  File "/home/ubuntu/vllm/aws_neuron_venv_pytorch/lib/python3.10/site-packages/vllm/__init__.py", line 3, in <module>
    from vllm.engine.arg_utils import AsyncEngineArgs, EngineArgs
  File "/home/ubuntu/vllm/aws_neuron_venv_pytorch/lib/python3.10/site-packages/vllm/engine/arg_utils.py", line 11, in <module>
    from vllm.config import (CacheConfig, ConfigFormat, DecodingConfig,
  File "/home/ubuntu/vllm/aws_neuron_venv_pytorch/lib/python3.10/site-packages/vllm/config.py", line 12, in <module>
    from vllm.model_executor.layers.quantization import QUANTIZATION_METHODS
  File "/home/ubuntu/vllm/aws_neuron_venv_pytorch/lib/python3.10/site-packages/vllm/model_executor/__init__.py", line 1, in <module>
    from vllm.model_executor.parameter import (BasevLLMParameter,
  File "/home/ubuntu/vllm/aws_neuron_venv_pytorch/lib/python3.10/site-packages/vllm/model_executor/parameter.py", line 7, in <module>
    from vllm.distributed import get_tensor_model_parallel_rank
  File "/home/ubuntu/vllm/aws_neuron_venv_pytorch/lib/python3.10/site-packages/vllm/distributed/__init__.py", line 1, in <module>
    from .communication_op import *
  File "/home/ubuntu/vllm/aws_neuron_venv_pytorch/lib/python3.10/site-packages/vllm/distributed/communication_op.py", line 6, in <module>
    from .parallel_state import get_tp_group
  File "/home/ubuntu/vllm/aws_neuron_venv_pytorch/lib/python3.10/site-packages/vllm/distributed/parallel_state.py", line 98, in <module>
    @torch.library.custom_op("vllm::inplace_all_reduce", mutates_args=["tensor"])
AttributeError: module 'torch.library' has no attribute 'custom_op'
```

jikunshang commented 1 month ago

Here is the fix for this issue: https://github.com/vllm-project/vllm/pull/8557
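The linked PR addresses this import failure on older torch builds. The exact change lives in the PR, but the general pattern for this kind of fix is to register the custom op only when the running torch actually provides `torch.library.custom_op`, and otherwise fall back to the plain Python function. A minimal sketch of such a guard, with simplified, hypothetical names:

```python
import torch


def supports_custom_op() -> bool:
    # torch.library.custom_op only exists on PyTorch >= 2.4.
    return hasattr(torch.library, "custom_op")


def inplace_all_reduce(tensor: torch.Tensor, group_name: str) -> None:
    # Placeholder body: the real vLLM op dispatches to the tensor-parallel
    # group's all_reduce, mutating `tensor` in place.
    tensor.add_(0)


if supports_custom_op():
    # On new-enough torch, register the function as a custom op so that
    # torch.compile can track the in-place mutation; on older torch the
    # plain Python function is used as-is and importing no longer fails.
    inplace_all_reduce = torch.library.custom_op(
        "vllm::inplace_all_reduce", mutates_args=["tensor"]
    )(inplace_all_reduce)
```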