Open testTech92 opened 8 months ago
I have exactly the same issue:

cupy_input._torch_dtype = torch_dtype  # pylint: disable=protected-access
AttributeError: 'ndarray' object has no attribute '_torch_dtype'
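For context on what the error itself says, here is a minimal illustration (my own sketch using numpy, not vLLM code): a plain ndarray rejects ad-hoc attribute assignment, which is exactly what the failing line in cupy_utils.py attempts.

# Illustrative sketch only: ndarrays do not accept new attributes,
# so assigning `_torch_dtype` on one raises this same AttributeError.
import numpy as np

arr = np.zeros(1)
try:
    arr._torch_dtype = "float16"
except AttributeError as e:
    print(e)  # 'numpy.ndarray' object has no attribute '_torch_dtype'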
That's a problem with cupy. Try uninstalling cupy*, then pip install cupy-cuda11x==12.1.0 if you are using CUDA 11.2 ~ 11.x.
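Not part of the original reply, but a quick way to confirm the mismatch before reinstalling (assuming cupy and torch import at all) is to compare the CUDA runtimes each package is built for:

import cupy
import torch

print(cupy.__version__)                        # cupy wheel version, e.g. 12.1.0
print(cupy.cuda.runtime.runtimeGetVersion())   # CUDA runtime used by cupy, e.g. 11080 for 11.8
print(torch.version.cuda)                      # CUDA build of torch, e.g. '11.8'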
Same issue here. When downgrading to cupy-cuda12x==12.1.0 I get:

ImportError: NCCLBackend is not available. Please install cupy.
Same issue here. vllm-0.3.2+cu118-cp310-cp310-manylinux1_x86_64.whl accidentally pulls in cupy-cuda12x==12.1.0 during installation, even though the environment is CUDA 11.x (installed with conda).

Fixed by pip install cupy-cuda11x==12.1 and python -m cupyx.tools.install_library --library nccl --cuda 11.x.

It's frustrating that simply running pip install cupy-cuda11x==12.1 did not work for me, so I uninstalled and reinstalled, and then it worked.
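A small follow-up check, not from the original comment: after install_library finishes, NCCL should be importable from cupy, which is what vLLM's NCCLBackend relies on.

from cupy.cuda import nccl   # an ImportError here means NCCL is still missing
print(nccl.get_version())    # prints the detected NCCL version as an integer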
Great, it worked for me: pip install cupy-cuda11x==12.1.0 if your CUDA version is 11.x.
This worked for me as well.
It worked for me as well, thanks.
I successfully got it working with these commands:
export VLLM_VERSION=0.3.3
export PYTHON_VERSION=39
pip install https://github.com/vllm-project/vllm/releases/download/v$VLLM_VERSION/vllm-$VLLM_VERSION+cu118-cp$PYTHON_VERSION-cp$PYTHON_VERSION-manylinux1_x86_64.whl
pip uninstall xformers -y
pip install --upgrade xformers --index-url https://download.pytorch.org/whl/cu118
# VLLM 0.3.3 requires torch 2.1.2
pip uninstall torch -y
pip install torch==2.1.2 --upgrade --index-url https://download.pytorch.org/whl/cu118
pip uninstall cupy-cuda12x -y
pip install cupy-cuda11x==12.1
python -m cupyx.tools.install_library --library nccl --cuda 11.x
pip reports the following dependency conflicts, but I can ignore them:
vllm 0.3.3+cu118 requires cupy-cuda12x==12.1.0, which is not installed.
vllm 0.3.3+cu118 requires xformers==0.0.23.post1, but you have xformers 0.0.24+cu118 which is incompatible.
xformers 0.0.24+cu118 requires torch==2.2.0, but you have torch 2.1.2+cu118 which is incompatible.
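Not part of the original comment: a quick runtime check (assuming the packages import cleanly) that the versions pip complains about are the ones actually loaded.

import cupy
import torch
import xformers

print(torch.__version__, torch.version.cuda)   # expected 2.1.2+cu118 / 11.8 in this setup
print(xformers.__version__)                    # 0.0.24+cu118 here, despite the pin warning
print(cupy.__version__)                        # 12.1.0, from cupy-cuda11x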
I followed the above instructions, but it always failed at python -m cupyx.tools.install_library --library nccl --cuda 11.x because shared libraries were not found. The nv* and cu* shared libraries are installed, but they are not on LD_LIBRARY_PATH as far as I can tell (a diagnostic sketch follows these steps). I think this is because pip does not manage environment variables for you. I decided to install most dependencies with mamba (a faster conda; you can use plain conda if you like). Here are my steps:

mamba create -n vllm python=3.10 -y
Do NOT use python=3.11 for now, since cupy=12.1 does not support slightly newer python=3.11 releases such as 3.11.1.

mamba install pytorch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 pytorch-cuda=12.1 cupy=12.1 cuda-version=12.1 -c pytorch -c nvidia -c conda-forge
This installs compatible torch and cupy together.

pip install xformers=="0.0.23.post1" --index-url https://download.pytorch.org/whl/cu121
0.0.23.post1 is the only xformers version compatible with torch=2.1.2.

python -m cupyx.tools.install_library --library nccl --cuda 12.x

pip install modelscope

Good luck in Python dependency hell.
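A hedged diagnostic for the shared-library situation above (my addition, not from the thread): cupy.show_config() prints the CUDA root, driver/runtime versions, and the NCCL build/runtime versions it detected, which helps when the libraries exist but are not on LD_LIBRARY_PATH.

import cupy
cupy.show_config()   # CUDA paths, driver/runtime versions, NCCL and other optional-library status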
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!
$ python -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --port 8001 --model Qwen1.5-14B-Chat-AWQ --tensor-parallel-size 2 --quantization awq --trust-remote-code --dtype half
INFO 02-26 10:32:53 api_server.py:229] args: Namespace(host='0.0.0.0', port=8061, allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], api_key=None, served_model_name=None, lora_modules=None, chat_template=None, response_role='assistant', ssl_keyfile=None, ssl_certfile=None, root_path=None, middleware=[], model='Qwen1.5-14B-Chat-AWQ', tokenizer=None, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=True, download_dir=None, load_format='auto', dtype='half', kv_cache_dtype='auto', max_model_len=None, worker_use_ray=False, pipeline_parallel_size=1, tensor_parallel_size=2, max_parallel_loading_workers=None, block_size=16, seed=0, swap_space=4, gpu_memory_utilization=0.9, max_num_batched_tokens=None, max_num_seqs=256, max_paddings=256, disable_log_stats=False, quantization='awq', enforce_eager=False, max_context_len_to_capture=8192, disable_custom_all_reduce=False, enable_lora=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', max_cpu_loras=None, device='cuda', engine_use_ray=False, disable_log_requests=False, max_log_len=None)
WARNING 02-26 10:32:53 config.py:186] awq quantization is not fully optimized yet. The speed can be slower than non-quantized models.
INFO 02-26 10:32:53 config.py:413] Custom all-reduce kernels are temporarily disabled due to stability issues. We will re-enable them once the issues are resolved.
2024-02-26 10:32:56,211 INFO worker.py:1724 -- Started a local Ray instance.
INFO 02-26 10:32:57 llm_engine.py:79] Initializing an LLM engine with config: model='Qwen1.5-14B-Chat-AWQ', tokenizer='Qwen1.5-14B-Chat-AWQ', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.float16, max_seq_len=32768, download_dir=None, load_format=auto, tensor_parallel_size=2, disable_custom_all_reduce=True, quantization=awq, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, seed=0)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/opt/conda/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 237, in <module>
    engine = AsyncLLMEngine.from_engine_args(engine_args)
  File "/opt/conda/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 625, in from_engine_args
    engine = cls(parallel_config.worker_use_ray,
  File "/opt/conda/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 321, in __init__
    self.engine = self._init_engine(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 366, in _init_engine
    return engine_class(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 118, in __init__
    self._init_workers_ray(placement_group)
  File "/opt/conda/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 286, in _init_workers_ray
    self._run_workers("init_model", cupy_port=get_open_port())
  File "/opt/conda/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 1014, in _run_workers
    driver_worker_output = getattr(self.driver_worker,
  File "/opt/conda/lib/python3.10/site-packages/vllm/worker/worker.py", line 94, in init_model
    init_distributed_environment(self.parallel_config, self.rank,
  File "/opt/conda/lib/python3.10/site-packages/vllm/worker/worker.py", line 285, in init_distributed_environment
    cupy_utils.all_reduce(torch.zeros(1).cuda())
  File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/parallel_utils/cupy_utils.py", line 110, in all_reduce
    cupy_input._torch_dtype = torch_dtype  # pylint: disable=protected-access
AttributeError: 'ndarray' object has no attribute '_torch_dtype'