Closed: Bellk17 closed this issue 2 months ago.
I get the same error:
(vllm) ehartford@tw003:~/models/dolphin-2.9.2-qwen2-72b$ python -m vllm.entrypoints.openai.api_server --trust-remote-code --tensor-parallel-size 8 --model /home/ehartford/models/dolphin-2.9.2-qwen2-72b
INFO 05-27 20:36:35 config.py:569] Disabled the custom all-reduce kernel because it is not supported on AMD GPUs.
2024-05-27 20:36:37,825 INFO worker.py:1749 -- Started a local Ray instance.
INFO 05-27 20:36:40 llm_engine.py:103] Initializing an LLM engine (v0.4.2) with config: model='/home/ehartford/models/dolphin-2.9.2-qwen2-72b', speculative_config=None, tokenizer='/home/ehartford/models/dolphin-2.9.2-qwen2-72b', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=131072, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=8, disable_custom_all_reduce=True, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=0, served_model_name=/home/ehartford/models/dolphin-2.9.2-qwen2-72b)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
INFO 05-27 20:36:57 selector.py:56] Using ROCmFlashAttention backend.
(RayWorkerWrapper pid=2194398) INFO 05-27 20:36:57 selector.py:56] Using ROCmFlashAttention backend.
(RayWorkerWrapper pid=2194149) ERROR 05-27 20:36:57 worker_base.py:146] Error executing method init_device. This might cause deadlock in distributed execution.
(RayWorkerWrapper pid=2194149) ERROR 05-27 20:36:57 worker_base.py:146] Traceback (most recent call last):
(RayWorkerWrapper pid=2194149) ERROR 05-27 20:36:57 worker_base.py:146] File "/home/ehartford/miniconda3/envs/vllm/lib/python3.11/site-packages/vllm-0.4.2+rocm613-py3.11-linux-x86_64.egg/vllm/worker/worker_base.py", line 138, in execute_method
(RayWorkerWrapper pid=2194149) ERROR 05-27 20:36:57 worker_base.py:146] return executor(*args, **kwargs)
(RayWorkerWrapper pid=2194149) ERROR 05-27 20:36:57 worker_base.py:146] ^^^^^^^^^^^^^^^^^^^^^^^^^
(RayWorkerWrapper pid=2194149) ERROR 05-27 20:36:57 worker_base.py:146] File "/home/ehartford/miniconda3/envs/vllm/lib/python3.11/site-packages/vllm-0.4.2+rocm613-py3.11-linux-x86_64.egg/vllm/worker/worker.py", line 105, in init_device
(RayWorkerWrapper pid=2194149) ERROR 05-27 20:36:57 worker_base.py:146] torch.cuda.set_device(self.device)
(RayWorkerWrapper pid=2194149) ERROR 05-27 20:36:57 worker_base.py:146] File "/home/ehartford/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/cuda/__init__.py", line 404, in set_device
(RayWorkerWrapper pid=2194149) ERROR 05-27 20:36:57 worker_base.py:146] torch._C._cuda_setDevice(device)
(RayWorkerWrapper pid=2194149) ERROR 05-27 20:36:57 worker_base.py:146] RuntimeError: HIP error: invalid device ordinal
(RayWorkerWrapper pid=2194149) ERROR 05-27 20:36:57 worker_base.py:146] HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(RayWorkerWrapper pid=2194149) ERROR 05-27 20:36:57 worker_base.py:146] For debugging consider passing HIP_LAUNCH_BLOCKING=1.
(RayWorkerWrapper pid=2194149) ERROR 05-27 20:36:57 worker_base.py:146] Compile with `TORCH_USE_HIP_DSA` to enable device-side assertions.
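For diagnosis, a minimal sketch like the one below (my own example, not from vLLM; it assumes an 8-GPU node and the same local Ray setup) can show what each Ray GPU worker actually sees. An "invalid device ordinal" from torch.cuda.set_device() typically means the worker process sees fewer HIP devices than the ordinal it was asked to bind to.

import os

import ray
import torch


@ray.remote(num_gpus=1)
def report_devices():
    # Print what this Ray worker sees: the HIP device count and the usual
    # ROCm/CUDA device-visibility environment variables.
    return {
        "device_count": torch.cuda.device_count(),
        "HIP_VISIBLE_DEVICES": os.environ.get("HIP_VISIBLE_DEVICES"),
        "ROCR_VISIBLE_DEVICES": os.environ.get("ROCR_VISIBLE_DEVICES"),
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
    }


if __name__ == "__main__":
    ray.init()  # start a local Ray instance for the check
    # One task per GPU, mirroring --tensor-parallel-size 8.
    print(ray.get([report_devices.remote() for _ in range(8)]))

If any worker reports a device count lower than expected, the failure above is consistent with a device-visibility mismatch rather than a vLLM bug in itself.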
did you solve this?
Hello Eric, I just encountered the same issue as you. Have you resolved it?
Did you solve this? Several of my machines encountered this issue at the same time.
Reinstalling the conda env can solve this.
How do you do this?
I will try this if I encounter it again
Do you still have the issue? Please update.
Closing this issue as it should have been resolved. Please open a new one if you run into a similar issue again.
Your current environment
🐛 Describe the bug
HIP "invalid device ordinal" error when running with tp=2 on ROCm:
python benchmarks/benchmark_throughput.py --input-len=50 --output-len=100 --model=mistralai/Mistral-7B-v0.1 --tensor-parallel-size=2 --enforce-eager
This is with a build from the latest main. I was hoping this had been fixed by https://github.com/vllm-project/vllm/pull/3770, but no amount of environment configuration (CUDA_VISIBLE_DEVICES, etc.) has helped either.
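The same failure can be reproduced without the benchmark script through vLLM's Python API. The sketch below is my own minimal example with the same model and settings as the command above; the error (when it occurs) surfaces during engine construction, before any generation happens.

from vllm import LLM, SamplingParams

# Engine construction is where init_device runs on each tensor-parallel
# worker, so the "invalid device ordinal" error is raised here if at all.
llm = LLM(
    model="mistralai/Mistral-7B-v0.1",
    tensor_parallel_size=2,
    enforce_eager=True,
)

# If construction succeeds, a short generation confirms the engine works.
sampling_params = SamplingParams(max_tokens=100)
outputs = llm.generate(["Hello, my name is"], sampling_params)
for out in outputs:
    print(out.outputs[0].text)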