Closed rookielyb closed 6 months ago
lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 20.04.1 LTS
Release: 20.04
Codename: focal
cat /proc/version
Linux version 5.4.0-126-generic (buildd@lcy02-amd64-072) (gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04.1)) #142-Ubuntu SMP Fri Aug 26 12:12:57 UTC 2022
Could you check that the problem still exists after rebuilding the repo (pip install -e .)?
pip install -e . :
Building wheels for collected packages: vllm
Building editable for vllm (pyproject.toml) ... done
Created wheel for vllm: filename=vllm-0.1.2-0.editable-cp310-cp310-linux_x86_64.whl size=8465 sha256=8154890edc8a5b3b0100d83308d973ec25ecee72b02d479023542312fee2fd1d
Stored in directory: /tmp/pip-ephem-wheel-cache-vg3isffo/wheels/33/fc/d6/f27b3ac96c14477426ab8fd6d5573e139cf29c857e206d16a3
Successfully built vllm
Installing collected packages: vllm
Attempting uninstall: vllm
Found existing installation: vllm 0.1.2
Uninstalling vllm-0.1.2:
Successfully uninstalled vllm-0.1.2
Successfully installed vllm-0.1.2
CUDA_VISIBLE_DEVICES=0 python offline_inference.py:
File "/home/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/cg/vllm/vllm/model_executor/models/opt.py", line 102, in forward
output, _ = self.out_proj(attn_output)
File "/home/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/cg/vllm/vllm/model_executor/parallel_utils/tensor_parallel/layers.py", line 443, in forward
output = output_ + self.bias if self.bias is not None else output_
RuntimeError: CUDA error: no kernel image is available for execution on the device
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
Processed prompts: 0%| | 0/4 [00:00<?, ?it/s]
still not resolved
I have the same issue. nvcc and the driver are both 11.7. When running:
python -m vllm.entrypoints.openai.api_server --model meta-llama/Llama-2-7b-chat-hf
This solution works for me:
could you check that the problem still exists after rebuilding the repo (pip install -e .)?
I hit the same issue. I can run vllm on a V100 with CUDA 11.3, but not on an A100 with CUDA 12.0. I used exactly the same code and Docker image; only the CUDA version differs.
I hit this problem on A100, A40, and T4 with CUDA 11.7 when using vllm through langchain, but everything works normally on RTX4090 and RTX3090.
Here is the exception output:
INFO 08-25 10:41:25 llm_engine.py:70] Initializing an LLM engine with config: model='meta-llama/Llama-2-7b-chat-hf', tokenizer='meta-llama/Llama-2-7b-chat-hf', tokenizer_mode=auto, trust_remote_code=False, dtype=torch.float16, use_dummy_weights=False, download_dir=None, use_np_weights=False, tensor_parallel_size=1, seed=0)
INFO 08-25 10:41:25 tokenizer.py:29] For some LLaMA-based models, initializing the fast tokenizer may take a long time. To eliminate the initialization time, consider using 'hf-internal-testing/llama-tokenizer' instead of the original tokenizer.
Traceback (most recent call last):
File "/root/miniconda3/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/root/miniconda3/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/root/miniconda3/lib/python3.10/site-packages/vllm/entrypoints/api_server.py", line 78, in TORCH_USE_CUDA_DSA
to enable device-side assertions.
Same here, with the model meta-llama/Llama-2-7b-chat-hf.
How can I fix it?
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/vllm/model_executor/parallel_utils/tensor_parallel/layers.py", line 309, in forward
output_parallel = F.linear(input_parallel, self.weight, bias)
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
root@fa4b127ca2bf:/workspace# python
Also running into this same issue. Has anyone found a fix?
Same issue +1! ANY updates or issues? Anyone tried updating Driver Version: 470.141.03 to 515?
Solved, by recompiling and reinstalling the lib when deploying on V100. Previously it was compiled on A100.
I need to switch between several GPUs (A100, V100, RTX8000) and CUDA versions. When the CUDA version changes, I need to reinstall it from source. It's a short-term solution, but it works for now!
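A note on why rebuilding on the target GPU helps: a compiled kernel only runs on devices whose compute capability it was built for. The sketch below is a rough check under stated assumptions, not vLLM code. `parse_arch` and `has_kernel_image` are hypothetical helper names. The rule it encodes is the general CUDA one: a binary `sm_XY` image runs on devices with the same major version and an equal or higher minor version, while `compute_XY` PTX can be JIT-compiled for any equal or newer device. Suffixed targets such as `sm_90a` are treated like their base architecture, which is an approximation.

```python
import re


def parse_arch(arch):
    """Split e.g. 'sm_80' or 'compute_86' into (kind, (major, minor))."""
    m = re.fullmatch(r"(sm|compute)_(\d+)(\d)[a-z]?", arch)
    if m is None:
        raise ValueError(f"unrecognized arch string: {arch!r}")
    return m.group(1), (int(m.group(2)), int(m.group(3)))


def has_kernel_image(capability, arch_list):
    """Return True if some entry in arch_list can run on a device
    with the given (major, minor) compute capability."""
    major, minor = capability
    for arch in arch_list:
        kind, (a_major, a_minor) = parse_arch(arch)
        # Binary image: same major version, minor not newer than the device.
        if kind == "sm" and a_major == major and a_minor <= minor:
            return True
        # PTX: the driver can JIT-compile it for any equal or newer device.
        if kind == "compute" and (a_major, a_minor) <= (major, minor):
            return True
    return False


if __name__ == "__main__":
    # Built on A100 (sm_80), deployed on V100 (7.0): no usable image.
    print(has_kernel_image((7, 0), ["sm_80"]))
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        cap = torch.cuda.get_device_capability(0)
        # Assumption: this list describes the PyTorch binary itself,
        # not vLLM's compiled extension, so it is only a proxy.
        print(cap, has_kernel_image(cap, torch.cuda.get_arch_list()))
```

If the check fails for the GPU you deploy on, rebuilding from source on that machine (as described above) is the fix reported in this thread.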
same issue
root@server-1:~/vllm# python3 -m vllm.entrypoints.api_server --model /workspace/Qwen-7B/weights/Qwen-7B-Chat/ --trust-remote-code
2023-09-25 06:17:03,382 INFO worker.py:1642 -- Started a local Ray instance.
INFO 09-25 06:17:04 llm_engine.py:72] Initializing an LLM engine with config: model='/workspace/Qwen-7B/weights/Qwen-7B-Chat/', tokenizer='/workspace/Qwen-7B/weights/Qwen-7B-Chat/', tokenizer_mode=auto, trust_remote_code=True, dtype=torch.float16, download_dir=None, load_format=auto, tensor_parallel_size=4, seed=0)
WARNING 09-25 06:17:05 tokenizer.py:66] Using a slow tokenizer. This might cause a significant slowdown. Consider using a fast tokenizer instead.
Traceback (most recent call last):
File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/root/vllm/vllm/entrypoints/api_server.py", line 177, in
I use a CUDA 11.8 Docker image that was built on an A800. When I copied the same image to an H100 GPU and restarted it, this issue appeared. The problem persists after rebuilding.
I've been having the same issue for the last week.
GPU: Titan XP with CUDA 12.0 nvcc -V == 11.8 torch.version.cuda == 11.8
I've tried reinstalling PyTorch, which did not resolve the problem.
@schnurromafia I don't think Titan Xp is supported, since its compute capability is 6.1. One of vLLM's requirements is a GPU with compute capability 7.0 or higher.
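For reference, a small sketch that lists the GPUs named in this thread against the 7.0 requirement quoted above. The capability values are taken from NVIDIA's published tables; `COMPUTE_CAPABILITY`, `MIN_SUPPORTED`, and `below_minimum` are hypothetical names used only for illustration.

```python
# Compute capabilities of the GPUs mentioned in this thread.
COMPUTE_CAPABILITY = {
    "P100": (6, 0),
    "Titan Xp": (6, 1),
    "V100": (7, 0),
    "T4": (7, 5),
    "RTX 8000": (7, 5),
    "A100": (8, 0),
    "A800": (8, 0),
    "A40": (8, 6),
    "RTX 3090": (8, 6),
    "RTX 4090": (8, 9),
    "H100": (9, 0),
}

# Minimum compute capability stated in the comment above.
MIN_SUPPORTED = (7, 0)


def below_minimum(minimum=MIN_SUPPORTED):
    """Return the names of GPUs whose capability is below `minimum`."""
    return sorted(name for name, cap in COMPUTE_CAPABILITY.items() if cap < minimum)


print(below_minimum())  # ['P100', 'Titan Xp']
```

Only the Pascal cards fall below the stated minimum, so the reports on V100, T4, A100, A40, and H100 in this thread need a different explanation.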
@Fr4nk1inCs thanks for the message!
There is no technical limitation to running vllm with CC < 7.0 (see https://github.com/vllm-project/vllm/issues/963#issuecomment-1714100911), apart from not being able to load some models like Falcon. The workaround is to build from source and comment out a couple of lines: https://github.com/vllm-project/vllm/issues/463#issuecomment-1636070685
I've been able to run vllm without having this issue for weeks. Reverting back to old commits does not resolve it, which probably means that vllm is not responsible for this error. Just curious if anyone else has had this happen to them...
@schnurromafia Thanks for the message! I was also trying to run vLLM on Pascal GPUs. I'll build it from source and see if it works.
Is PyTorch '1.10.1+cu111' OK?
@hmellor What resolved this issue?
Solved, by recompiling and reinstalling the lib when deploying on V100. Previously it was compiled on A100.
I was going through stale issues (no activity in over 3 months) and this one looked to be resolved due to comments such as the one quoted above.
Since vLLM changes so fast, and nobody has encountered this issue in over 3 months, it seemed reasonable to close this issue. If somebody encounters it again, they can open a new issue using the new issue templates making the bug report more actionable.
@pseudotensor are you currently experiencing this issue?
I am also experiencing this using both T4 and V100 GPUs on Colab
@cccx3 in that case, could you please open a new issue using the new templates so we can better understand the cause of your issue?
@cccx3 Could you tell me how you resolved it, if you did? I'm also trying to run it on colab
I also encountered the same problem
CUDA kernel failed : no kernel image is available for execution on the device
void prescan_small(int , int , int, int, CUstream_st *) at L:126 in C:\Users\reall\Softwares\Miniconda3\envs\Wonder3D_Projects\torchmcubes\cxx\pscan.cu
Having this when working with 3D in ComfyUI.
same error
same error
Please open a new issue and provide your own error trace. This one is very old and might not have the same cause.
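When opening a new issue, it helps to attach the environment details that earlier comments in this thread listed by hand (driver, nvcc, PyTorch build). Below is a minimal sketch for gathering them, assuming `nvidia-smi` and `nvcc` may or may not be on PATH and that PyTorch may or may not be installed. `run` and `collect_env` are hypothetical helper names, not part of vLLM.

```python
import shutil
import subprocess


def run(cmd):
    """Return the command's stdout, or None if the tool is missing or fails."""
    if shutil.which(cmd[0]) is None:
        return None
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return None
    return result.stdout.strip() if result.returncode == 0 else None


def collect_env():
    """Gather driver, toolkit, and PyTorch details for a bug report."""
    info = {
        "nvidia-smi": run(["nvidia-smi", "--query-gpu=name,driver_version",
                           "--format=csv,noheader"]),
        "nvcc": run(["nvcc", "--version"]),
        "torch": None,
        "torch.version.cuda": None,
        "device_capability": None,
    }
    try:
        import torch
    except ImportError:
        return info
    info["torch"] = torch.__version__
    info["torch.version.cuda"] = torch.version.cuda
    if torch.cuda.is_available():
        info["device_capability"] = torch.cuda.get_device_capability(0)
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```

PyTorch also ships `python -m torch.utils.collect_env`, which prints a fuller report and may be the simpler choice when PyTorch is installed.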
same error, worked on V100 and fails on P100
Solved, by recompiling and reinstalling the lib when deploying on V100. Previously it was compiled on A100.
I think this is the current solution to this error. Recompiling and reinstalling vllm works for me.
Error:
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
nvcc -V:
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_May__3_18:49:52_PDT_2022
Cuda compilation tools, release 11.7, V11.7.64
Build cuda_11.7.r11.7/compiler.31294372_0
conda list:
cudatoolkit-dev 11.7.0
cudatoolkit 11.7.0
torch 2.0.1+cu117
nvidia-smi (A100 80G):
NVIDIA-SMI 470.141.03 Driver Version: 470.141.03 CUDA Version: 11.4
How can I solve this problem? Thanks!