Same error here.
I'm seeing this error in 0.4.1 and 0.4.2, but for me it only occurs when trying to load a LoRA. For example, this works:

```shell
python3 -m vllm.entrypoints.openai.api_server --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 \
    -tp 2 --dtype=half --gpu-memory-utilization 0.90
```

but this doesn't:

```shell
python3 -m vllm.entrypoints.openai.api_server --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 \
    -tp 2 --dtype=half --gpu-memory-utilization 0.90 \
    --enable-lora --lora-dtype=float16 --lora-modules tinylora=/model/TinyLlama/TinyLlama-1.1B-Chat-v1.0/loras/tinylora
```

I've tried torch 2.1/2.2/2.3 (including installing from source) and triton 2.1/2.2/2.3. I'm running on AWS ECS on a p3.8xlarge, the GPU is a Tesla V100-SXM2-16GB, and the Docker image is based on nvidia/cuda:12.1.1-cudnn8-runtime-ubuntu22.04.
The P100 is not supported, and LoRA is not supported on the V100.
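For anyone checking their own setup, here is a minimal sketch (assuming a standard PyTorch install) that prints what the GPU and the installed PyTorch build support. My understanding is that vLLM's LoRA path uses the Punica kernels, which require compute capability >= 8.0, which would rule out the V100 (7.0) and P100 (6.0):

```python
import torch

# Report the detected GPU and its compute capability.
major, minor = torch.cuda.get_device_capability(0)
print(f"GPU: {torch.cuda.get_device_name(0)}")
print(f"Compute capability: {major}.{minor}")

# Architectures the installed PyTorch binary was compiled for; a mismatch
# between this list and the GPU above also produces
# "no kernel image is available" errors.
print(f"PyTorch built for: {torch.cuda.get_arch_list()}")
```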
A6000 or 2080? Which one is supported?
@youkaichao I tested v0.4.1 on a V100 at some point and it did work, so I'm not sure why it's suddenly not supported.
Looking at the history of v0.4.1: was v0.4.1 released on the 18th with https://github.com/vllm-project/vllm/commit/221d93ecbf51102df69deaf153d35df6d93370f6, and then re-released on the 24th including all changes up to https://github.com/vllm-project/vllm/commit/468d761b32e3b3c5d64eeaa797e54ab809b7e50c (which is likely what broke compatibility)?
> P100 is not supported. lora is not supported in V100.

Are there any plans to support the V100? If so, when? @youkaichao
Same error, same stack trace.
Any plans to support LoRA adapters on the V100? @youkaichao
🐛 Describe the bug
Getting `RuntimeError: CUDA error: no kernel image is available for execution on the device` when running a process with Mistral-7B. I don't have this issue if I run it with TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ. I don't fully understand why it is happening, since I have been able to run other processes with Mistral-7B.
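A minimal reproduction sketch using vLLM's offline API (the exact Mistral-7B checkpoint and settings are assumptions, since the report doesn't specify them):

```python
from vllm import LLM, SamplingParams

# Hypothetical checkpoint; the report only says "mistral7b".
llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.1", dtype="half")

# On an unsupported GPU / PyTorch build combination, this step raises
# "RuntimeError: CUDA error: no kernel image is available for execution on the device".
outputs = llm.generate(["Hello"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```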