I encountered the same problem, did you solve it?
Unfortunately, I didn't find a better solution than installing CUDA 11, but I don't want to change the CUDA version since the machine is not my own, and re-installing CUDA often causes more unexpected problems. If you have found a solution, please share it, much appreciated.
I tried with CUDA 12.2 and get the same error. When trying with CUDA 11.7, I get the following error:
RuntimeError: The NVIDIA driver on your system is too old (found version 11070). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org/ to install a PyTorch version that has been compiled with your version of the CUDA driver.
I updated my xformers with pip install xformers==v0.0.22 and it works fine. I am using a CUDA 11.7 docker image.
I have solved it. First find the libcudart.so.11.0 path on your disk, then add it to LD_LIBRARY_PATH:
locate libcudart.so.11.0
export LD_LIBRARY_PATH=(the directory containing libcudart.so.11.0):$LD_LIBRARY_PATH
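A one-liner version of the same idea, as a sketch: it assumes locate has an up-to-date database and that the first hit is the copy you want; otherwise set the directory by hand.
# prepend the directory of the first libcudart.so.11.0 hit to the linker search path
export LD_LIBRARY_PATH=$(dirname "$(locate libcudart.so.11.0 | head -n 1)"):$LD_LIBRARY_PATH
# then re-check the import
python -c "import vllm"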
We are getting the same "error", but with CUDA 12.1.
I am not sure whose fault it is, but throwing an error because "we cannot find a file we installed ourselves, so we crash everything" is a bit ridiculous.
I did not see any restriction against using CUDA 12 in the vLLM docs, so we would expect vLLM to work with the latest CUDA version.
Here is the code to reproduce (see below for which docker image to run this in):
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory
from langchain.llms import VLLM
llm = VLLM(
    model="mistralai/Mistral-7B-Instruct-v0.1",
    max_new_tokens=8000,
    top_k=10,
    top_p=0.95,
    temperature=0.8,
)
conversation = ConversationChain(
    llm=llm, verbose=True, memory=ConversationBufferMemory()
)
print(conversation.predict(input="Hi mom!"))
Here is the full error we get:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/langchain/llms/vllm.py", line 79, in validate_environment
from vllm import LLM as VLLModel
File "/usr/local/lib/python3.10/dist-packages/vllm/__init__.py", line 3, in <module>
from vllm.engine.arg_utils import AsyncEngineArgs, EngineArgs
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/arg_utils.py", line 6, in <module>
from vllm.config import (CacheConfig, ModelConfig, ParallelConfig,
File "/usr/local/lib/python3.10/dist-packages/vllm/config.py", line 8, in <module>
from vllm.utils import get_cpu_memory
File "/usr/local/lib/python3.10/dist-packages/vllm/utils.py", line 8, in <module>
from vllm import cuda_utils
ImportError: libcudart.so.11.0: cannot open shared object file: No such file or directory
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/workspace/share/code-llama/run_vllm.py", line 5, in <module>
llm = VLLM(
File "/usr/local/lib/python3.10/dist-packages/langchain/load/serializable.py", line 97, in __init__
super().__init__(**kwargs)
File "pydantic/main.py", line 339, in pydantic.main.BaseModel.__init__
File "pydantic/main.py", line 1102, in pydantic.main.validate_model
File "/usr/local/lib/python3.10/dist-packages/langchain/llms/vllm.py", line 81, in validate_environment
raise ImportError(
ImportError: Could not import vllm python package. Please install it with `pip install vllm`.
Here is our setup: we are literally using the official CUDA image from NVIDIA: nvcr.io/nvidia/cuda:12.1.0-devel-ubuntu22.04
Given that, the message "can't find libcudart" has no reason to exist.
We don't use the PyTorch image recommended by the vLLM docs because with it we can't control exactly which CUDA version gets installed, and then we get errors like "PyTorch was compiled with a different CUDA version". Also, the PyTorch GPU image is ~9 GB vs ~3 GB for the CUDA one.
nvidia-smi shows we are using CUDA 12.1:
| NVIDIA-SMI 530.30.02 Driver Version: 530.30.02 CUDA Version: 12.1 |
pip list | grep cuda shows that only CUDA 12.1 packages are installed:
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
We managed to make it work last week when running in an old PyTorch docker image that was still on Python 3.8. But now it is broken when running on up-to-date images, always complaining about this libcudart location error that should not exist.
And last week, when it was working, the main GPU was still on CUDA 12.1 (according to nvidia-smi), but with some old 11.7 pip packages installed (as I said, it's an old PyTorch image running Python 3.8, a wonderful mess, but vLLM seems to thrive in the mess since that's the only time it worked):
cuda-python 12.1.0rc5+1.gc7fd38c.dirty
cupy-cuda12x 12.0.0b3
dask-cuda 23.2.0
nvidia-cuda-cupti-cu11 11.7.101
nvidia-cuda-nvrtc-cu11 11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-dali-cuda110 1.23.0
Meaning vLLM can work on CUDA 12 drivers, and we don't need to reinstall CUDA 11; some CUDA 11 runtime libs should be enough: nvidia-cuda-runtime-cu11, nvidia-cuda-nvrtc-cu11, or nvidia-cuda-cupti-cu11.
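A minimal sketch of that workaround, untested here: the package names are the standard NVIDIA pip wheels, and the export path is only a typical install location under dist-packages, so verify where pip actually put libcudart.so.11.0 before exporting.
# install just the CUDA 11 runtime libraries as pip wheels, without a full CUDA 11 toolkit
pip install nvidia-cuda-runtime-cu11 nvidia-cuda-nvrtc-cu11 nvidia-cuda-cupti-cu11
# hypothetical path; confirm with: find / -name 'libcudart.so.11.0' 2>/dev/null
export LD_LIBRARY_PATH=/usr/local/lib/python3.10/dist-packages/nvidia/cuda_runtime/lib:$LD_LIBRARY_PATH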
Regarding the locate libcudart fix: running locate libcudart.so.11.0 does not find anything inside the CUDA docker image, presumably because only CUDA 12 is installed there.
The CUDA version is really sensitive for getting vLLM to work. It would be really helpful for vLLM to provide a bit of documentation around it, e.g. which CUDA / torch / xformers combinations are supported.
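In the meantime, a quick way to see which side of the mismatch you are on is to compare the CUDA version torch was built with against what the driver reports; a small sketch (nothing here is vLLM-specific):
# CUDA version the installed torch wheel was built against
python -c "import torch; print(torch.__version__, torch.version.cuda)"
# CUDA version supported by the driver, as reported in the nvidia-smi header
nvidia-smi | head -n 4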
A potential approach to fix it: it could be due to the torch version, which is pinned to 2.0.1: https://github.com/vllm-project/vllm/blob/main/pyproject.toml#L6
Because torch 2.0.1 does not have a variant for CUDA 12 (only CUDA 11), maybe installing a newer torch version would work.
I'll try to re-build vllm without version limitations for torch to see if that helps.
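A rough sketch of that experiment, under the assumption that the torch pin in pyproject.toml (and the requirements file) is the only thing holding torch back; the exact pin strings may differ, so edit them by hand:
# build vllm from source with the torch pin relaxed
git clone https://github.com/vllm-project/vllm.git
cd vllm
# manually change "torch == 2.0.1" to a looser constraint in pyproject.toml and requirements.txt
# then install against the torch already in the environment
# (requires torch, setuptools, wheel and ninja to be pre-installed)
pip install -e . --no-build-isolation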
Let me know if you've got any news here. I've had the same problem since this morning, also with an NVIDIA image: nvcr.io/nvidia/tritonserver:23.09-py3.
It's weird because the latest vllm release actually uses torch >= 2.0.0, so I should be able to use torch 2.1.0 with vllm 0.2.0. But installing vllm always installs torch 2.0.1, due to:
xformers 0.0.22 requires torch==2.0.1, but you have torch 2.1.0 which is incompatible.
If we try pip install --upgrade xformers:
vllm 0.2.0 requires xformers==0.0.22, but you have xformers 0.0.22.post4 which is incompatible.
But the requirements.txt of release v0.2.0 indicates xformers >= 0.0.22.
And whatever combination I try, I always get errors, most of the time this one:
INFO 10-16 16:23:57 llm_engine.py:72] Initializing an LLM engine with config: model='mistralai/Mistral-7B-Instruct-v0.1', tokenizer='mistralai/Mistral-7B-Instruct-v0.1', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=auto, tensor_parallel_size=1, quantization=None, seed=0)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Traceback (most recent call last):
File "/workspace/share/code-llama/run_vllm.py", line 5, in <module>
llm = VLLM(
File "/usr/local/lib/python3.10/dist-packages/langchain/load/serializable.py", line 97, in __init__
super().__init__(**kwargs)
File "pydantic/main.py", line 339, in pydantic.main.BaseModel.__init__
File "pydantic/main.py", line 1102, in pydantic.main.validate_model
File "/usr/local/lib/python3.10/dist-packages/langchain/llms/vllm.py", line 86, in validate_environment
values["client"] = VLLModel(
File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/llm.py", line 93, in __init__
self.llm_engine = LLMEngine.from_engine_args(engine_args)
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 231, in from_engine_args
engine = cls(*engine_configs,
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 110, in __init__
self._init_workers(distributed_init_method)
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 128, in _init_workers
from vllm.worker.worker import Worker # pylint: disable=import-outside-toplevel
File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 10, in <module>
from vllm.model_executor import get_model, InputMetadata, set_random_seed
File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/__init__.py", line 2, in <module>
from vllm.model_executor.model_loader import get_model
File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader.py", line 10, in <module>
from vllm.model_executor.models import * # pylint: disable=wildcard-import
File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/__init__.py", line 1, in <module>
from vllm.model_executor.models.aquila import AquilaForCausalLM
File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/aquila.py", line 35, in <module>
from vllm.model_executor.layers.attention import PagedAttentionWithRoPE
File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/attention.py", line 10, in <module>
from vllm import attention_ops
ImportError: /usr/local/lib/python3.10/dist-packages/vllm/attention_ops.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZNK3c1010TensorImpl27throw_data_ptr_access_errorEv
If I understand the problem correctly, the issue was that v0.2.0 didn't pin the pytorch and xformers versions. In v0.2.1, which was released today, we pinned their versions, so the error should not happen as long as you use CUDA 11.8.
We will support CUDA 12 once xformers releases a new stable version with CUDA 12 support. (While xformers==0.0.22.post4 seems to include CUDA 12 binaries, I feel it's a bit unstable at the moment.)
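If you want to follow that advice, a minimal sketch of the CUDA 11.8 route; the image tag is an assumption, any CUDA 11.8 devel image with Python 3.8+ should do:
# start from a CUDA 11.8 development image
docker run --gpus all -it nvidia/cuda:11.8.0-devel-ubuntu22.04 bash
# inside the container: get pip, then install the release that pins torch/xformers
apt-get update && apt-get install -y python3-pip
python3 -m pip install vllm==0.2.1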
Right now PyTorch 2.0.1 is bound to CUDA 11.7; compiling vLLM with CUDA 11.8 fails with the same issue as #1283. I used nvidia/cuda:11.7.1-cudnn8-devel-ubuntu20.04 and installing vLLM with pip install -e . succeeded.
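For completeness, the full sequence behind that comment would look roughly like this (a sketch; the comment only mentions the image and pip install -e ., the rest is assumed):
docker run --gpus all -it nvidia/cuda:11.7.1-cudnn8-devel-ubuntu20.04 bash
# inside the container
apt-get update && apt-get install -y git python3-pip
git clone https://github.com/vllm-project/vllm.git
cd vllm
python3 -m pip install -e .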
Finally, I installed CUDA 11.7 manually and the problem was fixed immediately. It seems that vllm cannot work if only CUDA 12 is installed on the machine.
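If you go that route, the CUDA 11.7 toolkit can be installed alongside CUDA 12 without touching the driver; a sketch (the runfile name/URL should be double-checked against NVIDIA's download archive):
# download and install only the toolkit (no driver) from the CUDA 11.7.1 runfile
wget https://developer.download.nvidia.com/compute/cuda/11.7.1/local_installers/cuda_11.7.1_515.65.01_linux.run
sh cuda_11.7.1_515.65.01_linux.run --silent --toolkit --override
# expose the CUDA 11.7 runtime to the dynamic linker
export LD_LIBRARY_PATH=/usr/local/cuda-11.7/lib64:$LD_LIBRARY_PATH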
I encountered the same problem with nvcr.io/nvidia/pytorch:22.12-py3 and vllm==0.2.0, and I solved the error by adding xformers==0.0.22 to requirements.txt.
xformers==0.0.22 requires nvidia-cuda-runtime-cu11==11.7.99 etc. Unfortunately it uninstalls PyTorch 2.1.0 (originally installed), but my code is working!
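Equivalently, as a direct install (a sketch of the pins that worked for that commenter; add the same two pins to requirements.txt if you prefer):
# vllm 0.2.0 + xformers 0.0.22; torch==2.0.1 and the nvidia-*-cu11 runtime wheels come in transitively
pip install "vllm==0.2.0" "xformers==0.0.22"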
I'm getting the same error on colab with TheBloke-Dolphin-2.1-mistral-7B-GPTQ, which was working...
@bitsnaps Same problem here. Did you find any solution?
Not yet. I believe this is something to do with mistral/transformers/huggingface (not vllm); I'm not even able to run mistral-7b on colab, which was working fine last week.
@bitsnaps I tried to run Mistral_7B_Instruct_v0_1_GGUF now and it's working. I just downgraded gradio to gradio==3.32.0 and did not change anything related to flash-attn.
Currently, AutoAWQ delivers two versions (CUDA 11 and CUDA 12). I recommend trying whichever combination suits your environment, and check your other packages too.
From PyPI (torch 2.1.0 + CUDA 12.1.1):
pip install autoawq
From GitHub (torch 2.0 + CUDA 11):
pip install https://github.com/casper-hansen/AutoAWQ/releases/download/v0.1.6/autoawq-0.1.6+cu118-cp310-cp310-linux_x86_64.whl
Have a go at updating vllm to v0.2.2. It looks like they've sorted out this issue in that version: the v0.2.2 major changes include "Upgrade to CUDA 12" (#1527).
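If you want to try that, the upgrade itself is just the following (a sketch; it assumes you want at least 0.2.2 rather than exactly that version):
# pick up v0.2.2 or later, which moves to CUDA 12 per the changelog entry above
pip install --upgrade "vllm>=0.2.2"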
I'm using the llmware library and was facing the same error. I upgraded torch (2.0.1 -> 2.1.0) and that solved the problem.
pip install xformers==v0.0.22
Thank you! Worked like a charm!
It's probably just a CUDA or torch version problem; try downgrading.
When I used vllm to serve my local model, the terminal displayed the following message: ImportError: libcudart.so.11.0: cannot open shared object file: No such file or directory. The traceback pointed to the following code in site-packages/vllm/utils.py, and executing this single line also triggers the same error:
from vllm import cuda_utils
I suppose it may be caused by a mismatch between vllm and my CUDA version or PyTorch version. The CUDA version on my machine is 12.2 (the only version installed) and installing version 11 is not convenient; the PyTorch version is 2.1.0 and the vllm version is 0.2.0. How could I solve the problem without re-installing CUDA 11? Many thanks!