vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

ImportError: libcudart.so.11.0: cannot open shared object file: No such file or directory #1369

Closed DoliteMatheo closed 7 months ago

DoliteMatheo commented 1 year ago

When I used vllm to serve my local model, the terminal displayed the following message:

ImportError: libcudart.so.11.0: cannot open shared object file: No such file or directory

The traceback pointed to the following line in site-packages/vllm/utils.py, and executing this single line on its own triggers the same error:

"from vllm import cuda_utils"

I suppose it may be caused by a mismatch between vllm and my CUDA or PyTorch version. The CUDA version on my machine is 12.2 (the only version installed), and installing CUDA 11 alongside it is not convenient. My PyTorch version is 2.1.0 and the vllm version is 0.2.0. How can I solve the problem without re-installing CUDA 11? Many thanks!
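A quick way to confirm the suspected mismatch (a generic diagnostic snippet, not from the original report) is to print which CUDA toolkit the installed PyTorch wheel was built against and compare it with the toolkit on the machine:

# Diagnostic sketch: shows the CUDA version the installed torch wheel
# targets, to compare against the CUDA 12.2 toolkit reported by the system.
import torch

print("torch", torch.__version__)
print("built for CUDA", torch.version.cuda)
print("GPU available:", torch.cuda.is_available())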

alan1989 commented 1 year ago

I encountered the same problem, did you solve it?

DoliteMatheo commented 1 year ago

I encountered the same problem, did you solve it?

Unfortunately, I didn't find a better solution than installing CUDA 11, but I don't want to change the CUDA version since the machine is not mine, and re-installing CUDA often causes many more unexpected problems. If you find any solution, please tell me, much appreciated.

bhupendrathore commented 1 year ago

I tried with CUDA 12.2 and got the same error. When trying with CUDA 11.7, I get the following error instead:

RuntimeError: The NVIDIA driver on your system is too old (found version 11070). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org/ to install a PyTorch version that has been compiled with your version of the CUDA driver.

I updated my xformers with pip install xformers==v0.0.22 and it works fine.

I am using a CUDA 11.7 docker image.

alan1989 commented 1 year ago

I have solved it. First find the libcudart.so.11.0 path on your disk, then add that directory to LD_LIBRARY_PATH:

locate libcudart.so.11.0
export LD_LIBRARY_PATH=<the directory containing libcudart.so.11.0>:$LD_LIBRARY_PATH
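To verify that the export worked (a minimal check, assuming it is run in a fresh shell after setting LD_LIBRARY_PATH), try loading the library directly; if it resolves, vllm's import should succeed as well:

# Minimal check: raises OSError if libcudart.so.11.0 is still not on the
# dynamic loader's search path.
import ctypes

ctypes.CDLL("libcudart.so.11.0")
print("libcudart.so.11.0 resolved")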

vemonet commented 1 year ago

We are getting the same "error", but with CUDA 12.1

I am not sure whose fault it is, but throwing an error along the lines of "we cannot find a file we installed ourselves, so we crash everything" is a bit ridiculous.

I did not see any restriction against using CUDA 12 in the vLLM docs, so one would expect vLLM to work with the latest CUDA version.

Here is the code to reproduce (see below for which docker image to run it in):

from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory
from langchain.llms import VLLM

llm = VLLM(
    model="mistralai/Mistral-7B-Instruct-v0.1",
    max_new_tokens=8000,
    top_k=10,
    top_p=0.95,
    temperature=0.8,
)

conversation = ConversationChain(
    llm=llm, verbose=True, memory=ConversationBufferMemory()
)

print(conversation.predict(input="Hi mom!"))

Here is the full error we get:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/langchain/llms/vllm.py", line 79, in validate_environment
    from vllm import LLM as VLLModel
  File "/usr/local/lib/python3.10/dist-packages/vllm/__init__.py", line 3, in <module>
    from vllm.engine.arg_utils import AsyncEngineArgs, EngineArgs
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/arg_utils.py", line 6, in <module>
    from vllm.config import (CacheConfig, ModelConfig, ParallelConfig,
  File "/usr/local/lib/python3.10/dist-packages/vllm/config.py", line 8, in <module>
    from vllm.utils import get_cpu_memory
  File "/usr/local/lib/python3.10/dist-packages/vllm/utils.py", line 8, in <module>
    from vllm import cuda_utils
ImportError: libcudart.so.11.0: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/workspace/share/code-llama/run_vllm.py", line 5, in <module>
    llm = VLLM(
  File "/usr/local/lib/python3.10/dist-packages/langchain/load/serializable.py", line 97, in __init__
    super().__init__(**kwargs)
  File "pydantic/main.py", line 339, in pydantic.main.BaseModel.__init__
  File "pydantic/main.py", line 1102, in pydantic.main.validate_model
  File "/usr/local/lib/python3.10/dist-packages/langchain/llms/vllm.py", line 81, in validate_environment
    raise ImportError(
ImportError: Could not import vllm python package. Please install it with `pip install vllm`.

1. We use official nvidia images

Here is our setup: We are literally using the official CUDA image from nvidia: nvcr.io/nvidia/cuda:12.1.0-devel-ubuntu22.04

Starting from that image, the "can't find libcudart" message has no reason to exist.

2. We made sure to have the right CUDA version

We don't use the pytorch image recommended by the vLLM docs because with it we can't control exactly which CUDA version gets installed, and then we get errors like "pytorch was compiled with a different CUDA version". Also, the pytorch GPU image is ~9 GB vs ~3 GB for the CUDA image.

nvidia-smi shows we are using CUDA 12.1:

| NVIDIA-SMI 530.30.02              Driver Version: 530.30.02    CUDA Version: 12.1     |

pip list | grep cuda shows we have CUDA 12.1 packages installed everywhere:

nvidia-cuda-cupti-cu12    12.1.105
nvidia-cuda-nvrtc-cu12    12.1.105
nvidia-cuda-runtime-cu12  12.1.105

3. It was working on CUDA 12.1 last week 🫠

We managed to make it work last week when running in an old pytorch docker image that was still on Python 3.8. But now it is broken when running on up-to-date images (what a mess), always complaining about this bogus libcudart location error.

And last week, when it was working, the main GPU was still on CUDA 12.1 (according to nvidia-smi), but with some old 11.7 pip packages installed (as I said, it's an old pytorch image running Python 3.8, a wonderful mess, but vLLM seems to thrive in the mess since that's the only time it worked):

cuda-python                   12.1.0rc5+1.gc7fd38c.dirty
cupy-cuda12x                  12.0.0b3
dask-cuda                     23.2.0
nvidia-cuda-cupti-cu11        11.7.101
nvidia-cuda-nvrtc-cu11        11.7.99
nvidia-cuda-runtime-cu11      11.7.99
nvidia-dali-cuda110           1.23.0

Meaning vLLM can work on CUDA 12 drivers, and we don't need to reinstall CUDA 11; having just some of the CUDA 11 runtime libs should be enough: nvidia-cuda-runtime-cu11, nvidia-cuda-nvrtc-cu11, or nvidia-cuda-cupti-cu11.
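Below is a rough sketch of that idea (it assumes the nvidia-cuda-runtime-cu11 wheel places its files under site-packages/nvidia/cuda_runtime/lib, which may vary by version): locate the pip-shipped libcudart.so.11.* and preload it so that the vllm extension can resolve it.

# Sketch: locate and preload the CUDA 11 runtime shipped by the
# nvidia-cuda-runtime-cu11 wheel (assumed layout: nvidia/cuda_runtime/lib).
import ctypes
import glob
import os
import site

candidates = []
for root in site.getsitepackages():
    candidates += glob.glob(
        os.path.join(root, "nvidia", "cuda_runtime", "lib", "libcudart.so.11*"))

if candidates:
    ctypes.CDLL(candidates[0])  # preload before "from vllm import cuda_utils"
    print("preloaded", candidates[0])
else:
    print("no pip-installed CUDA 11 runtime found; "
          "try: pip install nvidia-cuda-runtime-cu11")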

4. Trying the locate libcudart fix

Running locate libcudart.so.11.0 does not find anything inside the CUDA docker image, presumably because only CUDA 12 is installed there.

5. Conclusion

vLLM is really sensitive to the installed CUDA version. It would be really helpful for vLLM to provide a bit of documentation around this, e.g. which CUDA versions are supported.

vemonet commented 1 year ago

A potential approach to fix it: it could be due to the torch version, which is pinned to 2.0.1: https://github.com/vllm-project/vllm/blob/main/pyproject.toml#L6

Because torch 2.0.1 does not have a variant for CUDA 12 (only CUDA 11), installing a newer torch version might work.

I'll try to rebuild vllm without the torch version restriction to see if that helps.

copasseron commented 1 year ago

A potential approach to fix it: it could be due to the torch version, which is pinned to 2.0.1: https://github.com/vllm-project/vllm/blob/main/pyproject.toml#L6

Because torch 2.0.1 does not have a variant for CUDA 12 (only CUDA 11), installing a newer torch version might work.

I'll try to rebuild vllm without the torch version restriction to see if that helps.

Let me know if you've got any news here.

I've had the same problem since this morning, also with an nvidia image: nvcr.io/nvidia/tritonserver:23.09-py3.

vemonet commented 1 year ago

It's weird because the latest vllm release actually uses torch >= 2.0.0, so I should be able to use torch 2.1.0 with vllm 0.2.0.

But installing vllm always installs torch 2.0.1, due to:

xformers 0.0.22 requires torch==2.0.1, but you have torch 2.1.0 which is incompatible.

If we try to pip install --upgrade xformers:

vllm 0.2.0 requires xformers==0.0.22, but you have xformers 0.0.22.post4 which is incompatible.

But the requirements.txt of release v0.2.0 indicates xformers >= 0.0.22
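One quick way to see which versions pip's resolver actually left installed (a generic check, not part of the original comment):

# Prints the installed versions of the three packages involved in the
# conflict, to see which pins actually won out.
from importlib.metadata import PackageNotFoundError, version

for pkg in ("vllm", "torch", "xformers"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "is not installed")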

And whatever combination I try, I always get errors, most of the time this one:

INFO 10-16 16:23:57 llm_engine.py:72] Initializing an LLM engine with config: model='mistralai/Mistral-7B-Instruct-v0.1', tokenizer='mistralai/Mistral-7B-Instruct-v0.1', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=auto, tensor_parallel_size=1, quantization=None, seed=0)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Traceback (most recent call last):
  File "/workspace/share/code-llama/run_vllm.py", line 5, in <module>
    llm = VLLM(
  File "/usr/local/lib/python3.10/dist-packages/langchain/load/serializable.py", line 97, in __init__
    super().__init__(**kwargs)
  File "pydantic/main.py", line 339, in pydantic.main.BaseModel.__init__
  File "pydantic/main.py", line 1102, in pydantic.main.validate_model
  File "/usr/local/lib/python3.10/dist-packages/langchain/llms/vllm.py", line 86, in validate_environment
    values["client"] = VLLModel(
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/llm.py", line 93, in __init__
    self.llm_engine = LLMEngine.from_engine_args(engine_args)
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 231, in from_engine_args
    engine = cls(*engine_configs,
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 110, in __init__
    self._init_workers(distributed_init_method)
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 128, in _init_workers
    from vllm.worker.worker import Worker  # pylint: disable=import-outside-toplevel
  File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 10, in <module>
    from vllm.model_executor import get_model, InputMetadata, set_random_seed
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/__init__.py", line 2, in <module>
    from vllm.model_executor.model_loader import get_model
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader.py", line 10, in <module>
    from vllm.model_executor.models import *  # pylint: disable=wildcard-import
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/__init__.py", line 1, in <module>
    from vllm.model_executor.models.aquila import AquilaForCausalLM
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/aquila.py", line 35, in <module>
    from vllm.model_executor.layers.attention import PagedAttentionWithRoPE
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/attention.py", line 10, in <module>
    from vllm import attention_ops
ImportError: /usr/local/lib/python3.10/dist-packages/vllm/attention_ops.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZNK3c1010TensorImpl27throw_data_ptr_access_errorEv

WoosukKwon commented 1 year ago

If I understand the problem correctly, the issue was that v0.2.0 didn't pin the pytorch and xformers versions. In v0.2.1, which was released today, we pinned their versions, so the error should not happen as long as you use CUDA 11.8.

We will support CUDA 12 once xformers releases a new stable version with CUDA 12 support. (While xformers==0.0.22.post4 seems to include CUDA 12 binaries, I feel it's a bit unstable at the moment).

gesanqiu commented 1 year ago

If I understand the problem correctly, the issue was that v0.2.0 didn't pin the pytorch and xformers versions. In v0.2.1, which was released today, we pinned their versions, so the error should not happen as long as you use CUDA 11.8.

We will support CUDA 12 once xformers releases a new stable version with CUDA 12 support. (While xformers==0.0.22.post4 seems to include CUDA 12 binaries, I feel it's a bit unstable at the moment).

Right now PyTorch 2.0.1 is bound to CUDA 11.7, so compiling vLLM with CUDA 11.8 will fail, the same issue as #1283. I used nvidia/cuda:11.7.1-cudnn8-devel-ubuntu20.04 and installing vLLM with pip install -e . succeeded.

DoliteMatheo commented 1 year ago

If I understand the problem correctly, the issue was that v0.2.0 didn't pin the pytorch and xformers versions. In v0.2.1, which was released today, we pinned their versions, so the error should not happen as long as you use CUDA 11.8.

We will support CUDA 12 once xformers releases a new stable version with CUDA 12 support. (While xformers==0.0.22.post4 seems to include CUDA 12 binaries, I feel it's a bit unstable at the moment).

Finally I installed CUDA 11.7 manually and the problem was fixed immediately. It seems that vllm cannot work if only CUDA 12 is installed on the machine.

s-natsubori commented 1 year ago

I encountered the same problem, and I solved this error by adding xformers==0.0.22 to requirements.txt.

xformers==0.0.22 requires nvidia-cuda-runtime-cu11==11.7.99 and so on. Unfortunately it uninstalls PyTorch 2.1.0 (originally installed), but my code is working!

bitsnaps commented 1 year ago

I'm getting the same error on colab with TheBloke-Dolphin-2.1-mistral-7B-GPTQ which was working...

sanjana-sudo commented 1 year ago

I'm getting the same error on colab with TheBloke-Dolphin-2.1-mistral-7B-GPTQ which was working...

@bitsnaps same problem here. Did you find any solution?

bitsnaps commented 1 year ago

I'm getting the same error on colab with TheBloke-Dolphin-2.1-mistral-7B-GPTQ which was working...

@bitsnaps same problem here. Did you find any solution?

Not yet. I believe this is a mistral/transformers/huggingface issue (not vllm); I'm not even able to run mistral-7b on colab, which was working fine last week.

sanjana-sudo commented 1 year ago

I'm getting the same error on colab with TheBloke-Dolphin-2.1-mistral-7B-GPTQ which was working...

@bitsnaps same problem here. Did you find any solution?

Not yet. I believe this is a mistral/transformers/huggingface issue (not vllm); I'm not even able to run mistral-7b on colab, which was working fine last week.

@bitsnaps I tried to run the Mistral_7B_Instruct_v0_1_GGUF now and it's working. I just downgraded gradio to gradio==3.32.0 and did not change anything related to flash-attn.

s-natsubori commented 1 year ago

Currently, AutoAWQ delivers two versions (CUDA 11 and CUDA 12). I recommend trying whichever one suits your environment, and check your other packages too.

From PyPI (torch 2.1.0 + CUDA 12.1.1):
pip install autoawq

From GitHub (torch 2.0 + CUDA 11):
pip install https://github.com/casper-hansen/AutoAWQ/releases/download/v0.1.6/autoawq-0.1.6+cu118-cp310-cp310-linux_x86_64.whl

D-Octopus commented 11 months ago

Have a go at updating vllm to v0.2.2; it looks like they've sorted out this issue in that version. The v0.2.2 major changes include "Upgrade to CUDA 12" (#1527).

LI-ZHAODONG commented 11 months ago

I'm using the llmware library and was facing the same error. I upgraded torch (2.0.1 -> 2.1.0) and that solved the problem.

ibnzahoor98 commented 9 months ago

pip install xformers==v0.0.22

Thank you! Worked like a charm!

Provemj commented 4 weeks ago

It's probably just a CUDA or torch version problem; try downgrading it.