xorbitsai / inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
https://inference.readthedocs.io
Apache License 2.0
4.7k stars · 368 forks

Startup fails after changing the base image from pytorch/pytorch:2.1.2-cuda12.1-cudnn8-devel to vllm/vllm-openai:latest #1875

Open huzech opened 1 month ago

huzech commented 1 month ago

System Info

CUDA 12.0; xprobe/xinference:v0.13.1 Docker image from Docker Hub

Running Xinference with Docker?

Version info

v0.13.1

The command used to start Xinference

xinference-local --host 0.0.0.0 --port 9997

Reproduction

Starting the container prints this error (log timestamps omitted):

```
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/llama_cpp/llama_cpp.py", line 75, in _load_shared_library
    return ctypes.CDLL(str(_lib_path), **cdll_args)  # type: ignore
  File "/usr/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libcublas.so.12: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/xinference-local", line 5, in <module>
    from xinference.deploy.cmdline import local
  File "/usr/local/lib/python3.10/dist-packages/xinference/__init__.py", line 38, in <module>
    _install()
  File "/usr/local/lib/python3.10/dist-packages/xinference/__init__.py", line 35, in _install
    install_model()
  File "/usr/local/lib/python3.10/dist-packages/xinference/model/__init__.py", line 17, in _install
    from .llm import _install as llm_install
  File "/usr/local/lib/python3.10/dist-packages/xinference/model/llm/__init__.py", line 20, in <module>
    from .core import (
  File "/usr/local/lib/python3.10/dist-packages/xinference/model/llm/core.py", line 26, in <module>
    from ...types import PeftModelConfig
  File "/usr/local/lib/python3.10/dist-packages/xinference/types.py", line 399, in <module>
    from llama_cpp import Llama
  File "/usr/local/lib/python3.10/dist-packages/llama_cpp/__init__.py", line 1, in <module>
    from .llama_cpp import *
  File "/usr/local/lib/python3.10/dist-packages/llama_cpp/llama_cpp.py", line 88, in <module>
    _lib = _load_shared_library(_lib_base_name)
  File "/usr/local/lib/python3.10/dist-packages/llama_cpp/llama_cpp.py", line 77, in _load_shared_library
    raise RuntimeError(f"Failed to load shared library '{_lib_path}': {e}")
RuntimeError: Failed to load shared library '/usr/local/lib/python3.10/dist-packages/llama_cpp/lib/libllama.so': libcublas.so.12: cannot open shared object file: No such file or directory
```
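The traceback means the dynamic loader cannot find the CUDA 12 cuBLAS runtime that the prebuilt `libllama.so` links against. A quick way to confirm this inside the container (a hedged sketch; exact paths vary by image):

```shell
# List the cuBLAS libraries visible to the dynamic loader; the
# traceback requires the CUDA 12 series (libcublas.so.12) specifically.
ldconfig -p | grep libcublas || echo "no libcublas on the loader path"

# Search the filesystem in case the library exists but is not on the
# loader path (then LD_LIBRARY_PATH may be the actual problem).
find / -name 'libcublas.so.12*' 2>/dev/null || true
```

If the second command finds the file while the first reports nothing, the fix is a loader-path issue rather than a missing library.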

Expected behavior / 期待表现

The container starts successfully.

qinxuye commented 1 month ago

You need to upgrade your CUDA version to at least 12.4.
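For context, a container's CUDA user-space libraries only work if the host driver is new enough to serve them. One way to check what the host driver supports (a sketch; requires the NVIDIA driver on the host):

```shell
# The banner nvidia-smi prints includes "CUDA Version: X.Y", the
# newest CUDA runtime this driver can serve to containers; falls back
# to a message on hosts without the driver installed.
nvidia-smi || echo "nvidia-smi not found: no NVIDIA driver visible here"
```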

huzech commented 1 month ago

Thank you for your response. Upgrading CUDA is not a simple task. Is it possible to keep two versions of the Docker image?

qinxuye commented 1 month ago

If you want to stay on the older image, you can pull it and run `pip install xinference==0.13.1` inside the container to upgrade.
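The suggestion can be sketched as follows; the `v0.12.3` tag is an assumption here (check Docker Hub for whichever older tag was still built on the CUDA 12.1 base):

```shell
# Pull an older image that still matches the host's CUDA (tag assumed).
docker pull xprobe/xinference:v0.12.3

# Start a shell in it, then upgrade only the xinference package.
docker run --gpus all -p 9997:9997 -it xprobe/xinference:v0.12.3 bash
# -- inside the container --
pip install "xinference==0.13.1"
xinference-local --host 0.0.0.0 --port 9997
```

This keeps the older CUDA user-space libraries while running the newer Python code, which is why models with newer dependency requirements may still fail.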

huzech commented 1 month ago

> If you want to stay on the older image, you can pull it and run `pip install xinference==0.13.1` inside the container to upgrade.

The server can start now, but gemma-2 will not run; it fails with the error message "need to upgrade TensorFlow".

github-actions[bot] commented 1 month ago

This issue is stale because it has been open for 7 days with no activity.