Running Xinference with Docker?
[ ] docker
[X] pip install
[ ] installation from source
Version info
0.14.1
The command used to start Xinference
xinference-local --host 0.0.0.0 --port 9997
Reproduction
1. curl -X 'POST' 'http://192.168.1.88:9997/v1/rerank' -H 'accept: application/json' -H 'Content-Type: application/json' -d '{
"model": "jina-reranker-v2",
"query": "A man is eating pasta.",
"documents": [
"A man is eating food.",
"A man is eating a piece of bread.",
"The girl is carrying a baby.",
"A man is riding a horse.",
"A woman is playing violin."]
}' -w "\n时间总计: %{time_total} 秒\n"
2. The request fails with the following error:
{"detail":"[address=0.0.0.0:40587, pid=69485] CUDA error: device-side assert triggered\nCUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.\nFor debugging consider passing CUDA_LAUNCH_BLOCKING=1\nCompile with TORCH_USE_CUDA_DSA to enable device-side assertions.\n"}
时间总计: 0.325 秒
3. Logs:
ther API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
Traceback (most recent call last):
  File "/home/njue/anaconda3/envs/xinference/lib/python3.10/site-packages/xinference/api/restful_api.py", line 1223, in rerank
    scores = await model.rerank(
  File "/home/njue/anaconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/context.py", line 231, in send
    return self._process_result_message(result)
  File "/home/njue/anaconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "/home/njue/anaconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/pool.py", line 656, in send
    result = await self._run_coro(message.message_id, coro)
  File "/home/njue/anaconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/pool.py", line 367, in _run_coro
    return await coro
  File "/home/njue/anaconda3/envs/xinference/lib/python3.10/site-packages/xoscar/api.py", line 384, in on_receive
    return await super().on_receive(message)  # type: ignore
  File "xoscar/core.pyx", line 558, in __on_receive__
    raise ex
  File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
  File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
    result = await result
  File "/home/njue/anaconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/utils.py", line 45, in wrapped
    ret = await func(*args, **kwargs)
  File "/home/njue/anaconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/model.py", line 90, in wrapped_func
    ret = await fn(self, *args, **kwargs)
  File "/home/njue/anaconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/model.py", line 591, in rerank
    return await self._call_wrapper_json(
  File "/home/njue/anaconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/model.py", line 398, in _call_wrapper_json
    return await self._call_wrapper("json", fn, *args, **kwargs)
  File "/home/njue/anaconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/model.py", line 114, in _async_wrapper
    return await fn(*args, **kwargs)
  File "/home/njue/anaconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/model.py", line 415, in _call_wrapper
    ret = await asyncio.to_thread(fn, *args, **kwargs)
  File "/home/njue/anaconda3/envs/xinference/lib/python3.10/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
  File "/home/njue/anaconda3/envs/xinference/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/njue/anaconda3/envs/xinference/lib/python3.10/site-packages/xinference/model/rerank/core.py", line 207, in rerank
    empty_cache()
  File "/home/njue/anaconda3/envs/xinference/lib/python3.10/site-packages/xinference/device_utils.py", line 94, in empty_cache
    torch.cuda.empty_cache()
  File "/home/njue/anaconda3/envs/xinference/lib/python3.10/site-packages/torch/cuda/memory.py", line 170, in empty_cache
    torch._C._cuda_emptyCache()
RuntimeError: [address=0.0.0.0:43151, pid=69266] CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
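The error text itself suggests rerunning with CUDA_LAUNCH_BLOCKING=1, since the traceback ends in `torch.cuda.empty_cache()`, which may only be where an earlier asynchronous kernel failure surfaced. Below is a minimal sketch of restarting the server with that variable set, assuming the same `xinference-local` command as in this report. The helper names `debug_env` and `launch_debug_server` are hypothetical and not part of Xinference.

```python
import os
import subprocess

# Same command as in "The command used to start Xinference" above.
START_CMD = ["xinference-local", "--host", "0.0.0.0", "--port", "9997"]


def debug_env(base=None):
    """Return a copy of the environment with synchronous CUDA launches enabled."""
    env = dict(os.environ if base is None else base)
    # Suggested by the error message: with blocking launches, the Python
    # stack trace should point at the call that actually failed.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def launch_debug_server():
    """Start the server process with the debug environment (hypothetical helper)."""
    return subprocess.Popen(START_CMD, env=debug_env())
```

Calling `launch_debug_server()` and then repeating the request from step 1 should produce a traceback that names the failing operation rather than `empty_cache`. Launches are slower in this mode, so it is meant for diagnosis only.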
System Info
OS: CentOS 7.9
Python: 3.10.6
Installed packages (name followed by version):
absl-py 2.1.0 accelerate 0.33.0 aiobotocore 2.7.0 aiofiles 23.2.1 aiohttp 3.9.5 aioitertools 0.11.0 aioprometheus 23.12.0 aiosignal 1.3.1 alembic 1.13.2 aliyun-python-sdk-core 2.15.1 aliyun-python-sdk-kms 2.16.3 altair 5.3.0 annotated-types 0.7.0 antlr4-python3-runtime 4.9.3 anyio 4.4.0 argon2-cffi 23.1.0 argon2-cffi-bindings 21.2.0 arrow 1.3.0 asttokens 2.4.1 async-lru 2.0.4 async-timeout 4.0.3 attrs 23.2.0 audioread 3.0.1 autopage 0.5.2 Babel 2.15.0 bcrypt 4.2.0 beautifulsoup4 4.12.3 bibtexparser 2.0.0b7 bleach 6.1.0 botocore 1.31.64 certifi 2024.7.4 cffi 1.16.0 cfgv 3.4.0 charset-normalizer 3.3.2 chattts 0.1.1 click 8.1.7 cliff 4.7.0 clldutils 3.22.2 cloudpickle 3.0.0 cmaes 0.10.0 cmd2 2.4.3 colorama 0.4.6 coloredlogs 15.0.1 colorlog 6.8.2 comm 0.2.2 conformer 0.3.2 contourpy 1.2.1 crcmod 1.7 cryptography 43.0.0 csvw 3.3.0 cycler 0.12.1 Cython 3.0.10 debugpy 1.8.2 decorator 5.1.1 defusedxml 0.7.1 diffusers 0.25.0 diskcache 5.6.3 distlib 0.3.8 distro 1.9.0 dlinfo 1.2.1 ecdsa 0.19.0 editdistance 0.8.1 einops 0.8.0 einx 0.3.0 encodec 0.1.1 exceptiongroup 1.2.2 executing 2.0.1 fastapi 0.110.3 fastjsonschema 2.20.0 ffmpy 0.3.2 filelock 3.15.4 flatbuffers 24.3.25 fonttools 4.53.1 fqdn 1.5.1 frozendict 2.4.4 frozenlist 1.4.1 fsspec 2023.10.0 funasr 1.1.4 gdown 5.2.0 gradio 4.26.0 gradio_client 0.15.1 greenlet 3.0.3 grpcio 1.65.1 h11 0.14.0 httpcore 1.0.5 httpx 0.27.0 huggingface-hub 0.24.2 humanfriendly 10.0 hydra-colorlog 1.2.0 hydra-core 1.3.2 hydra-optuna-sweeper 1.2.0 HyperPyYAML 1.2.2 identify 2.6.0 idna 3.7 importlib_metadata 8.2.0 importlib_resources 6.4.0 inflect 7.3.1 iniconfig 2.0.0 ipykernel 6.29.5 ipython 8.26.0 ipywidgets 8.1.3 isodate 0.6.1 isoduration 20.11.0 jaconv 0.4.0 jamo 0.4.1 jedi 0.19.1 jieba 0.42.1 Jinja2 3.1.4 jmespath 0.10.0 joblib 1.4.2 json5 0.9.25 jsonpointer 3.0.0 jsonschema 4.23.0 jsonschema-specifications 2023.12.1 jupyter_client 8.6.2 jupyter_core 5.7.2 jupyter-events 0.10.0 jupyter-lsp 2.2.5 jupyter_server 2.14.2 
jupyter_server_terminals 0.5.3 jupyterlab 4.2.4 jupyterlab_pygments 0.3.0 jupyterlab_server 2.27.3 jupyterlab_widgets 3.0.11 kaldiio 2.18.0 kiwisolver 1.4.5 language-tags 1.2.0 lazy_loader 0.4 librosa 0.10.2.post1 lightning 2.3.3 lightning-utilities 0.11.6 llvmlite 0.43.0 lxml 5.2.2 Mako 1.3.5 Markdown 3.6 markdown-it-py 3.0.0 MarkupSafe 2.1.5 matcha-tts 0.0.6.0 matplotlib 3.9.1 matplotlib-inline 0.1.7 mdurl 0.1.2 mistune 3.0.2 modelscope 1.16.1 more-itertools 10.3.0 mpmath 1.3.0 msgpack 1.0.8 multidict 6.0.5 nbclient 0.10.0 nbconvert 7.16.4 nbformat 5.10.4 nest-asyncio 1.6.0 networkx 3.3 nodeenv 1.9.1 notebook 7.2.1 notebook_shim 0.2.4 numba 0.60.0 numpy 1.26.4 nvidia-cublas-cu12 12.1.3.1 nvidia-cuda-cupti-cu12 12.1.105 nvidia-cuda-nvrtc-cu12 12.1.105 nvidia-cuda-runtime-cu12 12.1.105 nvidia-cudnn-cu12 9.1.0.70 nvidia-cufft-cu12 11.0.2.54 nvidia-curand-cu12 10.3.2.106 nvidia-cusolver-cu12 11.4.5.107 nvidia-cusparse-cu12 12.1.0.106 nvidia-nccl-cu12 2.20.5 nvidia-nvjitlink-cu12 12.5.82 nvidia-nvtx-cu12 12.1.105 omegaconf 2.3.0 onnxruntime-gpu 1.16.0 openai 1.37.1 openai-whisper 20231117 opencv-contrib-python 4.10.0.84 optuna 2.10.1 orjson 3.10.6 oss2 2.18.6 overrides 7.7.0 packaging 24.1 pandas 2.2.2 pandocfilters 1.5.1 parso 0.8.4 passlib 1.7.4 pbr 6.0.0 peft 0.12.0 pexpect 4.9.0 phonemizer 3.2.1 pillow 10.4.0 pip 23.3.1 platformdirs 4.2.2 pluggy 1.5.0 pooch 1.8.2 pre-commit 3.7.1 prettytable 3.10.2 prometheus_client 0.20.0 prompt_toolkit 3.0.47 protobuf 4.25.4 psutil 6.0.0 ptyprocess 0.7.0 pure_eval 0.2.3 pyarrow 17.0.0 pyasn1 0.6.0 pybase16384 0.3.7 pycparser 2.22 pycryptodome 3.20.0 pydantic 2.8.2 pydantic_core 2.20.1 pydub 0.25.1 Pygments 2.18.0 pylatexenc 2.10 pynini 2.1.5 pynndescent 0.5.13 pynvml 11.5.3 pyparsing 3.1.2 pyperclip 1.9.0 PySocks 1.7.1 pytest 8.3.2 python-dateutil 2.9.0.post0 python-dotenv 1.0.1 python-jose 3.3.0 python-json-logger 2.0.7 python-multipart 0.0.9 pytorch-lightning 2.3.3 pytorch-wpe 0.0.1 pytz 2024.1 PyYAML 6.0.1 pyzmq 26.0.3 
quantile-python 1.1 rdflib 7.0.0 referencing 0.35.1 regex 2024.7.24 requests 2.32.3 rfc3339-validator 0.1.4 rfc3986 1.5.0 rfc3986-validator 0.1.1 rich 13.7.1 rootutils 1.0.7 rpds-py 0.19.1 rsa 4.9 ruamel.yaml 0.18.6 ruamel.yaml.clib 0.2.8 ruff 0.5.5 s3fs 2023.10.0 safetensors 0.4.3 scikit-learn 1.5.1 scipy 1.14.0 seaborn 0.13.2 segments 2.2.1 semantic-version 2.10.0 Send2Trash 1.8.3 sentence-transformers 3.0.1 sentencepiece 0.2.0 setuptools 68.2.2 shellingham 1.5.4 six 1.16.0 sniffio 1.3.1 soundfile 0.12.1 soupsieve 2.5 soxr 0.4.0 SQLAlchemy 2.0.31 sse-starlette 2.1.2 stack-data 0.6.3 starlette 0.37.2 stevedore 5.2.0 sympy 1.13.1 tabulate 0.9.0 tblib 3.0.0 tensorboard 2.17.0 tensorboard-data-server 0.7.2 tensorboardX 2.6.2.2 terminado 0.18.1 threadpoolctl 3.5.0 tiktoken 0.7.0 timm 1.0.7 tinycss2 1.3.0 tn 0.0.4 tokenizers 0.19.1 tomli 2.0.1 tomlkit 0.12.0 toolz 0.12.1 torch 2.4.0 torch-complex 0.4.4 torchaudio 2.4.0 torchmetrics 1.4.0.post0 torchvision 0.19.0 tornado 6.4.1 tqdm 4.66.4 traitlets 5.14.3 transformers 4.43.3 triton 3.0.0 typeguard 4.3.0 typer 0.11.1 types-python-dateutil 2.9.0.20240316 typing_extensions 4.12.2 tzdata 2024.1 umap-learn 0.5.6 Unidecode 1.3.8 uri-template 1.3.0 uritemplate 4.1.1 urllib3 2.0.7 uvicorn 0.30.3 uvloop 0.19.0 vector-quantize-pytorch 1.15.6 virtualenv 20.26.3 vocos 0.1.0 wcwidth 0.2.13 webcolors 24.6.0 webencodings 0.5.1 websocket-client 1.8.0 websockets 11.0.3 Werkzeug 3.0.3 WeTextProcessing 1.0.3 wget 3.2 wheel 0.41.2 whisper 1.1.10 widgetsnbextension 4.0.11 wrapt 1.16.0 xinference 0.14.1 xinference-client 0.14.1 xoscar 0.3.2 xxhash 3.4.1 yarl 1.9.4 zipp 3.19.2
Expected behavior
The rerank call completes normally.
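The request from the reproduction steps can also be sent from Python, which makes it easier to check whether a call completed normally or came back with an error. This is a sketch using only the standard library. The URL and payload are copied from the report; the helper names `build_request`, `send`, and `is_failure` are hypothetical, and the failure check relies only on the `detail` key seen in the error response above.

```python
import json
import urllib.error
import urllib.request

# Endpoint and payload copied from the reproduction step in this report.
RERANK_URL = "http://192.168.1.88:9997/v1/rerank"
PAYLOAD = {
    "model": "jina-reranker-v2",
    "query": "A man is eating pasta.",
    "documents": [
        "A man is eating food.",
        "A man is eating a piece of bread.",
        "The girl is carrying a baby.",
        "A man is riding a horse.",
        "A woman is playing violin.",
    ],
}


def build_request(url=RERANK_URL, payload=PAYLOAD):
    """Build the same POST request that the curl command sends."""
    data = json.dumps(payload).encode("utf-8")
    headers = {"accept": "application/json", "Content-Type": "application/json"}
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


def send(request, timeout=30):
    """Return (status, parsed body); assumes the server answers with JSON."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return resp.status, json.loads(resp.read())
    except urllib.error.HTTPError as err:
        # Error responses still carry a JSON body, as in step 2 of the report.
        return err.code, json.loads(err.read())


def is_failure(status, body):
    """A failed call in this report returns a JSON object with a "detail" key."""
    return status >= 400 or "detail" in body
```

With the server running, `status, body = send(build_request())` followed by `is_failure(status, body)` distinguishes the failing case shown above from a normal response.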