System Info
CUDA 12.4; Xinference installed with pip.
Running Xinference with Docker?
No, running directly from the pip installation (not Docker).
Version info
Xinference version 0.16.3
The command used to start Xinference
xinference-supervisor -H "192.168.1.8"
xinference-worker -e "192.168.1.8:9997" -H "192.168.1.8"
Reproduction
(Registering custom local models, launching models, etc. were all done through the web UI. The error occurs for LLM, rerank, and embedding models alike.)
1. When I start Xinference in local mode with xinference-local --host 0.0.0.0 --port 9997, everything works properly.
2. When I start Xinference in cluster mode, launching the same locally registered models fails with a "model not found" error:
xinference-supervisor -H "192.168.1.8"
xinference-worker -e "192.168.1.8:9997" -H "192.168.1.8"
The steps are exactly the same in both modes, so I can't figure out what is going wrong; a sketch of the equivalent launch step outside the web UI is included below for reference.
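This is only a minimal sketch of that launch step using the Python client instead of the web UI (I did everything through the web UI; the Client and launch_model calls below are the standard xinference client API as far as I know, and the endpoint is the supervisor address above):

```python
from xinference.client import Client

# Point the client at the cluster supervisor (same address as above).
client = Client("http://192.168.1.8:9997")

# "bge-rerank" is the custom rerank model registered through the web UI.
# Against a local-mode instance this launch works; against the cluster
# deployment it fails with "Rerank model bge-rerank not found".
model_uid = client.launch_model(
    model_name="bge-rerank",
    model_type="rerank",
)
print(model_uid)
```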
The error message is as follows:
2024-11-29 08:50:11,224 xinference.core.worker 93395 INFO You specify to launch the model: bge-rerank on GPU index: [0] of the worker: 192.168.1.8:15654, xinference will automatically ignore the n_gpu option.
2024-11-29 08:50:11,604 xinference.core.worker 93395 ERROR Failed to load model bge-rerank-co-0
Traceback (most recent call last):
  File "/home/lark-dev/anaconda3/envs/xinference/lib/python3.11/site-packages/xinference/core/worker.py", line 869, in launch_builtin_model
    model, model_description = await asyncio.to_thread(
  File "/home/lark-dev/anaconda3/envs/xinference/lib/python3.11/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
  File "/home/lark-dev/anaconda3/envs/xinference/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/lark-dev/anaconda3/envs/xinference/lib/python3.11/site-packages/xinference/model/core.py", line 113, in create_model_instance
    return create_rerank_model_instance(
  File "/home/lark-dev/anaconda3/envs/xinference/lib/python3.11/site-packages/xinference/model/rerank/core.py", line 380, in create_rerank_model_instance
    raise ValueError(
ValueError: Rerank model bge-rerank not found, available:
Huggingface: dict_keys(['bge-reranker-large', 'bge-reranker-base', 'bce-reranker-base_v1', 'bge-reranker-v2-m3', 'bge-reranker-v2-gemma', 'bge-reranker-v2-minicpm-layerwise', 'jina-reranker-v2', 'minicpm-reranker'])
ModelScope: dict_keys(['bge-reranker-base', 'bge-reranker-large', 'bce-reranker-base_v1', 'bge-reranker-v2-m3', 'bge-reranker-v2-gemma', 'bge-reranker-v2-minicpm-layerwise', 'minicpm-reranker'])
2024-11-29 08:50:11,611 xinference.core.worker 93395 ERROR [request e32f8fa0-adeb-11ef-be2c-d843aede8148] Leave launch_builtin_model, error: Rerank model bge-rerank not found, available:
Huggingface: dict_keys(['bge-reranker-large', 'bge-reranker-base', 'bce-reranker-base_v1', 'bge-reranker-v2-m3', 'bge-reranker-v2-gemma', 'bge-reranker-v2-minicpm-layerwise', 'jina-reranker-v2', 'minicpm-reranker'])
ModelScope: dict_keys(['bge-reranker-base', 'bge-reranker-large', 'bce-reranker-base_v1', 'bge-reranker-v2-m3', 'bge-reranker-v2-gemma', 'bge-reranker-v2-minicpm-layerwise', 'minicpm-reranker']), elapsed time: 0 s
Traceback (most recent call last):
  File "/home/lark-dev/anaconda3/envs/xinference/lib/python3.11/site-packages/xinference/core/utils.py", line 78, in wrapped
    ret = await func(*args, **kwargs)
  File "/home/lark-dev/anaconda3/envs/xinference/lib/python3.11/site-packages/xinference/core/worker.py", line 869, in launch_builtin_model
    model, model_description = await asyncio.to_thread(
  File "/home/lark-dev/anaconda3/envs/xinference/lib/python3.11/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
  File "/home/lark-dev/anaconda3/envs/xinference/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/lark-dev/anaconda3/envs/xinference/lib/python3.11/site-packages/xinference/model/core.py", line 113, in create_model_instance
    return create_rerank_model_instance(
  File "/home/lark-dev/anaconda3/envs/xinference/lib/python3.11/site-packages/xinference/model/rerank/core.py", line 380, in create_rerank_model_instance
    raise ValueError(
ValueError: Rerank model bge-rerank not found, available:
Huggingface: dict_keys(['bge-reranker-large', 'bge-reranker-base', 'bce-reranker-base_v1', 'bge-reranker-v2-m3', 'bge-reranker-v2-gemma', 'bge-reranker-v2-minicpm-layerwise', 'jina-reranker-v2', 'minicpm-reranker'])
ModelScope: dict_keys(['bge-reranker-base', 'bge-reranker-large', 'bce-reranker-base_v1', 'bge-reranker-v2-m3', 'bge-reranker-v2-gemma', 'bge-reranker-v2-minicpm-layerwise', 'minicpm-reranker'])
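From the error it looks like the worker only knows about the built-in rerank models and never sees the custom bge-rerank registration. A quick way to check what the supervisor reports (again just a sketch; I believe list_model_registrations is the client call for this):

```python
from xinference.client import Client

client = Client("http://192.168.1.8:9997")

# List the rerank registrations known to the cluster; the custom
# "bge-rerank" entry should appear here alongside the built-in models.
for registration in client.list_model_registrations(model_type="rerank"):
    print(registration)
```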
Expected behavior
In cluster mode, the same steps should run normally, just as they do in local mode.