xorbitsai / inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
https://inference.readthedocs.io
Apache License 2.0

Error when running an embedding model: Remote server 192.0.0.181:44667 closed #2579

Open minglong-huang opened 5 days ago

minglong-huang commented 5 days ago

System Info / 系統信息

Linux vllm=0.5.2 python=3.10.0 CUDA Version: 12.0

Running Xinference with Docker? / 是否使用 Docker 运行 Xinference?

Version info / 版本信息

xinference=0.15.4

The command used to start Xinference / 用以启动 xinference 的命令

xinference-local --host 0.0.0.0 --port 9997

Reproduction / 复现过程

The code is as follows:

from xinference.client import Client
client = Client("http://192.0.0.181:9997")
list_models_run = client.list_models()
model_uid = list_models_run['bge-m3']['id']
embedding_client = client.get_model(model_uid)

text_list = ...  # the list of text chunks; each chunk is under 5K characters
text_list_len = len(text_list)
step = 100
for index in range(0, text_list_len, step):
    text_embeddings = embedding_client.create_embedding(text_list[index:index + step])

The error is as follows:

  File "/home/netted/img_process_ml/nlp/net/embed.py", line 34, in text_embed
    text_embeddings = embedding_client.create_embedding(text_list[index:index + step])
  File "/home/netted/anaconda3/envs/nlp/lib/python3.10/site-packages/xinference/client/restful/restful_client.py", line 122, in create_embedding
    raise RuntimeError(
RuntimeError: Failed to create the embeddings, detail: Remote server 192.0.0.181:40919 closed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/netted/img_process_ml/nlp/net/embed.py", line 68, in <module>
    text_embed(text_units, embedding_client)
  File "/home/netted/img_process_ml/nlp/net/embed.py", line 38, in text_embed
    text_embeddings = embedding_client.create_embedding(text_list[index:index + step])
  File "/home/netted/anaconda3/envs/nlp/lib/python3.10/site-packages/xinference/client/restful/restful_client.py", line 122, in create_embedding
    raise RuntimeError(
RuntimeError: Failed to create the embeddings, detail: Remote server 192.0.0.181:44667 closed
 74%|███████▍  | 3271500/4425878 [00:12<00:04, 263091.43it/s] 

Process finished with exit code 1

Expected behavior / 期待表现

Resolve this issue.

qinxuye commented 4 days ago

This is usually caused by OOM (running out of GPU memory).
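One quick way to test the OOM hypothesis is to watch GPU memory while the embedding loop runs. A minimal sketch using pynvml, run in a separate terminal; it assumes the package is installed and that the Xinference worker sits on GPU 0:

import time
import pynvml

# Sketch only: poll GPU 0 memory every few seconds while the embedding loop runs,
# to see whether usage climbs toward the limit before the connection drops.
pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # assumption: the worker uses GPU 0
try:
    while True:
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        print(f"GPU memory: {mem.used / 1024**2:.0f} / {mem.total / 1024**2:.0f} MiB")
        time.sleep(5)
finally:
    pynvml.nvmlShutdown()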

minglong-huang commented 4 days ago

This is usually caused by OOM (running out of GPU memory).

That's really strange. With it set to 100,000 it runs for a while, with 10,000 it also runs for a while, and even with 1,000 it can run for a few hours. If it were OOM, shouldn't it fail as soon as the input is sent in?

qinxuye commented 4 days ago

The requests are also split into batches internally; you can pass fewer texts in each call.
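A minimal sketch of that approach, reusing embedding_client and text_list from the reproduction above. The helper name and the retry logic are illustrative only, and it assumes create_embedding returns an OpenAI-style dict with a "data" list:

def embed_in_chunks(embedding_client, text_list, step=16):
    # Hypothetical helper, not from the thread: send small slices and halve the
    # slice size when a call fails, so one oversized batch does not abort the run.
    embeddings = []
    index = 0
    while index < len(text_list):
        chunk = text_list[index:index + step]
        try:
            result = embedding_client.create_embedding(chunk)
            # Assumption: the response is an OpenAI-style dict with a "data" list.
            embeddings.extend(result["data"])
            index += step
        except RuntimeError:
            if step == 1:
                raise  # a single text still fails, so smaller batches will not help
            step = max(1, step // 2)  # retry the same slice with a smaller batch
    return embeddings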

minglong-huang commented 4 days ago

OK, I'll try adjusting the parameters.