xorbitsai / inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
https://inference.readthedocs.io
Apache License 2.0

How to deploy xinference-worker in a container #1931

Open ZJU-lishuang opened 1 month ago

ZJU-lishuang commented 1 month ago

Starting the supervisor in a container:

docker run -v ./xinfer_supervisor:/tmp/xinference --name xinfer_supervisor \
-e XINFERENCE_HOME=/tmp/xinference -e XINFERENCE_MODEL_SRC=modelscope \
-p 9997:9997 -p 9996:9996 xprobe/xinference:v0.13.2 \
xinference-supervisor -H 0.0.0.0 -p 9997 --supervisor-port 9996 --log-level debug

On another server, starting the worker in a container fails. Command:

docker run -v ./xinfer_worker:/tmp/xinference --name xinfer_worker \
-e XINFERENCE_HOME=/tmp/xinference -e XINFERENCE_MODEL_SRC=modelscope \
--net=host --gpus '"device=6,7"' xprobe/xinference:v0.13.2 \
xinference-worker -e "http://${supervisor_host}:9997" -H 0.0.0.0 --worker-port 9995 --metrics-exporter-port 9994

Error:

Traceback (most recent call last):
  File "/usr/local/bin/xinference-worker", line 8, in <module>
    sys.exit(worker())
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xinference/deploy/cmdline.py", line 354, in worker
    main(
  File "/usr/local/lib/python3.10/dist-packages/xinference/deploy/worker.py", line 94, in main
    loop.run_until_complete(task)
  File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.10/dist-packages/xinference/deploy/worker.py", line 65, in _start_worker
    await start_worker_components(
  File "/usr/local/lib/python3.10/dist-packages/xinference/deploy/worker.py", line 43, in start_worker_components
    await xo.create_actor(
  File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 78, in create_actor
    return await ctx.create_actor(actor_cls, *args, uid=uid, address=address, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 143, in create_actor
    return self._process_result_message(result)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 595, in create_actor
    await self._run_coro(message.message_id, actor.__post_create__())
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 367, in _run_coro
    return await coro
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/worker.py", line 185, in __post_create__
    ] = await xo.actor_ref(
  File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 125, in actor_ref
    return await ctx.actor_ref(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 197, in actor_ref
    future = await self._call(actor_ref.address, message, wait=False)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 77, in _call
    return await self._caller.call(
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/core.py", line 181, in call
    client = await self.get_client(router, dest_address)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/core.py", line 68, in get_client
    client = await router.get_client(dest_address, from_who=self)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/router.py", line 143, in get_client
    client = await self._create_client(client_type, address, **kw)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/router.py", line 157, in _create_client
    return await client_type.connect(address, local_address=local_address, **kw)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/communication/socket.py", line 255, in connect
    (reader, writer) = await asyncio.open_connection(host=host, port=port, **kwargs)
  File "/usr/lib/python3.10/asyncio/streams.py", line 48, in open_connection
    transport, _ = await loop.create_connection(
  File "/usr/lib/python3.10/asyncio/base_events.py", line 1076, in create_connection
    raise exceptions[0]
  File "/usr/lib/python3.10/asyncio/base_events.py", line 1060, in create_connection
    sock = await self._connect_sock(
  File "/usr/lib/python3.10/asyncio/base_events.py", line 969, in _connect_sock
    await self.sock_connect(sock, address)
  File "/usr/lib/python3.10/asyncio/selector_events.py", line 501, in sock_connect
    return await fut
  File "/usr/lib/python3.10/asyncio/selector_events.py", line 541, in _sock_connect_cb
    raise OSError(err, f'Connect call failed {address}')
ConnectionRefusedError: [address=0.0.0.0:9995, pid=1890] [Errno 111] Connect call failed ('0.0.0.0', 9996)

Command:

docker run -v ./xinfer_worker:/tmp/xinference --name xinfer_worker \
-e XINFERENCE_HOME=/tmp/xinference -e XINFERENCE_MODEL_SRC=modelscope \
--net=host --gpus '"device=6,7"' xprobe/xinference:v0.13.2 \
xinference-worker -e "http://${supervisor_host}:9997" -H "${worker_host}" --worker-port 9995 --metrics-exporter-port 9994

Error:

2024-07-25 07:29:40,967 xinference.core.worker 2026 INFO     Starting metrics export server at 0.0.0.0:9994
2024-07-25 07:29:40,968 xinference.core.worker 2026 INFO     Checking metrics export server...
2024-07-25 07:29:42,020 xinference.core.worker 2026 INFO     Metrics server is started at: http://0.0.0.0:9994
Traceback (most recent call last):
  File "/usr/local/bin/xinference-worker", line 8, in <module>
    sys.exit(worker())
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xinference/deploy/cmdline.py", line 354, in worker
    main(
  File "/usr/local/lib/python3.10/dist-packages/xinference/deploy/worker.py", line 94, in main
    loop.run_until_complete(task)
  File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.10/dist-packages/xinference/deploy/worker.py", line 65, in _start_worker
    await start_worker_components(
  File "/usr/local/lib/python3.10/dist-packages/xinference/deploy/worker.py", line 43, in start_worker_components
    await xo.create_actor(
  File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 78, in create_actor
    return await ctx.create_actor(actor_cls, *args, uid=uid, address=address, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 143, in create_actor
    return self._process_result_message(result)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 595, in create_actor
    await self._run_coro(message.message_id, actor.__post_create__())
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 367, in _run_coro
    return await coro
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/worker.py", line 185, in __post_create__
    ] = await xo.actor_ref(
  File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 125, in actor_ref
    return await ctx.actor_ref(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 197, in actor_ref
    future = await self._call(actor_ref.address, message, wait=False)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 77, in _call
    return await self._caller.call(
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/core.py", line 181, in call
    client = await self.get_client(router, dest_address)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/core.py", line 68, in get_client
    client = await router.get_client(dest_address, from_who=self)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/router.py", line 143, in get_client
    client = await self._create_client(client_type, address, **kw)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/router.py", line 157, in _create_client
    return await client_type.connect(address, local_address=local_address, **kw)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/communication/socket.py", line 255, in connect
    (reader, writer) = await asyncio.open_connection(host=host, port=port, **kwargs)
  File "/usr/lib/python3.10/asyncio/streams.py", line 48, in open_connection
    transport, _ = await loop.create_connection(
  File "/usr/lib/python3.10/asyncio/base_events.py", line 1076, in create_connection
    raise exceptions[0]
  File "/usr/lib/python3.10/asyncio/base_events.py", line 1060, in create_connection
    sock = await self._connect_sock(
  File "/usr/lib/python3.10/asyncio/base_events.py", line 969, in _connect_sock
    await self.sock_connect(sock, address)
  File "/usr/lib/python3.10/asyncio/selector_events.py", line 501, in sock_connect
    return await fut
  File "/usr/lib/python3.10/asyncio/selector_events.py", line 541, in _sock_connect_cb
    raise OSError(err, f'Connect call failed {address}')
ConnectionRefusedError: [address="${worker_host}":9995, pid=2026] [Errno 111] Connect call failed ('0.0.0.0', 9996)
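
Both tracebacks end the same way: the refused connection targets ('0.0.0.0', 9996). 0.0.0.0 is a wildcard bind address, so on Linux a connect to it typically lands on the local machine, which here is the worker host, where nothing listens on 9996. The sketch below is not part of the original report; it only shows one way to pull that target out of the final log line and flag a wildcard address. `failed_target` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read log lines on stdin and print "host:port" for
# any asyncio "Connect call failed ('host', port)" message.
failed_target() {
  sed -n "s/.*Connect call failed ('\([^']*\)', \([0-9]*\)).*/\1:\2/p"
}

# Final line of the first traceback, copied from the report.
line="ConnectionRefusedError: [address=0.0.0.0:9995, pid=1890] [Errno 111] Connect call failed ('0.0.0.0', 9996)"

target="$(printf '%s\n' "$line" | failed_target)"
echo "connect target: $target"

# A wildcard address is only meaningful for binding, not for connecting.
case "$target" in
  0.0.0.0:*) echo "wildcard address used as a connect target" ;;
  *)         echo "target is a concrete address" ;;
esac
```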

Is it possible to restrict which ports the worker uses? How should the following command be changed so that the worker starts?

docker run -v ./xinfer_worker:/tmp/xinference --name xinfer_worker \
-e XINFERENCE_HOME=/tmp/xinference -e XINFERENCE_MODEL_SRC=modelscope \
-p 9995:9995 -p 9994:9994 --gpus '"device=6,7"' xprobe/xinference:v0.13.2 \
xinference-worker -e "http://${supervisor_host}:9997" -H 0.0.0.0 --worker-port 9995 --metrics-exporter-port 9994

Valdanitooooo commented 1 month ago

Your --supervisor-port is 9996, so shouldn't the worker use xinference-worker -e "http://${supervisor_host}:9996"?

ZJU-lishuang commented 1 month ago

The official documentation says to use the web UI port here.

ZJU-lishuang commented 1 month ago

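Quoting Valdanitooooo: "Your --supervisor-port is 9996, so shouldn't the worker use xinference-worker -e "http://${supervisor_host}:9996"?"

I tried that and it doesn't work; the worker just hangs. With port 9997 it at least printed log output normally at first.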

Valdanitooooo commented 1 month ago

Try changing -H 0.0.0.0 in the worker command to the server's IP. I checked my own configuration, and that should be the problem.

ZJU-lishuang commented 1 month ago

Thanks, but my requirement is to start the supervisor on one server and the worker on another. The cluster has multiple servers.

Valdanitooooo commented 1 month ago

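My cluster is set up the same way. I can reproduce your error, and after changing -H to the actual IP the worker started successfully.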

(llm) aigroup@root:/data/llmops/scripts$ xinference-worker -H 0.0.0.0 -e "http://my_supervisor_host:9999"
2024-07-26 02:10:16,502 xinference.core.worker 3039962 INFO     Starting metrics export server at 0.0.0.0:None
2024-07-26 02:10:16,506 xinference.core.worker 3039962 INFO     Checking metrics export server...
2024-07-26 02:10:19,531 xinference.core.worker 3039962 INFO     Metrics server is started at: http://0.0.0.0:33671
Traceback (most recent call last):
....
ConnectionRefusedError: [address=my_supervisor_host:29051, pid=1235954] [Errno 111] Connect call failed ('0.0.0.0', 17883)

(llm) aigroup@root:/data/llmops/scripts$ xinference-worker -H my_worker_host -e "http://my_supervisor_host:9999"
2024-07-26 02:10:49,885 xinference.core.worker 3040160 INFO     Starting metrics export server at 0.0.0.0:None
2024-07-26 02:10:49,886 xinference.core.worker 3040160 INFO     Checking metrics export server...
2024-07-26 02:10:52,889 xinference.core.worker 3040160 INFO     Metrics server is started at: http://0.0.0.0:33139
2024-07-26 02:10:52,905 xinference.core.worker 3040160 INFO     Xinference worker my_worker_host:48747 started

ZJU-lishuang commented 1 month ago

The containers on the different servers need to be networked so that they can reach each other. I simply used --net=host.
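
The reachability requirement can be checked from the worker host before the worker is launched. The following is a minimal sketch, assuming bash (for its /dev/tcp redirection) and the coreutils `timeout` command are available; `port_open` is a hypothetical helper, and the default address is only a placeholder for the real supervisor host. Ports 9997 and 9996 are the ones used in the supervisor command above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a TCP connection to host:port can be
# opened within 2 seconds, fail otherwise.
port_open() {
  local host="$1" port="$2"
  timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Placeholder: set supervisor_host to the supervisor machine's real address.
supervisor_host="${supervisor_host:-127.0.0.1}"

for port in 9997 9996; do
  if port_open "$supervisor_host" "$port"; then
    echo "${supervisor_host}:${port} reachable"
  else
    echo "${supervisor_host}:${port} NOT reachable"
  fi
done
```

A successful check only shows that something accepts TCP connections on that port; it does not prove that the address the supervisor advertises to workers is the same one.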

ZJU-lishuang commented 1 month ago

I originally planned to use -p 9997:9997 -p 9996:9996, but it looks like that can't be used directly.

ZJU-lishuang commented 1 month ago

Quoting Valdanitooooo's reply above ("My cluster is set up the same way. I can reproduce your error, and after changing -H to the actual IP the worker started successfully."), together with its two xinference-worker log excerpts.

Is this on a physical machine or inside a container? How can the my_worker_host address be obtained reliably inside a container?

ZJU-lishuang commented 1 month ago

In my case it only works when the containers run with --net=host and my_supervisor_host and my_worker_host are set to the host machines' addresses.
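
Put together, that setup might look like the sketch below. It is assembled from the commands earlier in this thread and was not posted in this form by any participant. The two addresses are placeholders (documentation-range IPs), and `supervisor_cmd` and `worker_cmd` are hypothetical helpers that only print the docker commands so they can be reviewed before being run on the respective hosts. With --net=host the container shares the host's network stack, so the -p port mappings are dropped.

```shell
#!/usr/bin/env bash
# Placeholders: replace with the real addresses of the two host machines.
supervisor_host="${supervisor_host:-192.0.2.10}"
worker_host="${worker_host:-192.0.2.20}"
image="xprobe/xinference:v0.13.2"

# Print the command to run on the supervisor machine.
supervisor_cmd() {
  cat <<EOF
docker run --net=host --name xinfer_supervisor \\
  -v ./xinfer_supervisor:/tmp/xinference \\
  -e XINFERENCE_HOME=/tmp/xinference -e XINFERENCE_MODEL_SRC=modelscope \\
  ${image} \\
  xinference-supervisor -H ${supervisor_host} -p 9997 --supervisor-port 9996
EOF
}

# Print the command to run on the worker machine (GPU ids as in the report).
worker_cmd() {
  cat <<EOF
docker run --net=host --name xinfer_worker --gpus '"device=6,7"' \\
  -v ./xinfer_worker:/tmp/xinference \\
  -e XINFERENCE_HOME=/tmp/xinference -e XINFERENCE_MODEL_SRC=modelscope \\
  ${image} \\
  xinference-worker -e "http://${supervisor_host}:9997" -H ${worker_host} --worker-port 9995 --metrics-exporter-port 9994
EOF
}

echo "# on the supervisor host:"
supervisor_cmd
echo "# on the worker host:"
worker_cmd
```

The key difference from the failing commands is that both -H values are concrete host addresses rather than 0.0.0.0.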

xunuo2345 commented 1 month ago

About --metrics-exporter-port 9994: how exactly do you view these metrics once the exporter is enabled? Or are they already in the logs by default?

ZJU-lishuang commented 1 month ago

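Quoting xunuo2345: "About --metrics-exporter-port 9994: how exactly do you view these metrics once the exporter is enabled? Or are they already in the logs by default?"

It's there by default. I only pinned the port; otherwise it is chosen at random.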
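
The worker logs above report "Metrics server is started at: http://0.0.0.0:9994", so the exporter is an HTTP endpoint that can be queried directly. The sketch below assumes it serves Prometheus-style text at the path /metrics, which this thread does not confirm; `metrics_url` is a hypothetical helper and the default host is a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the exporter URL. The /metrics default path is
# an assumption, not something stated in this thread.
metrics_url() {
  printf 'http://%s:%s%s\n' "$1" "$2" "${3:-/metrics}"
}

# Placeholder: set worker_host to the worker machine's real address.
url="$(metrics_url "${worker_host:-127.0.0.1}" 9994)"
echo "querying $url"

# -f: fail on HTTP errors; -sS: quiet but report errors; bounded wait.
if body="$(curl -fsS --max-time 3 "$url" 2>/dev/null)"; then
  printf '%s\n' "$body" | head -n 20
else
  echo "no response from $url"
fi
```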

da-peng commented 1 month ago

With this kind of worker setup, can it run several different models? @ZJU-lishuang

github-actions[bot] commented 1 month ago

This issue is stale because it has been open for 7 days with no activity.

stivensss commented 1 month ago

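Quoting ZJU-lishuang: "In my case it only works when the containers run with --net=host and my_supervisor_host and my_worker_host are set to the host machines' addresses."

Could you share the complete docker command?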

ZJU-lishuang commented 1 month ago

Quoting stivensss: "Could you share the complete docker command?"

the same as you

stivensss commented 1 month ago

Quoting ZJU-lishuang: "the same as you"

After updating to the latest version it worked. See my article for details: https://zhuanlan.zhihu.com/p/581246669

ZJU-lishuang commented 4 weeks ago

Quoting stivensss: "After updating to the latest version it worked. See my article for details: https://zhuanlan.zhihu.com/p/581246669"

The container was still run with --net=host.