Closed. zhaolj closed this issue 2 months ago.
WIN 11 22631.4037, Docker Desktop 4.33.1 (161083), WSL2, CUDA Version: 12.6. Hitting exactly the same problem.
+1
api_server.py: error: unrecognized arguments: bash -c xinference -H 0.0.0.0
WARNING 09-02 09:27:38 cuda.py:22] You are using a deprecated pynvml package. Please install nvidia-ml-py instead. See https://pypi.org/project/pynvml for more information.
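Side note: that warning is harmless, but if you want to silence it inside the container, a minimal sketch (assuming pip is available in the image, as it is in the official xinference images):

    # Replace the deprecated pynvml package with nvidia-ml-py, as the warning suggests.
    pip uninstall -y pynvml
    pip install nvidia-ml-py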
+1
System environment: WIN 10-22H2 (19045.4780), Docker Desktop 4.33.1 (161083) / WSL2, Driver Version: 551.61 / cuda_12.4.r12.4/compiler.33961263_0

Startup command 1 (currently v0.14.3):
G:\> docker run -e XINFERENCE_MODEL_SRC=modelscope -p 9998:9997 --gpus all xprobe/xinference:latest xinference-local -H 0.0.0.0 --log-level debug
Error:
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy' nvidia-container-cli: initialization error: WSL environment detected but no adapters were found: unknown.

Startup command 2 (still v0.14.3, with the --gpus flag removed):
G:\> docker run -e XINFERENCE_MODEL_SRC=modelscope -p 9998:9997 xprobe/xinference:latest xinference-local -H 0.0.0.0 --log-level debug
Error:
RuntimeError: Failed to load shared library '/usr/local/lib/python3.10/dist-packages/llama_cpp/lib/libllama.so': libcuda.so.1: cannot open shared object file: No such file or directory

Question: @Minamiyama @Mrluzhe Is my Docker Desktop installation broken? What do I need to do?
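Not an official checklist, just a hedged way to narrow down where GPU passthrough breaks before blaming the image:

    # 1. Inside the WSL2 distro: the Windows NVIDIA driver should expose the GPU here.
    nvidia-smi
    # 2. Through Docker Desktop: run a bare CUDA container with GPU access.
    #    The image tag is only an example; any nvidia/cuda base image should do.
    docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi

If step 1 already fails, the "no adapters were found" error above usually points at the Windows driver / WSL integration layer rather than at Docker or the xinference image.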
@zhaolj @AAEE86 @Minamiyama @Mrluzhe Please try re-pulling xprobe/xinference:v0.14.4 (the Docker Hub version); I just pushed a fixed build. If it resolves the problem, please reply here. Thanks.
@ChengjieLi28 Same problem here. After pulling the latest image it starts now, but what comes up seems to be just an API server, with no UI.
What is the full docker command?
services:
  xinference:
    image: xprobe/xinference:v0.14.4
    ports:
      - "9997:9997"
    volumes:
      - /data/llm/models/.xinference:/root/.xinference
      - /data/llm/models/.cache/huggingface:/root/.cache/huggingface
      - /data/llm/models/.cache/modelscope:/root/.cache/modelscope
    command: 'xinference-local --host 0.0.0.0 --port 9997 --log-level debug'
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
              driver: nvidia
              count: all
    environment:
      - XINFERENCE_MODEL_SRC=modelscope
      - HF_ENDPOINT=https://hf-mirror.com
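For anyone else trying this, a sketch of bringing the stack up and sanity-checking it (service name and port follow the compose file above; the /v1/models probe assumes the OpenAI-compatible REST API):

    docker compose up -d xinference
    docker compose logs -f xinference    # wait until the server reports it is listening
    # Then open http://<host>:9997/ui in a browser, or probe the REST API:
    curl http://localhost:9997/v1/models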
Access port 9997 on the host machine. If it comes up like this, it is working correctly.
Accessing port 9997 on the host redirects automatically to http://ip:9997/ui, which then returns
{"detail":"Not Found"}
Rolling back to 0.14.3 fixed it. While I'm here, one more piece of feedback: LLM and embedding models load onto the GPU fine, but every rerank model I tried fails to load onto the GPU and runs on CPU only.
The 0.14.4 image has a broken frontend build; wait for 0.14.4.post1, which should be released soon.
Please open a separate issue; that is unrelated to this one. Include the full launch command and explain how you determined it is running on CPU.
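A hedged way to produce that evidence: poll GPU memory while the model loads. The model name below is just an example of a builtin rerank model; substitute whatever you actually launched.

    # Terminal 1: poll GPU memory on the host (or via docker exec into the container).
    watch -n 1 nvidia-smi
    # Terminal 2: launch the rerank model; if GPU memory stays flat, it loaded on CPU.
    xinference launch --model-name bge-reranker-base --model-type rerank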
It feels like almost every release ships with problems and we always end up relying on the post1/post2 builds. Could testing be strengthened before release?
System Info / 系統信息
Ubuntu 22.04, Docker version 27.2.0 (build 3ab4256), CUDA Version: 12.5
Running Xinference with Docker? / 是否使用 Docker 运行 Xinference?
Yes, with Docker.
Version info / 版本信息
v0.14.4
The command used to start Xinference / 用以启动 xinference 的命令
docker run -e XINFERENCE_MODEL_SRC=modelscope -p 9998:9997 --gpus all xprobe/xinference:v0.14.4 xinference-local -H 0.0.0.0 --log-level debug
Reproduction / 复现过程
docker run -e XINFERENCE_MODEL_SRC=modelscope -p 9998:9997 --gpus all xprobe/xinference:v0.14.4 xinference-local -H 0.0.0.0 --log-level debug
Expected behavior / 期待表现
Just run normally.