xorbitsai / inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
https://inference.readthedocs.io
Apache License 2.0
5.4k stars 438 forks source link

启动Qwen2.5-7B-Instruct模型,能正常多卡启动,但推理失败 #2374

Closed joerunfu closed 2 weeks ago

joerunfu commented 1 month ago

System Info / 系統信息

CUDA==12.4 ubuntu 24.0 dify==0.8.2(docker-compose部署) GPU:NVIDIA A10 24G*6

Running Xinference with Docker? / 是否使用 Docker 运行 Xinfernece?

Version info / 版本信息

0.15.2

The command used to start Xinference / 用以启动 xinference 的命令

docker stop xinference docker rm xinference docker run -d --name xinference -p 9002:9002 \ --restart=always \ --log-driver json-file \ --log-opt max-size=100m \ --log-opt max-file=2 \ --gpus all \ -e XINFERENCE_MODEL_SRC=modelscope \ -e XINFERENCE_HOME=/workspace \ -v /data/xinference:/workspace \ -v /ai/model:/model \ -v /ai/embedding-model:/embedding-model \ -v /ai/rerank-model:/rerank-model \ -v /ai/image-model:/image-model \ -v /ai/audio-model:/audio-model \ -v /etc/localtime:/etc/localtime:ro \ xprobe/xinference:latest \ xinference-local -H 0.0.0.0 -p 9002

Reproduction / 复现过程

启动模型后,点击拉起web ui进行提问,则失败 运行推理时,提示如下:[address=0.0.0.0:38417, pid=1399] CUDA error: device-side assert triggered CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1 Compile with TORCH_USE_CUDA_DSA to enable device-side assertions. 但是单卡启动则没任何问题,请问是什么呢?

Expected behavior / 期待表现

请帮忙修复

github-actions[bot] commented 1 month ago

This issue is stale because it has been open for 7 days with no activity.

Royhuiy commented 1 month ago

请问这个问题解决了吗?

tungsten106 commented 1 month ago

这里遇到同样的问题,在A100 (40G) 上启动了两个模型(qwen2.5-7b-instruct 和 qwen2.5-coder-7b-instruct), 调用v1/chat/completion 推理时报错 address=0.0.0.0:17347, pid=88771] CUDA error: device-side assert triggered\\nCUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.\\nFor debugging consider passing CUDA_LAUNCH_BLOCKING=1\\nCompile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

cyhasuka commented 1 month ago

Same issue.

github-actions[bot] commented 3 weeks ago

This issue is stale because it has been open for 7 days with no activity.

github-actions[bot] commented 2 weeks ago

This issue was closed because it has been inactive for 5 days since being marked as stale.