Closed joerunfu closed 2 weeks ago
This issue is stale because it has been open for 7 days with no activity.
Has this issue been resolved?
I'm running into the same problem here. On an A100 (40G) I launched two models (qwen2.5-7b-instruct and qwen2.5-coder-7b-instruct), and inference through v1/chat/completion fails with this error:
[address=0.0.0.0:17347, pid=88771] CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Same issue.
This issue was closed because it has been inactive for 5 days since being marked as stale.
System Info
CUDA==12.4, Ubuntu 24.0, dify==0.8.2 (deployed with docker-compose), GPU: NVIDIA A10 24G × 6
Running Xinference with Docker?
Version info
0.15.2
The command used to start Xinference
docker stop xinference
docker rm xinference
docker run -d --name xinference -p 9002:9002 \
  --restart=always \
  --log-driver json-file \
  --log-opt max-size=100m \
  --log-opt max-file=2 \
  --gpus all \
  -e XINFERENCE_MODEL_SRC=modelscope \
  -e XINFERENCE_HOME=/workspace \
  -v /data/xinference:/workspace \
  -v /ai/model:/model \
  -v /ai/embedding-model:/embedding-model \
  -v /ai/rerank-model:/rerank-model \
  -v /ai/image-model:/image-model \
  -v /ai/audio-model:/audio-model \
  -v /etc/localtime:/etc/localtime:ro \
  xprobe/xinference:latest \
  xinference-local -H 0.0.0.0 -p 9002
Reproduction
After launching the model, I click to open the web UI and ask a question, and the request fails. Running inference shows the following:
[address=0.0.0.0:38417, pid=1399] CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Launching the model on a single GPU, however, works without any problem. What could be causing this?
Expected behavior
Please help fix this.
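The error text itself suggests passing CUDA_LAUNCH_BLOCKING=1 to get a stack trace that points at the failing call. Below is a minimal sketch of how the extra flags for the docker run command above might be assembled. The helper name `debug_docker_flags` is hypothetical and not part of Xinference. Pinning the container to one GPU with CUDA_VISIBLE_DEVICES is only an assumption for narrowing down the multi-GPU case; whether Xinference honors it when assigning devices to model workers is not confirmed here.

```python
import shlex


def debug_docker_flags(gpu_ids=None):
    """Return extra `docker run` arguments for a debugging session.

    Hypothetical helper for illustration only; not part of Xinference.
    gpu_ids: optional list of GPU indices to expose to CUDA (assumption:
    restricting visibility mimics the working single-GPU setup).
    """
    # Make kernel launches synchronous so the reported stack trace
    # corresponds to the call that actually failed.
    flags = ["-e", "CUDA_LAUNCH_BLOCKING=1"]
    if gpu_ids:
        # CUDA_VISIBLE_DEVICES limits which GPUs the CUDA runtime can see.
        visible = ",".join(str(i) for i in gpu_ids)
        flags += ["-e", "CUDA_VISIBLE_DEVICES=" + visible]
    return flags


if __name__ == "__main__":
    # Print the flags in a shell-safe form for pasting into `docker run`.
    print(shlex.join(debug_docker_flags(gpu_ids=[0])))
```

The printed flags would go before the image name (xprobe/xinference:latest) in the start command. Synchronous launches slow inference down noticeably, so this is meant for reproducing the failure once, not for regular use.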