xorbitsai / inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
https://inference.readthedocs.io
Apache License 2.0

Starting qwen72B-gptq-int with 8*V100s reports an Ampere error #2326

Open masktone opened 1 week ago

masktone commented 1 week ago

System Info

CUDA 12.2, Ubuntu 20.04, running from the pulled xinference v0.15.0 Docker image

Running Xinference with Docker?

Version info

0.15.0

The command used to start Xinference

docker run \
  -v </your/home/path>/.xinference:/root/.xinference \
  -v </your/home/path>/.cache/huggingface:/root/.cache/huggingface \
  -v </your/home/path>/.cache/modelscope:/root/.cache/modelscope \
  -p 9997:9997 \
  --gpus all \
  xprobe/xinference:v0.15.0 \
  xinference-local -H 0.0.0.0
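For context (not part of the original report): once the container is up, the server is typically exercised through its OpenAI-compatible endpoint on port 9997, which is roughly how the chat call in the Reproduction section below would be made. The model UID used here is a placeholder, not the reporter's actual UID.

from openai import OpenAI

# Xinference exposes an OpenAI-compatible API; point the client at the
# local server started by the docker command above.
client = OpenAI(base_url="http://localhost:9997/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="qwen-chat",  # placeholder model UID; use the UID of the launched GPTQ model
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)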

Reproduction

The server starts successfully, but calling the model fails with an Ampere-related error. The same Triton assertion is printed by several workers, after which the worker process dies:

python3.10: /project/lib/Analysis/Allocation.cpp:47: std::pair<llvm::SmallVector<unsigned int>, llvm::SmallVector<unsigned int>> mlir::triton::getCvtOrder(mlir::Attribute, mlir::Attribute): Assertion `!(srcMmaLayout && dstMmaLayout && !srcMmaLayout.isAmpere()) && "mma layout conversion is only supported in Ampere"' failed.

2024-09-17 19:21:12.133 xinference.api.restful_api 1 ERROR chat completion stream get an error: Remote server 0.0.0.0:40417 closed.
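A note added for context (not in the original report): the assertion comes from a Triton-compiled GPTQ kernel, whose MMA layout conversion is only supported on Ampere or newer GPUs (compute capability 8.0+). V100 is Volta, compute capability 7.0, so the kernel asserts. A minimal sketch to confirm what the runtime sees on each card:

import torch

# Print the compute capability of every visible GPU.
# Ampere or newer means major version >= 8; V100 (Volta) reports 7.0,
# which is why the Triton assertion above fires.
for i in range(torch.cuda.device_count()):
    major, minor = torch.cuda.get_device_capability(i)
    name = torch.cuda.get_device_name(i)
    print(f"GPU {i}: {name}, compute capability {major}.{minor}, "
          f"Ampere or newer: {major >= 8}")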

Expected behavior

Support running on V100 GPUs, which are not of the Ampere architecture.
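A hedged workaround sketch (not a confirmed Xinference feature): since the failing path is the Triton GPTQ kernel, one fallback is to load the GPTQ checkpoint with the Triton kernel disabled, for example via auto-gptq's use_triton=False. Whether Xinference 0.15.0 exposes such a switch for the built-in Qwen GPTQ model is not established here; the model path below is hypothetical, and this is not Xinference's internal loading code.

from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_path = "/path/to/qwen-72b-gptq-int4"  # hypothetical local checkpoint path

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoGPTQForCausalLM.from_quantized(
    model_path,
    device_map="auto",      # spread layers across the available V100s
    use_triton=False,       # avoid the Triton MMA path that requires Ampere
    trust_remote_code=True,
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))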

github-actions[bot] commented 5 days ago

This issue is stale because it has been open for 7 days with no activity.