System Info
CUDA 12.2, Ubuntu 20.04, running the pulled xinference v0.15.0 Docker image
Running Xinference with Docker?
Version info
0.15.0
The command used to start Xinference
docker run \
  -v </your/home/path>/.xinference:/root/.xinference \
  -v </your/home/path>/.cache/huggingface:/root/.cache/huggingface \
  -v </your/home/path>/.cache/modelscope:/root/.cache/modelscope \
  -p 9997:9997 \
  --gpus all \
  xprobe/xinference:v0.15.0 \
  xinference-local -H 0.0.0.0
Reproduction
Xinference starts successfully, but invoking the model fails with a Triton assertion about the Ampere architecture:

python3.10: /project/lib/Analysis/Allocation.cpp:47: std::pair<llvm::SmallVector<unsigned int>, llvm::SmallVector<unsigned int>> mlir::triton::getCvtOrder(mlir::Attribute, mlir::Attribute): Assertion `!(srcMmaLayout && dstMmaLayout && !srcMmaLayout.isAmpere()) && "mma layout conversion is only supported in Ampere"' failed.
2024-09-17 19:21:12.133 xinference.api.restful_api 1 ERROR chat completion stream got an error: Remote server 0.0.0.0:40417 closed.
Expected behavior
Support running on non-Ampere GPUs such as the V100.
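To confirm whether a GPU falls below the Ampere requirement named in the assertion above, here is a minimal sketch; the helper name and the hard-coded capability tuples are illustrative, and on a live system the tuple would come from `torch.cuda.get_device_capability()`:

```python
def supports_ampere_mma(capability):
    """Return True if the compute capability is Ampere (8.x) or newer.

    Triton's MMA layout conversion, per the assertion above, is only
    supported on Ampere-class GPUs.
    """
    major, _minor = capability
    return major >= 8

# The V100 reports compute capability (7, 0); the A100 reports (8, 0).
print(supports_ampere_mma((7, 0)))  # False -> would hit the assertion
print(supports_ampere_mma((8, 0)))  # True
```

This suggests the crash is architecture-gated rather than a configuration problem: on a V100 the Triton kernel path would need to avoid the MMA layout conversion entirely.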