System Info
CUDA 12.2, Ubuntu 20.04, running the pulled xinference v0.15.0 Docker image
Running Xinference with Docker?
Version info
0.15.0
The command used to start Xinference
docker run \
  -v </your/home/path>/.xinference:/root/.xinference \
  -v </your/home/path>/.cache/huggingface:/root/.cache/huggingface \
  -v </your/home/path>/.cache/modelscope:/root/.cache/modelscope \
  -p 9997:9997 \
  --gpus all \
  xprobe/xinference:v0.15.0 \
  xinference-local -H 0.0.0.0
Reproduction
Xinference starts successfully, but invoking the model fails with a Triton assertion about the Ampere architecture:

python3.10: /project/lib/Analysis/Allocation.cpp:47: std::pair<llvm::SmallVector<unsigned int>, llvm::SmallVector<unsigned int>> mlir::triton::getCvtOrder(mlir::Attribute, mlir::Attribute): Assertion `!(srcMmaLayout && dstMmaLayout && !srcMmaLayout.isAmpere()) && "mma layout conversion is only supported in Ampere"' failed.
2024-09-17 19:21:12.133 xinference.api.restful_api 1 ERROR chat completion stream got an error: Remote server 0.0.0.0:40417 closed.
Expected behavior
Support running on non-Ampere GPUs such as the V100.
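To confirm whether a GPU falls below the Ampere requirement named in the assertion above, here is a minimal sketch; the helper name and the hard-coded capability tuples are illustrative, and on a live system the tuple would come from `torch.cuda.get_device_capability()`:

```python
def supports_ampere_mma(capability):
    """Return True if the compute capability is Ampere (8.x) or newer.

    Triton's MMA layout conversion, per the assertion above, is only
    supported on Ampere-class GPUs.
    """
    major, _minor = capability
    return major >= 8

# The V100 reports compute capability (7, 0); the A100 reports (8, 0).
print(supports_ampere_mma((7, 0)))  # False -> would hit the assertion
print(supports_ampere_mma((8, 0)))  # True
```

This suggests the crash is architecture-gated rather than a configuration problem: on a V100 the Triton kernel path would need to avoid the MMA layout conversion entirely.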