xorbitsai / inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
https://inference.readthedocs.io
Apache License 2.0

xprobe/xinference:v0.14.0.post1: running Alibaba-NLP/gte-Qwen2-7B-instruct fails with "no module flash_attn"; flash_attn needs to be installed #2022

Open xujingsen521 opened 3 months ago

xujingsen521 commented 3 months ago

System Info

CentOS 7, Docker 26.0.0

Running Xinference with Docker? Yes (see the start command below).

Version info

xprobe/xinference:v0.14.0.post1

The command used to start Xinference

docker run -itd --name="xjs-inference" \
  -v ./:/app \
  -P -p 9995-9999:9995-9999 \
  --gpus '"device=0,1"' \
  xprobe/xinference:v0.14.0.post1 \
  xinference-local -H 0.0.0.0 --port 9997 --log-level debug

Reproduction

xinference register --model-type embedding --file gte-Qwen2-7B-instruct.json --persist

xinference launch \
  --model-name gte-Qwen2-7B-instruct \
  --model-type embedding
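The register command above reads a custom embedding model spec from gte-Qwen2-7B-instruct.json. A minimal sketch of such a file, assuming the custom-embedding JSON schema of this Xinference version; the dimension and token-limit values are illustrative assumptions and should be verified against the model card:

```json
{
  "model_name": "gte-Qwen2-7B-instruct",
  "dimensions": 3584,
  "max_tokens": 32768,
  "language": ["en", "zh"],
  "model_id": "Alibaba-NLP/gte-Qwen2-7B-instruct"
}
```

With a local copy of the weights, a `model_uri` pointing at the directory can be used instead of downloading by `model_id`.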

Expected behavior

A Docker image that ships with nvcc and flash_attn 2.5.6 or later.
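Until such an image exists, one workaround is to layer flash-attn onto the official image yourself. The Dockerfile below is an untested sketch: flash-attn compiles CUDA kernels at install time and therefore needs nvcc, unless you install a prebuilt wheel matching the image's Python, PyTorch, and CUDA versions (the wheel path shown is a placeholder, not a real filename):

```dockerfile
# Sketch only, untested; assumes the image's pip/python are on PATH.
FROM xprobe/xinference:v0.14.0.post1

# Option A: install a prebuilt wheel, which avoids the nvcc requirement.
# The path is a placeholder; pick a wheel matching the image's Python,
# PyTorch, and CUDA versions from the flash-attention GitHub releases page.
# COPY flash_attn-<version>.whl /tmp/
# RUN pip install /tmp/flash_attn-<version>.whl

# Option B: build from source, which requires nvcc inside the image
# (e.g. by installing the CUDA toolkit for the image's distribution first).
RUN pip install "flash-attn>=2.5.6" --no-build-isolation
```

Option A is usually the faster and more reliable path, since compiling flash-attn from source can take a long time and fails outright when nvcc is absent.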

jim1997 commented 1 month ago

Same request here.

ConleyKong commented 1 month ago

Download link for a flash-attn package that works with xinference 0.15.2: https://pan.baidu.com/s/1OTOKLzKcSukjvDqQ-F6XVQ?pwd=1111 (extraction code: 1111)