
huggingface / text-embeddings-inference

A blazing fast inference solution for text embeddings models

https://huggingface.co/docs/text-embeddings-inference/quick_tour

Apache License 2.0


Flash attention is not installed. #317

Closed. zhangdanfeng888 closed this issue 2 months ago.

zhangdanfeng888 commented 2 months ago

I ran: text-embeddings-router --model-id Salesforce/SFR-Embedding-Mistral --port 8080 --dtype float16

Then I got the following error, even though I have installed flash-attn:

I get the same error when I run: text-embeddings-router --model-id Salesforce/SFR-Embedding-2_R --port 8080 --dtype float16. What's wrong?

OlivierDehaene commented 2 months ago

How did you install TEI? Did you install it with CUDA support? Do you have a supported GPU (CUDA compute capability > 7.5)?
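One quick way to answer the compute-capability question is to ask the NVIDIA driver directly. The sketch below assumes nvidia-smi is on PATH and that the driver is recent enough to support the compute_cap query field; the helper names parse_compute_caps and query_compute_caps are hypothetical and are not part of TEI.

```python
import subprocess


def parse_compute_caps(output: str) -> list[tuple[int, int]]:
    """Parse nvidia-smi output such as '8.6' (one line per GPU) into (major, minor) tuples."""
    caps = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        major, minor = line.split(".")
        caps.append((int(major), int(minor)))
    return caps


def query_compute_caps() -> list[tuple[int, int]]:
    # Assumption: nvidia-smi is installed and its driver supports --query-gpu=compute_cap.
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_compute_caps(out)


if __name__ == "__main__":
    # Parsing a sample string; an RTX 3080 reports compute capability 8.6.
    print(parse_compute_caps("8.6\n"))  # prints [(8, 6)]
```

Parsing is kept separate from the subprocess call so the parsing logic can be checked on machines without a GPU.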

zhangdanfeng888 commented 2 months ago


I installed TEI following your CUDA instructions: cargo install --path router -F candle-cuda-turing -F http --no-default-features, and I think it installed successfully. My CUDA version is 12.4, and I have also added the NVIDIA binaries to my PATH. My GPU is an RTX 3080; is that enough?

OlivierDehaene commented 2 months ago

Use cargo install --path router -F candle-cuda -F http --no-default-features instead. Note the feature is candle-cuda, not candle-cuda-turing.

A 3080 is not a Turing GPU but an Ampere one.

OlivierDehaene commented 2 months ago

The candle-cuda-turing feature should only be used for older GPUs like the T4. Also, version 1.3 had a bug affecting Mistral models. I strongly advise you to update to either the latest version or 1.4.
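The rule of thumb from this exchange can be written as a small lookup: candle-cuda-turing for Turing cards such as the T4 (compute capability 7.5), and candle-cuda for Ampere cards such as the RTX 3080 (compute capability 8.6). Treating all newer architectures (8.0 and up) the same way as Ampere is an assumption of this sketch. The helpers tei_cuda_feature and install_command are hypothetical, not something shipped with TEI, and the sketch deliberately rejects other capabilities rather than guessing.

```python
def tei_cuda_feature(cap: tuple[int, int]) -> str:
    """Pick a TEI cargo feature from a (major, minor) compute capability.

    Assumption: 7.5 (Turing, e.g. T4) -> candle-cuda-turing;
    8.0 and above (Ampere, e.g. RTX 3080, and newer) -> candle-cuda.
    """
    if cap == (7, 5):
        return "candle-cuda-turing"
    if cap >= (8, 0):  # tuple comparison: (8, 6) >= (8, 0)
        return "candle-cuda"
    raise ValueError(f"compute capability {cap[0]}.{cap[1]} is not covered by this sketch")


def install_command(cap: tuple[int, int]) -> str:
    # Builds the same cargo command used in this thread, swapping in the chosen feature.
    feature = tei_cuda_feature(cap)
    return f"cargo install --path router -F {feature} -F http --no-default-features"


if __name__ == "__main__":
    print(install_command((8, 6)))  # the RTX 3080 case from this issue
```

For the RTX 3080 in this issue, the sketch yields the candle-cuda command the maintainer recommended.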

zhangdanfeng888 commented 2 months ago

@OlivierDehaene Okay, I will try 1.4 with cargo install --path router -F candle-cuda -F http --no-default-features. Thanks a lot!