replit / ReplitLM

Inference code and configs for the ReplitLM model family
https://huggingface.co/replit
Apache License 2.0

`ImportError: This modeling file requires... flash_attn` #8

Closed · llimllib closed this 1 year ago

llimllib commented 1 year ago

Trying to follow the instructions on an m1 mac, I get the above error.

Unfortunately, attempting to install flash_attn does not succeed either; it fails with `RuntimeError: flash_attn was requested, but nvcc was not found.`, which may just be an unfortunate consequence of not having an nvidia card.

Anyway, the point is that flash_attn should probably be added to your list of required modules?

pirroh commented 1 year ago

From the error message, it looks like the CUDA drivers are not installed. Can you check whether you can successfully run the commands `nvidia-smi` and `nvcc --version`?

Also, we already list flash_attn among the suggested dependencies -- check the README! We haven't tested the model on M1/M2 Macs yet, so if you hit further blockers, I recommend running with the default attention implementation in PyTorch.
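
For example, something along these lines should force the stock PyTorch attention path. (The `attn_config['attn_impl']` key follows the MPT-style config used by replit-code-v1-3b; double-check the exact key names against the model card.)

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Load the remote config and explicitly request the plain PyTorch attention
# implementation instead of flash/triton attention (config key assumed from
# the MPT-style config in the model card).
config = AutoConfig.from_pretrained('replit/replit-code-v1-3b', trust_remote_code=True)
config.attn_config['attn_impl'] = 'torch'  # default; no CUDA or flash_attn needed

model = AutoModelForCausalLM.from_pretrained(
    'replit/replit-code-v1-3b',
    config=config,
    trust_remote_code=True,
)
```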

llimllib commented 1 year ago

> First of all, you need to install the latest versions of the following dependencies:
>
> einops
> sentencepiece
> torch
> transformers

is the section I read? If flash_attn is listed there, I don't see it.

llimllib commented 1 year ago

(an m1 mac has no nvidia card so I don't think I can install nvcc? Too bad, but I get that some stuff can't run without an nvidia card)

llimllib commented 1 year ago

Now I see that you listed it in the model description, but it appears to be necessary for inference as well, so what I mean is that it should be included in that list of required Python packages.

pirroh commented 1 year ago

You don't need flash attention for inference -- it's a "nice to have" that makes inference faster, but to my knowledge it works only on NVIDIA GPUs (since it requires CUDA). In your case, you should load the model as shown in the first half of that section:

```python
from transformers import AutoModelForCausalLM

# load model
model = AutoModelForCausalLM.from_pretrained('replit/replit-code-v1-3b', trust_remote_code=True)
```
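
To go from the loaded model to an actual completion, a sketch like the following should work (the prompt and sampling parameters are only illustrative):

```python
from transformers import AutoTokenizer

# Tokenize a prompt and sample a completion from the model loaded above.
tokenizer = AutoTokenizer.from_pretrained('replit/replit-code-v1-3b', trust_remote_code=True)
inputs = tokenizer.encode('def fibonacci(n): ', return_tensors='pt')
outputs = model.generate(
    inputs,
    max_length=100,
    do_sample=True,
    top_p=0.95,
    temperature=0.2,
    eos_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```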

Hope this helps. Also, make sure to run on the latest version of the Transformers library!

llimllib commented 1 year ago

That's exactly what I did -- and it's what caused the error in the first place!

pirroh commented 1 year ago

Can you run pip install --upgrade transformers, and try again?

llimllib commented 1 year ago

I will do so tomorrow (I have to re-download the model now), but I was working in a clean virtualenv

llimllib commented 1 year ago

(which I assume means pip will download the newest version of a lib? But maybe that assumption is false if there's a previously cached version?)

llimllib commented 1 year ago

I'm unable to reproduce it now. Sincere apologies for the noise and for wasting your time, and thanks for the model!

pirroh commented 1 year ago

No problem! Glad it worked in the end :)

omaratef3221 commented 7 months ago

I have a Mac M2 Max with 32 GB. `pip install --upgrade transformers` worked perfectly for me, thanks @pirroh

kabelklaus commented 2 months ago

For me it doesn't work, even with `pip install --upgrade transformers`.