huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

RuntimeError: Failed to import transformers.models.mistral.modeling_mistral because of the following error (look up to see its traceback): cannot import name 'is_flash_attn_greater_or_equal_2_10' from 'transformers.utils' (/usr/local/lib/python3.10/dist-packages/transformers/utils/__init__.py) #28200

Closed Jaykumaran closed 8 months ago

Jaykumaran commented 8 months ago

System Info

!pip install trl transformers==4.35.2 accelerate peft==0.6.2 -Uqqq

!pip install trl transformers accelerate peft==0.6.2 -Uqqq
!pip install datasets bitsandbytes einops wandb -Uqqq
!pip install flash-attn --no-build-isolation -Uqq

Who can help?

No response

Information

Tasks

Reproduction

!pip install trl transformers==4.35.2 accelerate peft==0.6.2 -Uqqq

!pip install trl transformers accelerate peft==0.6.2 -Uqqq
!pip install datasets bitsandbytes einops wandb -Uqqq
!pip install flash-attn --no-build-isolation -Uqq

MODEL_NAME = "HuggingFaceH4/zephyr-7b-beta"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # load model in 4-bit precision
    bnb_4bit_quant_type="nf4",              # pre-trained model should be quantized in 4-bit NF format
    bnb_4bit_use_double_quant=True,         # using double quantization as mentioned in the QLoRA paper
    bnb_4bit_compute_dtype=torch.bfloat16,  # during computation, the pre-trained model should be loaded in BF16 format
)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    quantization_config=bnb_config,
    device_map=0,
    use_cache=True,
    trust_remote_code=True,
    use_flash_attention_2=True,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

tokenizer.pad_token = tokenizer.eos_token

tokenizer.padding_side = "right"
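For reference, a minimal sketch of the same load call written with the attn_implementation argument, which more recent transformers releases use in place of the use_flash_attention_2 flag (this assumes such a release and an installed flash-attn; the settings otherwise mirror the reproduction above):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

MODEL_NAME = "HuggingFaceH4/zephyr-7b-beta"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Same settings as above, but requesting flash attention via the newer
# attn_implementation kwarg instead of the older use_flash_attention_2 flag.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    quantization_config=bnb_config,
    device_map=0,
    use_cache=True,
    trust_remote_code=True,
    attn_implementation="flash_attention_2",
)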

Expected behavior

When trying to load the model, it fails with the following error:

RuntimeError: Failed to import transformers.models.mistral.modeling_mistral because of the following error (look up to see its traceback): cannot import name 'is_flash_attn_greater_or_equal_2_10' from 'transformers.utils' (/usr/local/lib/python3.10/dist-packages/transformers/utils/__init__.py)

amyeroberts commented 8 months ago

Hi @Jaykumaran, thanks for raising this issue!

Could you run the following in the command line to check the version of flash-attn in your Python environment?

python -c "import flash_attn; from transformers.utils.import_utils import is_flash_attn_greater_or_equal_2_10; print(flash_attn.__version__); print(is_flash_attn_greater_or_equal_2_10())"

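If the command line isn't convenient, a roughly equivalent check can be run from a notebook cell (a minimal sketch, assuming flash-attn is installed; the import itself doubles as the diagnostic):

import flash_attn
import transformers
# If this import fails, the installed transformers does not expose the helper
# that the Mistral modeling code is trying to import.
from transformers.utils.import_utils import is_flash_attn_greater_or_equal_2_10

# Print the installed versions plus the result of the version check
print("transformers:", transformers.__version__)
print("flash-attn:", flash_attn.__version__)
print("is_flash_attn_greater_or_equal_2_10:", is_flash_attn_greater_or_equal_2_10())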

manliu1225 commented 3 months ago

I also have this issue. When I run:

python -c "import flash_attn; from transformers.utils.import_utils import is_flash_attn_greater_or_equal_2_10; print(flash_attn.__version__); print(is_flash_attn_greater_or_equal_2_10())"

the results are:

2.5.8
True

amyeroberts commented 3 months ago

Hi @manliu1225, could you provide:

Srikor commented 2 months ago

I'm having the same issue. It happens at the second line. I'm running in a Google Colab environment.

model_name = "microsoft/Phi-3-vision-128k-instruct"
model = AutoModelForCausalLM.from_pretrained(model_name)

Error:

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
[<ipython-input-31-675821a17052>](https://localhost:8080/#) in <cell line: 5>()
      3 
      4 # Load base model
----> 5 model = AutoModelForCausalLM.from_pretrained(model_name)

10 frames
[~/.cache/huggingface/modules/transformers_modules/microsoft/Phi-3-vision-128k-instruct/7b92b8c62807f5a98a9fa47cdfd4144f11fbd112/modeling_phi3_v.py](https://localhost:8080/#) in <module>
     37 )
     38 from transformers.modeling_utils import PreTrainedModel
---> 39 from transformers.utils import (
     40     add_code_sample_docstrings,
     41     add_start_docstrings,

ImportError: cannot import name 'is_flash_attn_greater_or_equal_2_10' from 'transformers.utils' (/usr/local/lib/python3.10/dist-packages/transformers/utils/__init__.py)

amyeroberts commented 2 months ago

Hi @Srikor, could you share your running environment (run transformers-cli env in the terminal and copy-paste the output)? I'm unable to replicate this, with or without flash attention installed, when running on the development branch.

Srikor commented 2 months ago

Hello @amyeroberts. I'm running this in the free version of Google Colab and hence couldn't execute the command you provided in a terminal. I tried a fresh notebook, installed the transformers and datasets packages from pip, ran the minimal code below, and hit another error related to flash attention.

Code:

import os
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    TrainingArguments,
    pipeline,
    logging,
)
model_name = "microsoft/Phi-3-vision-128k-instruct"
model = AutoModelForCausalLM.from_pretrained(model_name)

Error:

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
[<ipython-input-3-0986d235c6b3>](https://localhost:8080/#) in <cell line: 2>()
      1 model_name = "microsoft/Phi-3-vision-128k-instruct"
----> 2 model = AutoModelForCausalLM.from_pretrained(model_name)

3 frames
[/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py](https://localhost:8080/#) in _check_and_enable_flash_attn_2(cls, config, torch_dtype, device_map, check_device_map, hard_check_only)
   1569 
   1570             if importlib.util.find_spec("flash_attn") is None:
-> 1571                 raise ImportError(f"{preface} the package flash_attn seems to be not installed. {install_message}")
   1572 
   1573             flash_attention_version = version.parse(importlib.metadata.version("flash_attn"))

ImportError: FlashAttention2 has been toggled on, but it cannot be used due to the following error: the package flash_attn seems to be not installed. Please refer to the documentation of https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-2 to install Flash Attention 2.

---------------------------------------------------------------------------

I'm just curious: the documentation https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-2 indicates I need to pass the attn_implementation parameter to enable flash attention, but the error says it has already been enabled.

amyeroberts commented 2 months ago

@Srikor Thanks for your reply

I'm running this in Google Colab free version and hence couldn't execute the command you provided in a terminal.

It should still be possible, even if the Colab is free. To run a CLI command in the notebook, you need to run it with a ! at the start, i.e. ! transformers-cli env

but the error indicates it has been enabled.

It's been enabled because the model implementation on the Hub has a flash attention class implemented. In this case, it will automatically be selected (which is admittedly not ideal, as it can lead to unexpected behaviour).

You can select the attention implementation to run when instantiating the model, or in its config, by setting e.g. attn_implementation="eager".
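For example, a minimal sketch of forcing the eager attention path so the custom model code does not try to use flash-attn (trust_remote_code=True is assumed here because the model ships its own modeling code on the Hub):

from transformers import AutoModelForCausalLM

model_name = "microsoft/Phi-3-vision-128k-instruct"

# Request the eager attention implementation explicitly so the remote-code
# model does not auto-select flash attention when flash-attn is not installed.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    attn_implementation="eager",
)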

Srikor commented 2 months ago

Hello @amyeroberts. Please find below the requested details.

Srikor commented 2 months ago

Hello @amyeroberts. Really sorry, I just noticed that I have been running the model on CPU instead of GPU. I switched to GPU, enabled flash attention, and it's working fine now.
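For completeness, a sketch of what the working setup presumably looks like (assumptions: flash-attn is installed, a CUDA GPU is available, and the model is loaded in bfloat16, since FlashAttention 2 only supports half-precision dtypes):

import torch
from transformers import AutoModelForCausalLM

model_name = "microsoft/Phi-3-vision-128k-instruct"

# FlashAttention 2 requires a CUDA device and fp16/bf16 weights.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
    attn_implementation="flash_attention_2",
)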