huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Mistral with FlashAttention2 #28771

Closed: khalil-Hennara closed this issue 9 months ago

khalil-Hennara commented 9 months ago

System Info

Who can help?

No response

Information

Tasks

Reproduction

model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1", torch_dtype=torch.float16, attn_implementation="flash_attention_2")

This line of code was taken from the official Mistral model page. It raises:

TypeError: MistralForCausalLM.__init__() got an unexpected keyword argument 'attn_implementation'

When using use_flash_attention_2=True instead, it works fine.
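(For reference, a minimal sketch of how the two call styles relate; the version gate is an assumption based on attn_implementation only being accepted from transformers 4.36, and it requires flash-attn to be installed with a CUDA device for the FlashAttention-2 path to actually run.)

    import torch
    import transformers
    from packaging import version
    from transformers import AutoModelForCausalLM

    MODEL_ID = "mistralai/Mistral-7B-v0.1"

    if version.parse(transformers.__version__) >= version.parse("4.36.0"):
        # Newer releases accept attn_implementation directly.
        model = AutoModelForCausalLM.from_pretrained(
            MODEL_ID,
            torch_dtype=torch.float16,
            attn_implementation="flash_attention_2",
        )
    else:
        # Older releases only expose the use_flash_attention_2 flag
        # (deprecated in later versions in favor of attn_implementation).
        model = AutoModelForCausalLM.from_pretrained(
            MODEL_ID,
            torch_dtype=torch.float16,
            use_flash_attention_2=True,
        )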

Expected behavior

The model should load without error, using FlashAttention-2 in the background.

khalil-Hennara commented 9 months ago

I think this problem is related to @ArthurZucker and @stevhliu

IYoreI commented 9 months ago

It looks like attn_implementation is supported from version 4.36. You may need to upgrade your transformers library and try again.

ArthurZucker commented 9 months ago

Yes, as @IYoreI mentions, feel free to upgrade the transformers version!
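(A quick sanity check after upgrading, as a non-authoritative sketch: the pip commands are the usual install steps rather than anything prescribed in this thread, and config._attn_implementation is an internal field in recent versions, so treat the final print as illustrative.)

    # Upgrade first, e.g.:
    #   pip install --upgrade transformers
    #   pip install flash-attn --no-build-isolation

    import torch
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(
        "mistralai/Mistral-7B-v0.1",
        torch_dtype=torch.float16,
        attn_implementation="flash_attention_2",
    )

    # Confirm which attention implementation was dispatched;
    # printing the model should also show FlashAttention-2 attention layers.
    print(model.config._attn_implementation)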

khalil-Hennara commented 9 months ago

Thanks @IYoreI and @ArthurZucker for your time.

ArthurZucker commented 9 months ago

Closing as it's resolved!