huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

XLMRoberta with Flash Attention 2 #27957

Open IvanPy96 opened 10 months ago

IvanPy96 commented 10 months ago

System Info

Who can help?

@ArthurZucker @younesbelkada

Information

Tasks

Reproduction

from transformers import AutoModelForSequenceClassification

# "my_model/" is the reporter's local XLMRoberta checkpoint; at the time of
# this issue the call below raises a ValueError because XLMRoberta does not
# support Flash Attention 2.
model = AutoModelForSequenceClassification.from_pretrained("my_model/", attn_implementation="flash_attention_2")

Expected behavior

Ability to use Flash Attention 2 for inference. Is it possible to add support for Flash Attention 2 to the XLMRoberta model?
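
For reference, a quick way to check whether a model class advertises Flash Attention 2 support is its _supports_flash_attn_2 attribute; a minimal sketch (this is a private attribute, so it may change between releases):

from transformers import XLMRobertaModel

# Private class attribute on PreTrainedModel subclasses; False here means
# from_pretrained(..., attn_implementation="flash_attention_2") will raise.
print(XLMRobertaModel._supports_flash_attn_2)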

ArthurZucker commented 10 months ago

Thanks for opening this, I'll mark it as a good second issue 🤗

mohammedElfatihSalah commented 10 months ago

Hi @IvanPy96 & @ArthurZucker, I want to work on this issue. Could you please assign it to me?

ArthurZucker commented 10 months ago

Hey, we don't assign issues; feel free to open a PR and link it to this issue 😉

aikangjun commented 2 months ago

Hi, it seems that this issue has not been resolved; XLMRoberta still cannot use FlashAttention 2. [screenshot]
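
In the meantime, a minimal fallback sketch is to request Flash Attention 2 and catch the ValueError that from_pretrained raises for unsupported models (the checkpoint name below is a placeholder):

from transformers import AutoModelForSequenceClassification

checkpoint = "xlm-roberta-base"  # placeholder checkpoint

try:
    # Rejected with a ValueError while XLMRoberta lacks FA2 support.
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint, attn_implementation="flash_attention_2"
    )
except ValueError:
    # Fall back to the default (eager) attention implementation.
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint)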

ArthurZucker commented 1 month ago

Hey! Yes, as both PRs were closed: see the last comment:

"@aikangjun This PR wasn't merged; it was closed because of inactivity, it seems. We've recently merged other PRs that add SDPA to Roberta-based models, though: https://github.com/huggingface/transformers/pull/30510 adds it to this model. This isn't part of 4.42 but will be part of the next release."
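
Once a release containing that PR is installed, SDPA can be requested the same way Flash Attention 2 was in the original report; a sketch (the checkpoint name is a placeholder):

from transformers import AutoModelForSequenceClassification

# Requires a transformers version that includes PR #30510 (i.e. newer than 4.42).
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base",  # placeholder checkpoint
    attn_implementation="sdpa",  # PyTorch's scaled_dot_product_attention
)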