IvanPy96 opened this issue 10 months ago
Thanks for opening, will mark as a good second issue 🤗
Hi @IvanPy96 & @ArthurZucker I want to work on this issue. Could you please assign it to me?
Hey, we don't assign issue, feel free to open a PR and link it to this issue 😉
Hi, it seems that this issue has not been resolved; XLMRoberta still cannot use FlashAttention 2.
Hey! Yes, as both PRs were closed: see the last comment
@aikangjun This PR wasn't merged; it seems it was closed due to inactivity. We've recently merged other PRs adding SDPA to RoBERTa-based models, though, including https://github.com/huggingface/transformers/pull/30510, which adds it to this model. This isn't part of 4.42 but will be part of the next release.
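For reference, a rough sketch of how loading with SDPA could look once a release containing that PR is installed; the checkpoint name and dtype below are placeholders, not from this thread:

import torch
from transformers import AutoModelForSequenceClassification

# Sketch only: "xlm-roberta-base" stands in for whatever XLM-R checkpoint you use.
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base",
    attn_implementation="sdpa",  # PyTorch scaled_dot_product_attention backend
    torch_dtype=torch.float16,   # optional; SDPA also runs in fp32
)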
System Info
Who can help?
@ArthurZucker @younesbelkada
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
from transformers import AutoModelForSequenceClassification
model = AutoModelForSequenceClassification.from_pretrained("my_model/", attn_implementation="flash_attention_2")
Expected behavior
Ability to use Flash Attention 2 for inference. Is it possible to add support for Flash Attention 2 to the XLMRoberta model?
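A minimal sketch (not part of the original report) of requesting Flash Attention 2 with a fallback to SDPA when the model class or the environment doesn't support it; the checkpoint name is a placeholder and the exception types may vary across transformers versions:

import torch
from transformers import AutoModelForSequenceClassification

checkpoint = "xlm-roberta-base"  # placeholder XLM-R checkpoint
try:
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint,
        attn_implementation="flash_attention_2",
        torch_dtype=torch.float16,  # Flash Attention 2 requires fp16/bf16
    )
except (ValueError, ImportError):
    # Raised when the architecture has no FA2 integration or flash-attn
    # is not installed; SDPA is the merged alternative for XLM-R.
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint,
        attn_implementation="sdpa",
    )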