Closed: lapp0 closed this 2 weeks ago
When you use flash_attention_2, the model cannot output the attention weights. I think there were warnings about this!
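For reference, here is a minimal sketch (not from this thread) of the usual workaround: request the attention weights with the eager implementation, which does materialize the full attention matrix.

```python
# Minimal sketch: with the "eager" attention implementation,
# output_attentions=True returns one attention tensor per layer.
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2", attn_implementation="eager")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

inputs = tokenizer("Hello world", return_tensors="pt")
outputs = model(**inputs, output_attentions=True)

# outputs.attentions is a tuple of per-layer tensors of shape
# (batch, num_heads, seq_len, seq_len); FlashAttention cannot produce these.
print(len(outputs.attentions), outputs.attentions[0].shape)
```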
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
System Info
transformers version: 4.44.2
Who can help?
@ArthurZucker @gante
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
Expected behavior
output_attentions=True should result in an error for GPT2FlashAttention2.
Additionally I'd like to understand what's being returned here.
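Since the Reproduction section above is empty, here is a hypothetical sketch of the setup being described, assuming flash-attn is installed and a CUDA GPU is available:

```python
# Hypothetical reproduction sketch (assumes flash-attn and a CUDA GPU).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "gpt2",
    attn_implementation="flash_attention_2",
    torch_dtype=torch.float16,
).to("cuda")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

inputs = tokenizer("Hello world", return_tensors="pt").to("cuda")
outputs = model(**inputs, output_attentions=True)

# The open question from this issue: what ends up in outputs.attentions here,
# given that FlashAttention never materializes the attention matrix?
print(outputs.attentions)
```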