huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

`gpt2` with `output_attentions=True` has different attentions shape between flash and eager #33417

Closed: lapp0 closed this issue 2 weeks ago

lapp0 commented 2 months ago

System Info

Who can help?

@ArthurZucker @gante

Information

Tasks

Reproduction

>>> import torch
>>> import transformers

>>> model_flash = transformers.AutoModelForCausalLM.from_pretrained("gpt2", device_map="cuda", attn_implementation="flash_attention_2", torch_dtype=torch.bfloat16)
>>> model_eager = transformers.AutoModelForCausalLM.from_pretrained("gpt2", device_map="cuda", attn_implementation="eager")

>>> input_ids = torch.tensor([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]).to("cuda")

>>> eager_attns = model_eager(input_ids, output_attentions=True).attentions
>>> flash_attns = model_flash(input_ids, output_attentions=True).attentions

>>> len(eager_attns)
12
>>> len(flash_attns)
12

>>> eager_attns[0].shape
torch.Size([3, 12, 4, 4])
>>> flash_attns[0].shape
torch.Size([3, 4, 768])
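
A quick sanity check in the same session (a sketch, using nothing beyond the tensors above): real attention weights are row-stochastic, so each row sums to 1 over the key dimension, while the flash tensor's shape instead matches (batch_size, seq_len, hidden_size) for GPT-2.

>>> eager_attns[0].sum(dim=-1)  # softmax output: every row sums to ~1.0
>>> flash_attns[0].shape        # torch.Size([3, 4, 768]) = (batch, seq_len, hidden_size), not attention weights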

Expected behavior

output_attentions=True should result in an error for GPT2FlashAttention2.

Additionally, I'd like to understand what's being returned here.
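
A minimal user-side guard along these lines (a sketch of what I expected, not the library's actual behavior; the helper name is hypothetical and `_attn_implementation` is a private config attribute):

import torch

def attentions_or_error(model, input_ids):
    # Hypothetical guard: refuse to report "attentions" when the backend
    # cannot materialize attention weights (flash_attention_2).
    attn_impl = getattr(model.config, "_attn_implementation", "eager")
    if attn_impl == "flash_attention_2":
        raise ValueError(
            "output_attentions=True is not supported with flash_attention_2; "
            "reload the model with attn_implementation='eager'."
        )
    with torch.no_grad():
        return model(input_ids, output_attentions=True).attentions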

ArthurZucker commented 1 month ago

When you use flash_attention_2, the model cannot output the attention weights, because flash attention never materializes the full attention matrix. I think there were warnings!
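
A sketch of the workaround, reusing the reproduction above: reload the model with the eager implementation whenever the attention weights are actually needed.

# Reload with the eager implementation to get real attention weights,
# shaped (batch_size, num_heads, seq_len, seq_len).
model_eager = transformers.AutoModelForCausalLM.from_pretrained(
    "gpt2", device_map="cuda", attn_implementation="eager"
)
attns = model_eager(input_ids, output_attentions=True).attentions
assert attns[0].shape == (3, 12, 4, 4)  # for the 3x4 input_ids above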

github-actions[bot] commented 3 weeks ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.