huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results. #33498

Open asmith26 opened 2 months ago

asmith26 commented 2 months ago

System Info

Who can help?

speech models: @ylacombe, @eustlb
pipelines: @Rocketknight1

Information

Tasks

Reproduction

import torch 
from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-base.en",
    device="cpu",
    torch_dtype=torch.float32,
)

# https://github.com/openai/whisper/blob/main/tests/jfk.flac
pipe("./jfk.flac")

Expected behavior

This does return the expected output:

{'text': ' And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.'}

But it also prints the following warning, so it would be nice to fix or suppress it:

The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.

Thanks!
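For background, here is a minimal sketch of the situation the warning describes, using a plain text-generation model rather than the Whisper pipeline above ("gpt2" is only an illustrative checkpoint): when the pad token is reused as the EOS token, generate() cannot tell padding apart from the end of the sequence, so it asks for an explicit attention mask.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative checkpoint; many causal LMs ship without a pad token,
# so the EOS token is commonly reused for padding -- exactly the case
# the warning is about.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Hello, my name is", return_tensors="pt")

# Passing the tokenizer-provided attention_mask (shape [batch, seq_len])
# lets generate() distinguish real tokens from padding, so the warning
# is not emitted.
outputs = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    pad_token_id=tokenizer.eos_token_id,
    max_new_tokens=20,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))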

asmith26 commented 2 months ago

Related: https://github.com/openai/whisper/discussions/2335

Rocketknight1 commented 1 month ago

@asmith26 thanks for the issue! I've reproduced it here, will open a PR to fix in a sec.

ritwikmishra commented 2 weeks ago

I observed this when I was fine-tuning an LLM with the PPO trainer. To resolve this warning, I passed the attention mask as a named parameter to the generate function, following this.

outputs = model.generate(
  inputs['input_ids'], 
  attention_mask=attention_mask,
  pad_token_id=tokenizer.eos_token_id
)

But then I observed an error stating "IndexError: too many indices for tensor of dimension 1" on this line of

lib/python3.9/site-packages/transformers/models/gemma/modeling_gemma.py
position_ids_expanded = position_ids[:, None, :].float() # let us call this line_e

I turned off the attention mask and, using print statements before line_e, inspected what its expected behavior is. The original warning still appeared, but I ignored it. I saw that the position IDs were being fed one at a time, as a 1-D tensor without a batch dimension. So to resolve this error I just unsqueezed the attention mask to add the missing batch dimension.

outputs = model.generate(
  inputs['input_ids'], 
  attention_mask=attention_mask.unsqueeze(0),
  pad_token_id=tokenizer.eos_token_id
)

and it worked fine.
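For reference, a minimal sketch of how the batch dimension usually comes for free (this assumes the same `model` and `tokenizer` objects as the snippet above, and is not specific to Gemma): when the prompt is encoded with the tokenizer and return_tensors="pt", both input_ids and attention_mask already have shape [batch, seq_len], so no manual unsqueeze is needed.

# Sketch only -- assumes `model` and `tokenizer` are the objects already in scope.
inputs = tokenizer("example prompt", return_tensors="pt")

outputs = model.generate(
    inputs["input_ids"],                      # shape [1, seq_len]
    attention_mask=inputs["attention_mask"],  # shape [1, seq_len], already batched
    pad_token_id=tokenizer.eos_token_id,
)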

asmith26 commented 1 week ago

Thanks for your help with this @Rocketknight1. Just thought I'd mention I still seem to be getting the same warning (I'm currently running transformers == 4.47.0.dev0).

Thanks again!

Rocketknight1 commented 1 week ago

@asmith26 I'm not getting that warning when I run the code sample above anymore. Did you change anything about it?

asmith26 commented 1 week ago

Interesting, thanks for the info @Rocketknight1

I've determined that if I add `chunk_length_s=30` (i.e. `outputs = pipe("./jfk.flac", chunk_length_s=30)`, following this tutorial), I get the `The attention mask is not set and...` warning again (a minimal reproduction is sketched below).

Happy to remove this argument for my needs. Thanks again! :)
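For completeness, a minimal sketch of the chunked variant described above (same pipeline as in the original reproduction; `chunk_length_s` is the only addition):

import torch
from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-base.en",
    device="cpu",
    torch_dtype=torch.float32,
)

# Long-form transcription in 30-second chunks; at the time of writing,
# this variant still prints the attention-mask warning.
outputs = pipe("./jfk.flac", chunk_length_s=30)
print(outputs["text"])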

Rocketknight1 commented 5 days ago

That's still potentially an issue we should address, though! Even though you've found a workaround, I'll reopen this to make sure we don't lose track of it.