huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results. #33498

Open asmith26 opened 2 months ago

asmith26 commented 2 months ago

System Info

Who can help?

speech models: @ylacombe, @eustlb
pipelines: @Rocketknight1

Information

Tasks

Reproduction

import torch 
from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-base.en",
    device="cpu",
    torch_dtype=torch.float32,
)

# https://github.com/openai/whisper/blob/main/tests/jfk.flac
pipe("./jfk.flac")

Expected behavior

This does return the expected output:

{'text': ' And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.'}

But it also prints the following warning, so it would be nice to fix/suppress it:

The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
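For now, the message can be silenced by lowering the transformers logging verbosity before building the pipeline (a workaround sketch only, not a fix for the underlying issue):

import transformers

# Hides library warnings such as the attention-mask message; errors are still reported.
transformers.logging.set_verbosity_error()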

Thanks!

asmith26 commented 2 months ago

Related: https://github.com/openai/whisper/discussions/2335

Rocketknight1 commented 2 months ago

@asmith26 thanks for the issue! I've reproduced it here, will open a PR to fix in a sec.

ritwikmishra commented 3 weeks ago

I observed this when I was fine-tuning an LLM with the PPO trainer. To resolve this warning, I passed the attention mask as a named parameter to the `generate` function, following this.

outputs = model.generate(
  inputs['input_ids'], 
  attention_mask=attention_mask,
  pad_token_id=tokenizer.eos_token_id
)

But then I observed an error stating "IndexError: too many indices for tensor of dimension 1" on this line of

lib/python3.9/site-packages/transformers/models/gemma/modeling_gemma.py
position_ids_expanded = position_ids[:, None, :].float() # let us call this line_e

I turned off the attention mask and, using print statements before line_e, inspected what its ideal behavior is. The original warning still appeared, but I ignored it. I saw that the position ids were being fed one by one. So to resolve this error I just unsqueezed the attention mask.

outputs = model.generate(
  inputs['input_ids'], 
  attention_mask=attention_mask.unsqueeze(0),
  pad_token_id=tokenizer.eos_token_id
)

and it worked fine.
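For context, the unsqueeze is needed because `generate` expects batched 2-D tensors of shape [batch_size, seq_len]; a 1-D mask of shape [seq_len] leads to the indexing error above. A minimal sketch of the shape difference (the model name and inputs are illustrative assumptions, not from this thread):

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Illustrative, un-gated model; the comment above used a Gemma model.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Tokenizing with return_tensors="pt" already yields 2-D, batched tensors,
# so the mask from the tokenizer can be passed to generate() as-is.
inputs = tokenizer("Hello there", return_tensors="pt")
print(inputs["input_ids"].shape)       # torch.Size([1, seq_len])
print(inputs["attention_mask"].shape)  # torch.Size([1, seq_len])

# A hand-built 1-D mask needs an explicit batch dimension before generate():
attention_mask = torch.ones(inputs["input_ids"].shape[-1], dtype=torch.long)
outputs = model.generate(
    inputs["input_ids"],
    attention_mask=attention_mask.unsqueeze(0),  # [seq_len] -> [1, seq_len]
    pad_token_id=tokenizer.eos_token_id,
)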

asmith26 commented 2 weeks ago

Thanks for your help with this @Rocketknight1. Just thought I'd mention I still seem to be getting the same warning (I'm currently running transformers == 4.47.0.dev0).

Thanks again!

Rocketknight1 commented 2 weeks ago

@asmith26 I'm not getting that warning when I run the code sample above anymore. Did you change anything about it?

asmith26 commented 2 weeks ago

Interesting, thanks for the info @Rocketknight1

I've determined that if I add `chunk_length_s=30` (i.e. `outputs = pipe("./jfk.flac", chunk_length_s=30)`, following this tutorial), I get `The attention mask is not set and...`.

Happy to remove this argument for my needs. Thanks again! :)
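For reference, this is the chunked variant of the original reproduction that still triggers the warning:

# Reuses the `pipe` object from the first snippet above;
# chunk_length_s=30 enables chunked long-form transcription.
outputs = pipe("./jfk.flac", chunk_length_s=30)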

Rocketknight1 commented 2 weeks ago

That's still potentially an issue we should address, though! Even though you've found a fix, I'll reopen to make sure we don't lose track of it.