Hey @RohitMidha23 - could you share a reproducible code snippet? Without one it's hard to say exactly what the error is, but it's likely fixed by ensuring output_attentions=False, since outputting the attentions is not compatible with Flash Attention 2.
Note that for this you will also need to disable word-level timestamps, since these implicitly set output_attentions=True in the generation code: https://github.com/huggingface/transformers/blob/536ea2aca234fb48c5c69769431d643b0d93b233/src/transformers/models/whisper/generation_whisper.py#L1013-L1016
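A minimal sketch of what this looks like in practice, assuming a CUDA GPU with flash-attn installed; the checkpoint name and audio file below are placeholders for your own fine-tuned model and input:

```python
import torch
from transformers import pipeline

# Load Whisper with Flash Attention 2.
pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3",  # placeholder: swap in your fine-tuned checkpoint
    torch_dtype=torch.float16,
    device="cuda:0",
    model_kwargs={"attn_implementation": "flash_attention_2"},
)

# Segment-level timestamps are fine with Flash Attention 2; word-level timestamps
# (return_timestamps="word") implicitly set output_attentions=True and will fail.
result = pipe("audio.mp3", return_timestamps=True)
print(result["text"])
```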
You're right, it was due to word-level timestamps. Thanks @sanchit-gandhi
I need word-level timestamps. How can I fix this? I know it's possible because the Insanely Fast Whisper Replit hosting has them, and it works there.
I have the exact same problem and I need the word-level timestamps. How should I fix it? Where should I set output_attentions=False?
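Not an official fix, but a workaround sketch: word-level timestamps need the cross-attentions, which Flash Attention 2 cannot return, so one option (assuming your transformers version supports it) is to fall back to the SDPA or eager attention implementation whenever you need word-level timestamps. Checkpoint name and audio path are placeholders:

```python
import torch
from transformers import pipeline

# Use SDPA (or "eager") attention so attentions can be returned,
# which word-level timestamps require.
pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3",  # placeholder: your fine-tuned checkpoint
    torch_dtype=torch.float16,
    device="cuda:0",
    model_kwargs={"attn_implementation": "sdpa"},
)

result = pipe("audio.mp3", return_timestamps="word")
for chunk in result["chunks"]:
    print(chunk["timestamp"], chunk["text"])
```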
System Info
transformers version: 4.38.2

Who can help?
@sanchit-gandhi

Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
translate task.

Expected behavior
FlashAttention2 should work with finetuned models.
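The reproduction details above are truncated; purely as an illustration (the checkpoint name and audio are placeholders, not from the original report), a snippet along these lines hits the incompatibility when a fine-tuned model loaded with Flash Attention 2 is asked for word/token-level timestamps on the translate task:

```python
import numpy as np
import torch
from transformers import WhisperProcessor, WhisperForConditionalGeneration

model_id = "your-username/whisper-finetuned"  # placeholder fine-tuned checkpoint
processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",
).to("cuda")

audio_array = np.zeros(16_000, dtype=np.float32)  # 1 s of silence as a stand-in input
inputs = processor(audio_array, sampling_rate=16_000, return_tensors="pt")
input_features = inputs.input_features.to("cuda", torch.float16)

# Requesting token-level (word) timestamps implicitly sets output_attentions=True,
# which Flash Attention 2 does not support, so this call raises the error.
out = model.generate(
    input_features,
    task="translate",
    return_timestamps=True,
    return_token_timestamps=True,
)
```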