huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

[whisper] transcription is different from hf & openai #32900

Open jsoto-gladia opened 1 month ago

jsoto-gladia commented 1 month ago

System Info

Who can help?

No response

Information

Tasks

Reproduction

silence-middle.wav.zip the following wav file produces

Expected behavior

I would expect the result to be Split infinity, and a time when less is more. Where too much is never

jsoto-gladia commented 1 month ago

You are missing the re-encoding mechanism that happens when EOS is reached within a 30s segment.
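To make the point concrete, here is a minimal sketch of the difference, based on my reading of the sequential transcription loop in the openai/whisper repo. It does not call either library: `sequential_windows`, `fixed_windows`, and `fake_decoder` are hypothetical names made up for illustration, and `fake_decoder` simply stands in for a decoder that emits EOS early (for example because of a silence in the middle of the file). The fixed-stride version is shown only for contrast and is not a claim about the exact code path in transformers.

```python
WINDOW_S = 30.0  # Whisper decodes audio in windows of at most 30 s


def sequential_windows(duration_s, decode_window):
    """Return the (start, end) windows fed to the encoder when the next
    window starts where the previous decode stopped.

    decode_window is a hypothetical stand-in for the decoder: given the
    window bounds, it returns how many seconds (relative to the window
    start) were covered before EOS was produced.
    """
    windows = []
    seek = 0.0
    while seek < duration_s:
        end = min(seek + WINDOW_S, duration_s)
        windows.append((seek, end))
        covered = decode_window(seek, end)
        if covered <= 0:
            # Guard against an infinite loop if nothing was decoded.
            covered = end - seek
        # Re-cut (and therefore re-encode) the next window from here.
        seek += min(covered, end - seek)
    return windows


def fixed_windows(duration_s):
    """Contrast case: always advance by a full window, whatever was decoded."""
    windows = []
    start = 0.0
    while start < duration_s:
        windows.append((start, min(start + WINDOW_S, duration_s)))
        start += WINDOW_S
    return windows


def fake_decoder(start, end):
    # Assumption for the example: EOS arrives 12 s into the first window;
    # every later window is decoded in full.
    return 12.0 if start == 0.0 else end - start


print(sequential_windows(50.0, fake_decoder))
print(fixed_windows(50.0))
```

With the early EOS, the sequential version looks at the audio from 12 s onward again in a fresh window, while the fixed-stride version never revisits anything between 12 s and 30 s. That gap is where I would expect the missing text to come from.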

amyeroberts commented 1 month ago

cc @ylacombe @sanchit-gandhi

ylacombe commented 2 weeks ago

Hey @jsoto-gladia, many thanks for opening this issue!

This looks like an interesting finding. Do you think you could provide code snippets (both in transformers and in the whisper repo) so that we can reproduce it?