huggingface / distil-whisper

Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
MIT License
3.33k stars · 238 forks

Long-Form transcription with Faster Whisper #33

Open 9throok opened 7 months ago

9throok commented 7 months ago

Hi, I have been working with Faster-Whisper and trying to use the distil-whisper model. However, distil-whisper only supports 30-second audio chunks, and using it with Faster-Whisper only outputs the first 30 seconds.

How can it be used with the faster-whisper implementation?

sanchit-gandhi commented 7 months ago

Hey @9throok - cool to see that you're using Distil-Whisper in combination with Faster-Whisper! I believe the .transcribe method in Faster-Whisper handles the long-form generation algorithm: https://github.com/guillaumekln/faster-whisper#usage Is this the API that you've been using? If you could share a reproducible code snippet that showcases the behaviour you're seeing that would be great, thanks!

murdadesmaeeli commented 6 months ago

@9throok, any update on the issue that you mentioned?

Purfview commented 5 months ago

> Hi, I have been working with Faster-Whisper and trying to use the distil-whisper model. However, distil-whisper only supports 30-second audio chunks, and using it with Faster-Whisper only outputs the first 30 seconds.

I had the same issue: after the first chunk there was nothing in the output. Looking at the debug output, the distil model just hallucinated non-stop after the first chunk. The solution is to disable the context prompt; the initial prompt has a negative effect too.

> How can it be used with the faster-whisper implementation?

Now it has official support -> https://github.com/SYSTRAN/faster-whisper/commit/ad3c83045bc0748b744e064ddfda680c86662e7e

Or you can use the standalone executable -> https://github.com/Purfview/whisper-standalone-win