catalwaysright closed this issue 1 year ago
Whisper hallucinations are a real problem.
Have you tried the --accurate option? The --vad option might also help.
The repetitions also depend a lot on which model is used. I don't know which one you are using, but combining the results of large-v1 and large-v2 (and maybe medium), for instance, could be a way to get more accurate transcriptions.
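For anyone who wants to try this from Python rather than the command line, here is a rough sketch. It rests on several assumptions: that the package in question is importable as `whisper_timestamped`, that `--accurate` corresponds to beam search with temperature fallback (the `ACCURATE_OPTIONS` values below), and that `transcribe` accepts a `vad` argument. The helper names `transcribe_words` and `disagreements` are made up for illustration. Rather than merging the outputs of two models automatically, it only lists the places where they disagree, which is usually where a repetition or hallucination shows up.

```python
import difflib

# Assumed Python equivalent of --accurate: beam search plus temperature fallback.
ACCURATE_OPTIONS = {
    "beam_size": 5,
    "best_of": 5,
    "temperature": (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
}


def transcribe_words(audio_path, model_name, use_vad=True):
    """Return the words one model produces for the given audio file."""
    # Imported lazily so the comparison helper below works without the package.
    import whisper_timestamped as whisper  # assumption: this is the repo's package

    audio = whisper.load_audio(audio_path)
    model = whisper.load_model(model_name)
    result = whisper.transcribe(model, audio, vad=use_vad, **ACCURATE_OPTIONS)
    return result["text"].split()


def disagreements(words_a, words_b):
    """List (tag, span_in_a, span_in_b) for every place two transcripts differ."""
    matcher = difflib.SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    return [
        (tag, words_a[i1:i2], words_b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

Typical use would be `disagreements(transcribe_words("audio.wav", "large-v1"), transcribe_words("audio.wav", "large-v2"))`, where `audio.wav` is a placeholder path. Spans that appear in only one transcript are good candidates for a manual check.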
I'm running into this as well - it's like Whisper is trying to impersonate Matthew McConaughey ("alright alright alright"....) ;)
Thanks for the suggestions; I'll try those as well.
If your audio is noisy, you might try using logmmse to clean it up. I'm trying this (although it caused a return of error #64) - I'll let you know how it works.
Thanks for this repo! I found that the timestamp for each word is very accurate. However, I encountered some weird repetitions in the transcript, just like with the original Whisper. I tried stable_whisper, which removes all of those repetitions and gives a very stable output. Are there arguments I should change in the transcribe function, or is there a way to combine stable_whisper with this repo to remove the repetitions? Here is the wav demo: friends_01.wav.zip
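As a stopgap while the decoding options are being tuned, repeated phrases can also be stripped after the fact. The sketch below is a naive post-processing heuristic, not what stable_whisper does internally; the function name `collapse_repeats` and the `max_phrase_len` parameter are hypothetical. It works on a plain list of words and drops any phrase that immediately repeats the one before it.

```python
def collapse_repeats(words, max_phrase_len=5):
    """Drop immediate repetitions of any phrase of up to max_phrase_len words."""
    out = []
    for word in words:
        out.append(word)
        # After each new word, check whether the tail now ends with the same
        # phrase twice in a row; if so, remove the second copy.
        for n in range(1, max_phrase_len + 1):
            if len(out) >= 2 * n and out[-n:] == out[-2 * n:-n]:
                del out[-n:]
                break
    return out


if __name__ == "__main__":
    text = "alright alright alright let's go"
    print(" ".join(collapse_repeats(text.split())))
```

Note that the comparison is exact, so differences in casing or punctuation prevent a match, and genuine repetitions in speech (for example "no no") are removed as well. The word timestamps would need to be filtered alongside the words to stay aligned.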