linto-ai / whisper-timestamped

Multilingual Automatic Speech Recognition with word-level timestamps and confidence
GNU Affero General Public License v3.0

Weird repetition on transcript #63

catalwaysright closed this issue 1 year ago

catalwaysright commented 1 year ago

Thanks for this repo! The timestamp for each word is very accurate. However, I ran into some weird repetitions in the transcript, just as with the original Whisper. With stable_whisper, those repetitions disappear and the output is very stable. Are there arguments I should change in the transcribe function, or is there a way to combine stable_whisper with this repo to remove the repetitions? Here is a demo WAV file: friends_01.wav.zip

Jeronymous commented 1 year ago

Whisper hallucinations are a real problem.

Have you tried the --accurate option? The --vad option might also help.
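For anyone calling the Python API rather than the command line, here is a minimal sketch of what those two options could look like. It assumes that --accurate corresponds to beam_size=5, best_of=5 and a temperature fallback schedule in steps of 0.2, and that transcribe accepts a vad keyword; the helper names accurate_options and transcribe_accurately are made up for this example, so check the README of your installed version before relying on it.

```python
def accurate_options(use_vad=True):
    """Keyword arguments assumed to mirror --accurate (plus --vad)."""
    return {
        "beam_size": 5,   # beam search instead of greedy decoding
        "best_of": 5,     # candidates sampled when temperature > 0
        "temperature": (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),  # fallback schedule
        "vad": use_vad,   # assumed equivalent of --vad
    }


def transcribe_accurately(path, model_name="large-v2", device="cpu"):
    """Hypothetical wrapper: transcribe one file with the options above."""
    # Imported lazily so accurate_options() can be inspected without the package.
    import whisper_timestamped as whisper

    audio = whisper.load_audio(path)
    model = whisper.load_model(model_name, device=device)
    return whisper.transcribe(model, audio, **accurate_options())
```

Calling transcribe_accurately("friends_01.wav") would then be the rough equivalent of running the CLI with both flags, under the assumptions stated above.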

The repetitions also depend a lot on which model is used. I don't know which one you are using, but combining the results of large-v1 and large-v2 (and maybe medium), for instance, could be a way to get more accurate transcriptions.
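One crude way to combine results from several models is to score each transcript by its worst run of consecutive repeated phrases and keep the least repetitive one. The sketch below is only an illustration of that idea, not something this repo provides: longest_repeat_run and pick_least_repetitive are hypothetical helpers, and the 4-word phrase limit is an arbitrary assumption.

```python
import re


def longest_repeat_run(text, max_ngram=4):
    """Largest number of back-to-back repeats of any phrase of up to max_ngram words."""
    words = re.findall(r"[\w']+", text.lower())
    best = 0
    for n in range(1, max_ngram + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            run, j = 1, i + n
            # Count how many times the same phrase follows itself immediately.
            while words[j:j + n] == gram:
                run += 1
                j += n
            best = max(best, run)
    return best


def pick_least_repetitive(transcripts):
    """Given {model_name: transcript}, return the name with the lowest repeat score."""
    return min(transcripts, key=lambda name: longest_repeat_run(transcripts[name]))
```

A transcript that genuinely contains repeated words would be penalized too, so this is better used to flag suspicious segments for review than to pick a winner blindly.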

jeremymatt commented 1 year ago

I'm running into this as well - it's like Whisper is trying to impersonate Matthew McConaughey ("alright alright alright"....) ;)

Thanks for the suggestions; I'll try those as well.

If your audio is noisy, you might try using logmmse to clean it up. I'm trying this (although it caused a return of error #64) - I'll let you know how it works.