jhj0517 / Whisper-WebUI

A Web UI for easy subtitle generation using the Whisper model.
Apache License 2.0

Transcription gets stuck and starts repeating phrases #299

Open martindellavecchia opened 2 months ago

martindellavecchia commented 2 months ago

Which OS are you using?

I am trying to transcribe video meetings recorded in several formats, mostly MKV and MP4. Some are transcribed well, some excellently, depending on the quality of the audio, which is normal.

What I do find strange, though, is that in some transcriptions the model seems to get stuck and starts repeating the same phrase over and over again. Sometimes it gets unstuck and continues with the transcription normally; other times it repeats the same phrase until the end.
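If such a loop has already made it into a finished transcript, a crude post-processing pass can at least trim it. The sketch below is plain Python and is not part of Whisper-WebUI or faster-whisper; `collapse_repeats` is a hypothetical helper that drops consecutive segments whose text repeats the previous one, ignoring case and extra whitespace. It only hides the symptom and does not change how the model decodes.

```python
from typing import List, Optional


def collapse_repeats(segments: List[str], max_repeats: int = 1) -> List[str]:
    """Keep at most `max_repeats` copies of any run of identical consecutive segments.

    Segments are compared after lowercasing and collapsing whitespace, so
    "Let's start." and "let's  start." count as the same phrase.
    """
    result: List[str] = []
    previous: Optional[str] = None
    run_length = 0
    for text in segments:
        normalized = " ".join(text.lower().split())
        if normalized == previous:
            run_length += 1          # same phrase as the segment before
        else:
            previous = normalized    # a new phrase starts a new run
            run_length = 1
        if run_length <= max_repeats:
            result.append(text)
    return result


segments = ["Hello everyone.", "Let's start.", "Let's start.", "let's  start.", "Next item."]
print(collapse_repeats(segments))
```

Raising `max_repeats` keeps legitimate short repetitions (for example someone saying "yes" twice) at the cost of letting a short loop through.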

I am seeing this behaviour mostly with large-v2 running on faster-whisper.

Any idea what I should try? Is there any way to improve accuracy?

Thanks in advance

jhj0517 commented 2 months ago

Hi, this is Whisper hallucination. If you could upload some sample audio that reproduces it, I'd be happy to test it with some settings.

Most hallucinations come from noise in the audio, such as background music.

The most effective fix is to use the VAD or BGM separation filters in the WebUI, so that the noise is removed before Whisper processes the audio.

You can try these VAD settings:

Minimum Speech Duration (ms) : 250
Maximum Speech Duration (s) : 9999
Minimum Silence Duration (ms) : 250
Speech Padding (ms) : 2000

Also enable the BGM Separation filter at the same time.
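To get a feel for what the four VAD values above do, here is a rough, simplified model of how they shape raw speech regions (in milliseconds). It assumes the WebUI fields map onto the similarly named fields of faster-whisper's `VadOptions`, which `transcribe()` accepts through `vad_parameters`. `postprocess_speech` is a hypothetical helper, not the actual Silero VAD code, which works on per-frame speech probabilities and handles padding between close regions differently. `max_speech_duration_s` (splitting overlong regions) is left out because 9999 s effectively disables it.

```python
from typing import List, Tuple

# Suggested WebUI values, keyed by faster-whisper VadOptions field names.
# With faster-whisper installed this could be passed as (not executed here):
#   model.transcribe(audio, vad_filter=True, vad_parameters=VAD_PARAMETERS)
VAD_PARAMETERS = {
    "min_speech_duration_ms": 250,
    "max_speech_duration_s": 9999,
    "min_silence_duration_ms": 250,
    "speech_pad_ms": 2000,
}


def postprocess_speech(
    regions: List[Tuple[int, int]],
    audio_ms: int,
    min_speech_ms: int,
    min_silence_ms: int,
    pad_ms: int,
) -> List[Tuple[int, int]]:
    """Simplified illustration of VAD post-processing on (start, end) regions in ms."""
    # 1. Bridge gaps shorter than the minimum silence duration.
    merged: List[List[int]] = []
    for start, end in sorted(regions):
        if merged and start - merged[-1][1] < min_silence_ms:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    # 2. Drop regions shorter than the minimum speech duration.
    kept = [r for r in merged if r[1] - r[0] >= min_speech_ms]
    # 3. Pad each region, clamp it to the audio length, and merge overlaps.
    padded: List[Tuple[int, int]] = []
    for start, end in kept:
        start, end = max(0, start - pad_ms), min(audio_ms, end + pad_ms)
        if padded and start <= padded[-1][1]:
            padded[-1] = (padded[-1][0], max(padded[-1][1], end))
        else:
            padded.append((start, end))
    return padded


raw = [(1000, 1100), (5000, 8000), (8100, 9000), (20000, 25000)]
print(postprocess_speech(
    raw,
    audio_ms=30000,
    min_speech_ms=VAD_PARAMETERS["min_speech_duration_ms"],
    min_silence_ms=VAD_PARAMETERS["min_silence_duration_ms"],
    pad_ms=VAD_PARAMETERS["speech_pad_ms"],
))
```

In this simplified model the 100 ms blip is discarded as too short, the two regions 100 ms apart are joined, and the 2000 ms padding gives Whisper a generous margin of context around each chunk.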

Since the VAD and BGM separation models can also cause hallucinations, if you still have problems, try the BGM separation filter alone, VAD alone, and so on.

martindellavecchia commented 2 months ago

I'll try activating the BGM filter. As it's a meeting recording, there's no background noise or other weird sound artifacts.

Should I expect an accuracy improvement by moving from faster-whisper to whisper?

jhj0517 commented 2 months ago

@martindellavecchia It shouldn't make a difference, but I'm not sure, because I haven't run a WER benchmark comparing openai/whisper with SYSTRAN/faster-whisper.
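WER (word error rate) is the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. A minimal sketch for scoring both backends against a hand-corrected transcript of one of your own meetings follows; `word_error_rate` is a hypothetical helper, and lowercasing plus whitespace splitting is a simplifying assumption. Proper benchmarks usually also normalize punctuation and numbers, and libraries such as `jiwer` exist for this.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref: List[str] = reference.lower().split()
    hyp: List[str] = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # `prev` is the previous row of the Levenshtein table over words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("the cat sat", "the cat sat sat sat"))
```

A repetition loop like the one reported in this issue shows up almost entirely as insertions, so it inflates WER quickly even when the rest of the transcript is accurate.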

martindellavecchia commented 2 months ago

I'll try switching to regular whisper to see if anything changes. I'll keep you posted.

mrnemosaa commented 1 month ago

In my case, deactivating "Condition on previous text" during decoding and setting the repetition penalty to 2 or more works for me.
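For context: with "Condition on previous text" enabled, the text of the previous window is fed back as a prompt for the next one, which is one way a loop can keep itself going; the repetition penalty lowers the scores of tokens that were already generated. The sketch shows the widely used CTRL-style formulation (as in Hugging Face transformers). It assumes the CTranslate2 backend behind faster-whisper applies its `repetition_penalty` the same way, and `apply_repetition_penalty` is a hypothetical helper for illustration only.

```python
from typing import Dict, Iterable

# The equivalent faster-whisper call would look roughly like this (not executed;
# "meeting.mkv" is a placeholder file name):
#   segments, info = model.transcribe(
#       "meeting.mkv", condition_on_previous_text=False, repetition_penalty=2.0)


def apply_repetition_penalty(
    logits: Dict[int, float], previous_tokens: Iterable[int], penalty: float
) -> Dict[int, float]:
    """CTRL-style penalty: with penalty > 1, already-generated tokens score lower."""
    penalized = dict(logits)
    for token in set(previous_tokens):  # each earlier token is penalized once
        if token in penalized:
            score = penalized[token]
            # Dividing a positive score or multiplying a negative one both lower it.
            penalized[token] = score / penalty if score > 0 else score * penalty
    return penalized


logits = {1: 4.0, 2: -1.0, 3: 3.0}  # token id -> score for the next step
scores = apply_repetition_penalty(logits, previous_tokens=[1, 2, 1], penalty=2.0)
print(scores, max(scores, key=scores.get))
```

Greedy decoding would pick token 1 before the penalty and token 3 after it, which illustrates how a larger penalty can push the decoder out of a loop. Set too high, it can also suppress words that legitimately recur in a meeting, such as names.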