jhj0517 / Whisper-WebUI

A Web UI for easy subtitle generation using the Whisper model.
Apache License 2.0
1.09k stars, 167 forks

Stopped recognition #295

Open mrnemosaa opened 3 days ago

mrnemosaa commented 3 days ago

I installed Whisper locally and tried to use the program to transcribe a German video that is 1 hour and 30 minutes long. After about 7 minutes, it only outputs 'Oh.' Even though there is plenty of conversation after that point, it keeps outputting the same 'Oh' at one-second intervals. The same issue occurs with different models, and switching to a smaller model makes no difference. The video does contain many instances of 'Oh', but there is still a fair amount of dialogue, so it's strange that none of it is being captured.

jhj0517 commented 3 days ago

OK, this is a Whisper hallucination. Can you upload a sample audio file that reproduces it, so I can test it with some settings?

mrnemosaa commented 2 days ago

No, I can't. It still has a lot of conversation, but it also has many "Ah"s and "Oh"s. Thank you for your answer anyway; I will look for a way to reduce the hallucination.

jhj0517 commented 2 days ago

@mrnemosaa The most helpful approach would be to use VAD (voice activity detection) and BGM (background music) separation. Most hallucinations are caused by noise and background music in the audio.

Just turning on the BGM separation filter alone will help a lot.
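To illustrate roughly what a VAD pass does, here is a minimal sketch, assuming 16 kHz mono audio already decoded into a list of floats in [-1, 1]. Frames whose RMS energy falls below a threshold are treated as non-speech and dropped, so the recognizer never sees long quiet stretches where it has nothing real to transcribe. This is a deliberately simplified energy-threshold stand-in, not the VAD implementation Whisper-WebUI ships; `energy_vad`, `keep_speech`, and every parameter value are hypothetical. Note that a plain energy threshold cannot tell speech apart from loud background music, which is why BGM separation is a separate step.

```python
import math
from typing import List, Tuple

Span = Tuple[float, float]  # (start_sec, end_sec)


def frame_rms(frame: List[float]) -> float:
    """Root-mean-square energy of one frame of samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def energy_vad(samples: List[float], sample_rate: int = 16000,
               frame_ms: int = 20, threshold: float = 0.02,
               max_gap_ms: int = 300) -> List[Span]:
    """Hypothetical energy-based VAD: return spans (in seconds) whose
    frame RMS is at or above `threshold`. Pauses no longer than
    `max_gap_ms` between voiced frames are bridged into one span."""
    frame_len = max(1, sample_rate * frame_ms // 1000)
    max_gap = sample_rate * max_gap_ms // 1000
    spans: List[Tuple[int, int]] = []  # spans in sample indices
    for start in range(0, len(samples), frame_len):
        end = min(start + frame_len, len(samples))
        if frame_rms(samples[start:end]) < threshold:
            continue  # quiet frame: treat as non-speech
        if spans and start - spans[-1][1] <= max_gap:
            spans[-1] = (spans[-1][0], end)  # short pause: extend the span
        else:
            spans.append((start, end))  # start a new speech region
    return [(s / sample_rate, e / sample_rate) for s, e in spans]


def keep_speech(samples: List[float], spans: List[Span],
                sample_rate: int = 16000) -> List[float]:
    """Concatenate only the voiced regions, discarding everything else."""
    out: List[float] = []
    for start_sec, end_sec in spans:
        lo = round(start_sec * sample_rate)
        hi = round(end_sec * sample_rate)
        out.extend(samples[lo:hi])
    return out


if __name__ == "__main__":
    sr = 16000
    tone = [0.5 * math.sin(2 * math.pi * 440 * i / sr) for i in range(sr)]
    audio = [0.0] * sr + tone + [0.0] * sr  # 1 s silence, 1 s tone, 1 s silence
    spans = energy_vad(audio, sr)
    print(spans)                               # [(1.0, 2.0)]
    print(len(keep_speech(audio, spans, sr)))  # 16000
```

On a real recording the fixed threshold would need tuning per file; trained VAD models are considerably more robust than an energy cutoff, which is the main reason to prefer the WebUI's built-in filter over anything like this.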