jhj0517 / Whisper-WebUI

A Web UI for easy subtitle using whisper model.
Apache License 2.0

Subtitle generation is not working properly. #152

Open lgs777 opened 1 month ago

lgs777 commented 1 month ago

Which OS are you using?

windows 11

After a long-awaited update, I attempted to generate Chinese subtitles. As time goes on, I'm encountering an issue where subtitles are generated as numbers only from a certain point.


1286
00:25:55,660 --> 00:25:56,660
90

1287
00:25:56,660 --> 00:25:57,660
90

1288
00:25:57,660 --> 00:25:58,660
90

1289
00:25:58,660 --> 00:25:59,660
90

1290
00:25:59,660 --> 00:26:00,660
90
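For anyone who wants to check a finished subtitle file for this kind of loop, here is a minimal sketch that scans SRT text for runs of consecutive cues with identical text. It is not part of Whisper-WebUI: the function names `parse_srt` and `find_repeated_runs` and the run length of 5 are assumptions chosen for illustration.

```python
import re

def parse_srt(srt_text):
    """Return a list of (index, start, end, text) tuples from SRT content."""
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        # Skip blocks that lack an index line, a timing line, or any text.
        if len(lines) < 3 or not lines[0].isdigit() or "-->" not in lines[1]:
            continue
        start, end = [part.strip() for part in lines[1].split("-->")]
        cues.append((int(lines[0]), start, end, " ".join(lines[2:])))
    return cues

def find_repeated_runs(cues, min_run=5):
    """Return (first_index, last_index, text) for runs of >= min_run identical cues."""
    runs = []
    run_start = 0
    for i in range(1, len(cues) + 1):
        # Close the current run at the end of the list or when the text changes.
        if i == len(cues) or cues[i][3] != cues[run_start][3]:
            if i - run_start >= min_run:
                runs.append((cues[run_start][0], cues[i - 1][0], cues[run_start][3]))
            run_start = i
    return runs

# The five cues quoted in this report.
sample = """1286
00:25:55,660 --> 00:25:56,660
90

1287
00:25:56,660 --> 00:25:57,660
90

1288
00:25:57,660 --> 00:25:58,660
90

1289
00:25:58,660 --> 00:25:59,660
90

1290
00:25:59,660 --> 00:26:00,660
90
"""

print(find_repeated_runs(parse_srt(sample)))  # [(1286, 1290, '90')]
```

A run like this is only a hint: legitimately repeated lines (a chorus, for example) would be flagged too, so the threshold probably needs tuning per file.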

jhj0517 commented 1 month ago

Hi, it seems like whisper hallucination.

Many possible solutions are discussed here.

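You can try adjusting the following parameters: condition_on_previous_text, no_speech_threshold, and log_probability_threshold.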

You can adjust these parameters in the "Advanced Parameters" tab of the WebUI.

Setting condition_on_previous_text to False makes the text less consistent with the preceding context, but it helps whisper escape the "loop of failures" that you experienced.

no_speech_threshold and log_probability_threshold define how "sensitive" whisper is to small sounds. In your case, this might happen because whisper is too sensitive to small sounds.

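Lowering no_speech_threshold and raising log_probability_threshold would make whisper less sensitive to small sounds, because a segment is skipped as silence only when its no-speech probability is above no_speech_threshold and its average log probability is not above log_probability_threshold.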
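As a rough illustration of how the two thresholds interact, the sketch below mirrors the silence-skip rule used in the reference whisper transcription loop. The function name `should_skip_segment` is hypothetical, the parameter is spelled `logprob_threshold` here as in openai-whisper (the WebUI labels it log_probability_threshold), and the defaults of 0.6 and -1.0 are the reference defaults, which the WebUI may or may not share.

```python
def should_skip_segment(no_speech_prob, avg_logprob,
                        no_speech_threshold=0.6, logprob_threshold=-1.0):
    """Return True if the segment would be treated as silence and dropped."""
    if no_speech_threshold is None:
        return False  # silence skipping is disabled entirely
    skip = no_speech_prob > no_speech_threshold
    # A confident decode (average log probability above the threshold)
    # overrides the silence vote, so the segment is kept.
    if logprob_threshold is not None and avg_logprob > logprob_threshold:
        skip = False
    return skip

# Faint background noise: fairly high no-speech probability, mediocre confidence.
print(should_skip_segment(0.7, -0.8))                          # False: kept
print(should_skip_segment(0.7, -0.8, logprob_threshold=-0.5))  # True: skipped
print(should_skip_segment(0.5, -1.2))                          # False: kept
print(should_skip_segment(0.5, -1.2, no_speech_threshold=0.4)) # True: skipped
```

Under this rule, more segments are dropped as silence when no_speech_threshold goes down or when the log probability threshold goes up.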

*Instead of tweaking these parameters, I'll just add a vad_filter parameter that enables the Silero VAD filter for easy use.

jhj0517 commented 1 month ago

Silero VAD Filter is added in #153.

Open the "Advanced Parameters" tab in the WebUI, and check "Enable Silero VAD Filter". If the hallucination still occurs, uncheck "Condition On Previous Text".
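For readers who script transcription instead of using the checkboxes, the same two-step approach might look like the sketch below. It assumes the faster-whisper backend, whose `transcribe` method accepts `vad_filter` and `condition_on_previous_text`; the helper names `build_transcribe_options` and `transcribe`, and the file name in the usage note, are hypothetical and not taken from the WebUI code.

```python
def build_transcribe_options(enable_vad=True, condition_on_previous_text=True):
    """Collect the keyword arguments that correspond to the two WebUI checkboxes."""
    return {
        # "Enable Silero VAD Filter": drop non-speech audio before decoding.
        "vad_filter": enable_vad,
        # "Condition On Previous Text": feed the previous output back as a prompt.
        "condition_on_previous_text": condition_on_previous_text,
    }

def transcribe(audio_path, model_name="large-v2", **options):
    """Run faster-whisper and return a list of (start, end, text) tuples."""
    # Imported lazily so the option helper works without the package installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_name)
    segments, _info = model.transcribe(audio_path, **options)
    # `segments` is a generator; decoding happens while it is consumed.
    return [(seg.start, seg.end, seg.text) for seg in segments]

# Step 1: VAD filter only.
first_try = build_transcribe_options(enable_vad=True)
# Step 2, if the loop persists: also stop conditioning on previous text.
second_try = build_transcribe_options(enable_vad=True,
                                      condition_on_previous_text=False)
```

Calling `transcribe("episode.wav", **second_try)` would then run the stricter configuration; the model is downloaded on first use.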

If the hallucination still exists with the above methods, please let me know.

RYG81 commented 1 month ago

Increasing temperature also solves this
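One plausible reason temperature helps: the reference whisper decoder retries a segment at higher temperatures when the output looks degenerate, and a looping output such as "90 90 90 ..." compresses unusually well. The sketch below is a simplified version of that fallback (it leaves out the no-speech override). The `decode` callable and the `fake_decode` stand-in are hypothetical; the thresholds and the temperature schedule are the reference defaults and may differ from what the WebUI uses.

```python
import zlib

def compression_ratio(text):
    """Ratio of raw to zlib-compressed size; repetitive text scores high."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))

def needs_fallback(text, avg_logprob,
                   compression_ratio_threshold=2.4, logprob_threshold=-1.0):
    """Return True if the decode looks too repetitive or too unconfident."""
    if compression_ratio(text) > compression_ratio_threshold:
        return True
    return avg_logprob < logprob_threshold

def decode_with_fallback(decode, temperatures=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0)):
    """Try each temperature in order and stop at the first acceptable decode.

    `decode` is a hypothetical callable: temperature -> (text, avg_logprob).
    `temperatures` must not be empty.
    """
    for t in temperatures:
        text, avg_logprob = decode(t)
        if not needs_fallback(text, avg_logprob):
            break
    return t, text, avg_logprob

def fake_decode(temperature):
    """Stand-in decoder: loops at temperature 0, behaves once sampling is on."""
    if temperature == 0.0:
        return "90 " * 200, -0.3
    return "The quick brown fox jumps over the lazy dog.", -0.4

used_temperature, text, _ = decode_with_fallback(fake_decode)
print(used_temperature, text)
```

With a single fixed temperature there is no schedule to fall back through, so raising that value simply makes sampling less greedy, which can also break a loop at some cost in determinism.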

windo-developer commented 1 month ago

I have also recently encountered the same hallucination issue in Korean. Even when using the vad_filter and adjusting the Advanced Parameters comprehensively, the same hallucination occurs after a certain point.

In my case, I found that changing the model to large-v2 prevents hallucinations, although the text generation quality decreases.

Previously, there were no issues even when using large-v3, so I believe there is definitely a problem with whisper.

lgs777 commented 1 month ago

@jhj0517 Your efforts are always appreciated. Thank you for your feedback.

lgs777 commented 1 month ago

@jhj0517

The problem persists even with the above method. I have no problem with large-v2, but I do with large-v3. I'm extracting Chinese subtitles.

jhj0517 commented 1 month ago

@lgs777 Thanks for pointing this out. I think this is a pretty notable issue, so I'll update the default model to large-v2 for now.

cookiexND commented 1 month ago

Thank you for all your help. I was having problems with hallucination when exporting Japanese conversations, but changing to large-v2 greatly improved the problem. I still had a little hallucination, but raising the temperature to 0.2 eliminated it.