Open qo4on opened 1 month ago
This looks like the effect of the default regrouping. Try disabling it with model.transcribe(..., regroup=False), or use a custom regrouping that handles numbers better.
If this does not resolve the issue, can you share an audio clip that can replicate this?
regroup=False
This makes the subtitles huge, about 500 characters long each. In text files, it did help get rid of a lot of incorrectly added newlines, but not all of them.
Unfortunately, I can't publish this particular audio file. But as far as I have noticed, the transcription quality of stable-ts is much worse than that of openai/whisper specifically for audio in Russian, and this is noticeable on almost any Russian audio. You can easily check this even without knowing Russian: just ask an LLM, for example ChatGPT or Gemini, to compare the transcription quality of the results obtained by stable-ts and by the official openai/whisper. They provide a valid comparison.
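For the two symptoms described above (cues of about 500 characters with regroup=False, and numbers such as 0.003 being broken up), here is a minimal, library-independent sketch of a post-processing splitter. It is not part of stable-ts, and it assumes (unconfirmed in this thread) that splitting on every period is what breaks the decimals. The function name split_cues and the max_chars limit are hypothetical. It breaks only at sentence punctuation that is followed by whitespace, so a decimal point inside a number is never treated as a boundary.

```python
import re

# Sentence boundary: ., !, ? or … followed by whitespace.
# The decimal point in "0.003" is followed by a digit, not whitespace,
# so it never matches and the number stays in one piece.
_SENTENCE_END = re.compile(r"(?<=[.!?…])\s+")


def split_cues(text: str, max_chars: int = 80) -> list[str]:
    """Split text into cues of at most max_chars, preferring sentence ends.

    A single word longer than max_chars is kept whole as its own cue.
    """
    cues: list[str] = []
    for sentence in _SENTENCE_END.split(text.strip()):
        current = ""
        for word in sentence.split():
            candidate = f"{current} {word}".strip()
            if current and len(candidate) > max_chars:
                # The next word would overflow the cue: close it, start a new one.
                cues.append(current)
                current = word
            else:
                current = candidate
        if current:
            cues.append(current)
    return cues


if __name__ == "__main__":
    sample = "The dose was 0.003 mg. Later it rose to 0.05 mg per day."
    for cue in split_cues(sample, max_chars=30):
        print(cue)
```

This works on plain text only and ignores timestamps, so it is suited to the text-file output mentioned above rather than to timed subtitles.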
Try model.transcribe_minimal(..., regroup=False).
This will run the original transcription function of the official Whisper and keep the output text the same but make minor adjustments to the timestamps.
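To see whether the text really differs between the two code paths, the two calls suggested in this thread can be run side by side. This is only a sketch: the helper name transcribe_both, the audio_path argument, and the "base" model size are placeholders, and it assumes stable-ts (together with openai-whisper, which transcribe_minimal relies on) is installed.

```python
def transcribe_both(audio_path: str, model_name: str = "base"):
    """Return (text from transcribe, text from transcribe_minimal)."""
    # Imported lazily so this module can be loaded without stable-ts installed.
    import stable_whisper

    model = stable_whisper.load_model(model_name)

    # stable-ts transcription with the default regrouping turned off.
    stable_result = model.transcribe(audio_path, regroup=False)

    # Official Whisper transcription function; per the reply above,
    # the text is kept and only the timestamps are adjusted.
    minimal_result = model.transcribe_minimal(audio_path, regroup=False)

    return stable_result.text, minimal_result.text


if __name__ == "__main__":
    # "audio.mp3" is a placeholder path.
    stable_text, minimal_text = transcribe_both("audio.mp3")
    print("transcribe:        ", stable_text)
    print("transcribe_minimal:", minimal_text)
```

If the second text matches plain openai/whisper output while the first does not, the difference comes from the stable-ts transcription path rather than from regrouping.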
Thank you.
Were you able to figure this out?
Why is the quality of stable-ts transcription much worse than that of openai/whisper? New lines of text are added where they should not be, and numbers like 0.003 and 0.05 are transcribed as 0 0 3 and 0 0 5 ...