Closed fablau closed 1 year ago
It seems like this is one of those cases where suppressing the timestamp tokens throws the model off.
Use suppress_ts_tokens=False to disable it:
results = model.transcribe(..., suppress_ts_tokens=False)
Perfect, problem solved! Thank you!
What are the use cases where you would suggest setting it to True?
Empirically, it tends to do better with vad=True. This might be because the VAD is more accurate at detecting speech than the default non-VAD method, which can suppress so many of the "good" timestamps that the model ends up picking ones that make it skip words and hallucinate.
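To make the mechanism concrete: "suppressing" timestamp tokens generally means masking their logits so the decoder cannot pick them at that step. This is only a toy sketch of the idea, not stable-ts's actual implementation; the function and variable names here are illustrative.

```python
NEG_INF = float("-inf")

def suppress_tokens(logits, token_ids):
    """Return a copy of `logits` with the given token ids masked out,
    so they can never be selected by argmax/sampling at this step."""
    out = list(logits)
    for t in token_ids:
        out[t] = NEG_INF
    return out

# Toy vocabulary: ids 3 and 4 stand in for timestamp tokens.
logits = [0.1, 2.0, 0.5, 3.0, 1.5]
masked = suppress_tokens(logits, token_ids=[3, 4])

# Without suppression the decoder would pick id 3; with it, id 1 wins.
best = max(range(len(masked)), key=masked.__getitem__)  # -> 1
```

If the suppression mask is too aggressive (masks the "good" timestamps), the decoder is forced onto worse candidates, which is consistent with the skipping and hallucination described above.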
suppress_ts_tokens=True made more sense as the default in version 1.x, but it probably shouldn't have carried over to 2.x because it offers little benefit there. The timestamps produced by the token suppression are discarded anyway when word_timestamps=True.
Got it. Thanks!
Hello. I just installed the latest version in the repo, and now I get incorrect transcriptions.
Here is the video I have tried to transcribe:
https://youtu.be/aFJCahnRF9s
It gets transcribed perfectly with regular Whisper, but with stable-ts it skips some words.
Here is the code I have been using so far, and it always worked before:
I am pasting below the first 13 subtitles, which are clearly incorrect (many words are missing):
Any ideas?