Open The3IC opened 4 months ago
Hi @The3IC,
Do you have an audio sample that reliably produces this kind of artifact? I can try it on my end too.
Generally, my flow would be:
If I am able to reproduce it using 2) above, then I would deem it a whisper.cpp issue. If not, then I would jump into some investigation comparing the way my plugins are setting things vs. whisper.cpp.
Thanks! Ryan
@RyanMetcalfeInt8, do you still have the sample I sent you? Now with large_v3 I reproducibly get a duplication (triplication) of labels, plus the last greeting repeated 3 times at the end (with an Intel CPU on my laptop), but different results on my desktop (AMD). The errors do seem to be reproducible on each platform, though. I'll add a link below to the sample audio from which I generated the above errors (AMD); you should be able to find the place. This time I used 60 chars for segment size.
PS: It's interesting that Intel and AMD CPUs seem to generate slightly different results in general (different capitalization, small differences in names, etc.).
Thanks for the audio file, I'll give it a try!
Hi @RyanMetcalfeInt8, were you able to reproduce this, or should I report it somewhere else?
Hi @The3IC,
Sorry for the delay, I haven't had time to try it yet. I can try it within the next few days. Thanks!
Stumbled on this over at whisper.cpp; it seems they are aware of this "hallucination" error type: https://github.com/ggerganov/whisper.cpp/issues/2191
Thanks -- hmm, yes seems similar although that issue mentions that it seems to be specific to CUDA configurations, which we don't use. Anyway, thanks -- I'll try this.
Hi,
Doing some more testing and finding some smaller issues (using large-v3). Is this the right place to report these, or should I do it over at whisper.cpp?
Timestamp problems (which also cause issues in the Audacity label view), such as multiple sentences/labels starting at the same time (of which just one is correct) or having incorrect end times (resulting in minor issues in the Audacity UI, such as labels not connected to the correct start location of the sentence/label).
Also seeing more or less the same thing where a sentence/label is duplicated 2-3 times with the same start time but possibly slightly different end times and/or text.
Strange artefacts at the end of the transcript, like strange credits and the "Thank you" when there is nothing spoken at that point.
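In case it helps with triage, the duplicated-start-time labels described above can be spotted mechanically. This is only a rough sketch, assuming the label track has been exported from Audacity as a text file with one tab-separated `start<TAB>end<TAB>text` line per label. The function name `find_duplicate_starts` and the choice to round start times to milliseconds are my own assumptions for illustration, not anything taken from the plugin or whisper.cpp.

```python
from collections import defaultdict


def find_duplicate_starts(label_text, ndigits=3):
    """Return {start_time: [(end_time, text), ...]} for every start time
    shared by more than one label.

    Assumes Audacity's tab-separated label export: start, end, text.
    Start times are rounded to `ndigits` decimals (an arbitrary choice)
    so that tiny floating-point differences still count as the same start.
    """
    groups = defaultdict(list)
    for line in label_text.splitlines():
        parts = line.split("\t")
        if len(parts) < 3:
            continue  # blank or malformed line
        try:
            start = float(parts[0])
            end = float(parts[1])
        except ValueError:
            continue  # not a label line (e.g. extra metadata rows)
        groups[round(start, ndigits)].append((end, parts[2]))
    # Keep only start times that occur more than once.
    return {start: labels for start, labels in groups.items() if len(labels) > 1}


if __name__ == "__main__":
    # Made-up sample data, only to show the shape of the input.
    sample = (
        "0.000000\t2.500000\tHello there.\n"
        "2.500000\t5.000000\tHow are you?\n"
        "2.500000\t5.200000\tHow are you?\n"
        "5.200000\t6.000000\tThank you.\n"
    )
    for start, labels in find_duplicate_starts(sample).items():
        print(f"{start:.3f}s is the start of {len(labels)} labels:")
        for end, text in labels:
            print(f"  ends {end:.3f}s: {text}")
```

Running the same check on exports from the Intel and AMD machines might make it easier to compare where the duplicates land on each platform.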