intel / openvino-plugins-ai-audacity

A set of AI-enabled effects, generators, and analyzers for Audacity®.
GNU General Public License v3.0

Some strange behavior still with large-v3 #206

Open The3IC opened 4 months ago

The3IC commented 4 months ago

Hi,

I've been doing some more testing and am finding some smaller issues (using large-v3). Is this the right place to report these, or should I do it over on whisper.cpp?

Timestamp problems (which also cause issues in the Audacity label view), such as multiple sentences/labels starting at the same time (of which just one is correct) or having incorrect end times (resulting in minor issues in the Audacity UI, such as labels not being connected to the correct start location of the sentence/label).

[image: errors-srt]

[image: errors-label]

I'm also seeing more or less the same thing, where a sentence/label can be duplicated 2-3 times with the same start time but possibly a slightly different end time and/or text.
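To make the duplicated labels easier to pin down, here is a minimal Python sketch that scans an exported Audacity label track for labels sharing a start time. It assumes the standard tab-separated export (`start<TAB>end<TAB>text`, times in seconds); the function names and the `tolerance` parameter are hypothetical and not part of the plugin or of whisper.cpp.

```python
from collections import defaultdict


def parse_labels(text):
    """Parse Audacity label-track text: one 'start<TAB>end<TAB>text' per line."""
    labels = []
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        try:
            start, end = float(parts[0]), float(parts[1])
        except ValueError:
            continue  # skip lines whose first two fields are not numbers
        labels.append((start, end, parts[2]))
    return labels


def duplicate_starts(labels, tolerance=0.001):
    """Return groups of labels whose start times fall in the same
    `tolerance`-second bucket (hypothetical helper for illustration)."""
    groups = defaultdict(list)
    for start, end, text in labels:
        # Bucket start times on a grid of `tolerance` seconds.
        groups[round(start / tolerance)].append((start, end, text))
    return [group for group in groups.values() if len(group) > 1]
```

Calling `duplicate_starts(parse_labels(open("labels.txt").read()))` on an exported label file (the file name is a placeholder) would list each cluster of labels that begin at the same time, which could then be attached to the report.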

Strange artefacts at the end of the transcript, such as spurious credits and a "Thank you" when nothing is spoken at that point.

[image: error-noexistent]

RyanMetcalfeInt8 commented 4 months ago

Hi @The3IC,

Do you have an audio sample that reliably produces this kind of artifact? I can try it on my end too.

Generally, my flow would be:

  1. Reproduce it inside of Audacity using these plugins.
  2. Try to reproduce it using whisper.cpp directly (main.exe sample application, v1.6.0).

If I am able to reproduce it using 2) above, then I would deem it a whisper.cpp issue. If not, then I would jump into some investigation comparing the way my plugins are setting things vs. whisper.cpp.
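As a rough illustration of step 2, the Python sketch below assembles a command line for the whisper.cpp `main` sample application. The flags `-m` (model), `-f` (input WAV), `-ml` (maximum segment length in characters) and `-osrt` (write an `.srt` file) are whisper.cpp options; the helper name, the paths, and the choice of `max_len` are assumptions for illustration and may not match the settings the plugin actually passes.

```python
def whisper_cpp_command(exe, model, wav, max_len=60):
    """Build an argument list for the whisper.cpp `main` sample app.

    All paths are placeholders supplied by the caller; `max_len` is an
    assumed stand-in for the segment-size setting used in Audacity.
    """
    return [
        exe,
        "-m", model,          # ggml model file, e.g. a large-v3 model
        "-f", wav,            # input audio (16 kHz WAV expected by main)
        "-ml", str(max_len),  # maximum segment length in characters
        "-osrt",              # also write the transcript as an .srt file
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. If the `.srt` output shows the same duplicated or trailing segments as the Audacity labels, that points toward whisper.cpp rather than the plugin.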

Thanks! Ryan

The3IC commented 4 months ago

@RyanMetcalfeInt8, do you still have the sample I sent you? Now with large_v3 I get a reproducible duplication (triplication) of labels, plus the last greeting repeated 3 times at the end (with the Intel CPU on my laptop), but different results on my desktop (AMD). The errors do seem to be reproducible on each platform, however. I'll add a link below to the sample audio from which I generated the above errors (AMD); you should be able to find the place. This time I used 60 chars for the segment size.

PS: It's interesting that Intel and AMD CPUs seem to generate slightly different results in general (different capitalization, small differences in names, etc.).

RyanMetcalfeInt8 commented 4 months ago

Thanks for the audio file, I'll give it a try!

The3IC commented 4 months ago

Hi @RyanMetcalfeInt8, were you able to reproduce this, or should I report it somewhere else?

RyanMetcalfeInt8 commented 4 months ago

Hi @The3IC,

Sorry for the delay, I haven't had time to try it yet. I can try it within the next few days. Thanks!

The3IC commented 4 months ago

Stumbled on this over at whisper.cpp; it seems they are aware of this "hallucination" error type: https://github.com/ggerganov/whisper.cpp/issues/2191

RyanMetcalfeInt8 commented 4 months ago

Thanks -- hmm, yes, it seems similar, although that issue mentions that it seems to be specific to CUDA configurations, which we don't use. Anyway, thanks -- I'll try this.