thewh1teagle / vibe

Transcribe on your own!
https://thewh1teagle.github.io/vibe/
MIT License

Bug: Transcribing media ends with exclamation marks #365

Open csPinKie opened 1 week ago

csPinKie commented 1 week ago

What happened?

The transcript of a 1h multi speaker file generates the following output:

00:00 --> 01:20 Speaker 1: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
01:20 --> 01:28 Speaker 1: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
01:28 --> 01:39 Speaker 1: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
01:40 --> 01:41 Speaker 1: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
01:43 --> 01:44 Speaker 1: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
01:44 --> 01:54 Speaker 1: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
01:54 --> 01:57

Steps to reproduce

  1. Load a file longer than 1h into the app.
  2. Set the speaker count to 8 and the language to German.
  3. Start the transcription.

I use an AMD 7700XT; maybe that's the reason.

What OS are you seeing the problem on?

Windows

Relevant log output

App Version: vibe 2.6.3
Commit Hash: d24ffccb0d05ea822ff1a3a6edb3b9871be9f368
Arch: x86_64
Platform: windows
Kernel Version: 10.0.19045
OS: windows
OS Version: 10.0.19045
Cuda Version: n/a
Models: ggml-medium.bin
Default Model: "C:\\Users\\Me\\AppData\\Local\\github.com.thewh1teagle.vibe\\ggml-medium.bin"
Cargo features: vulkan

{
    "avx": {
        "enabled": true,
        "support": true
    },
    "avx2": {
        "enabled": true,
        "support": true
    },
    "f16c": {
        "enabled": true,
        "support": true
    },
    "fma": {
        "enabled": true,
        "support": true
    }
}

thewh1teagle commented 5 days ago

Please share an example YouTube video that it happens with, or upload the audio, and tell me which language to choose so I can reproduce it.

csPinKie commented 4 days ago

Hi, the language doesn't really matter: whether I choose "auto detect language", "german" or "english", it's all exclamation marks.

Regarding the audio and video: that also doesn't matter in my case. Different files and formats all resulted in the same problem. I even changed from the AMD Pro drivers to the Gaming drivers, and nothing changed. I am sure you will be able to transcribe anything fine, just as I can on the CPU model (except that it's really slow). Is there anything else I can provide to help?

thewh1teagle commented 4 days ago

Maybe related to https://github.com/ggerganov/whisper.cpp/issues/2400

dusanpol commented 2 days ago

I have the same issue when transcribing audio clips longer than ~8 seconds. Vulkan build, 7900XTX, Windows 10.
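
For anyone comparing a Vulkan run against a CPU run while triaging this, here is a minimal sketch of a check that flags segments consisting only of one repeated punctuation character. It is not part of vibe or whisper.cpp. The line format is an assumption taken from the transcript excerpt in the report above, and the names `SEGMENT_RE`, `is_degenerate`, `find_degenerate_segments` and the `min_len` threshold are hypothetical.

```python
import re

# Assumed line format, based on the excerpt in this report:
#   "MM:SS --> MM:SS Speaker N: text"  (an optional leading "HH:" is also accepted)
SEGMENT_RE = re.compile(
    r"^(?P<start>(?:\d{2}:)?\d{2}:\d{2}) --> (?P<end>(?:\d{2}:)?\d{2}:\d{2}) "
    r"Speaker (?P<speaker>\d+): (?P<text>.*)$"
)


def is_degenerate(text, min_len=20):
    """Return True if text is a long run of one repeated non-alphanumeric character."""
    stripped = text.strip()
    return (
        len(stripped) >= min_len
        and len(set(stripped)) == 1
        and not stripped[0].isalnum()
    )


def find_degenerate_segments(transcript):
    """Return (start, end) timestamp pairs for segments whose text is degenerate."""
    bad = []
    for line in transcript.splitlines():
        match = SEGMENT_RE.match(line.strip())
        if match and is_degenerate(match.group("text")):
            bad.append((match.group("start"), match.group("end")))
    return bad


# Hypothetical sample: one broken segment and one normal one.
sample = "\n".join([
    "00:00 --> 01:20 Speaker 1: " + "!" * 200,
    "01:20 --> 01:28 Speaker 1: Guten Tag zusammen.",
])
print(find_degenerate_segments(sample))  # [('00:00', '01:20')]
```

Running the same file through both builds and comparing the flagged segments should show whether the problem is limited to the Vulkan build, as the reports here suggest.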