thewh1teagle / vibe

Transcribe on your own!
https://thewh1teagle.github.io/vibe/
MIT License

Bug: Crash to desktop when transcribing MP3 #90

clem0338 closed this issue 4 months ago

clem0338 commented 4 months ago

What happened?

On NVIDIA version,

An MP3 transcription crashes the app to desktop on my workstation (Win11, RTX 3070 with 8 GB, 32 GB RAM) but works flawlessly on my laptop (Win11, GTX 1650 with 4 GB, 16 GB RAM).

Here is the EventViewer crash log:

log

```
- System
  - Provider
      [ Name ]  Application Error
      [ Guid ]  {a0e9b465-b939-57d7-b27d-95d8e925ff57}
    EventID   1000
    Version   0
    Level     2
    Task      100
    Opcode    0
    Keywords  0x8000000000000000
  - TimeCreated
      [ SystemTime ]  2024-05-31T08:00:15.3838683Z
    EventRecordID  3506
    Correlation
  - Execution
      [ ProcessID ]  3204
      [ ThreadID ]   3352
    Channel   Application
    Computer  DESKTOP-0D5MR4P
  - Security
      [ UserID ]  S-1-5-21-3684703487-1378115604-541973499-1001
- EventData
    AppName              vibe.exe
    AppVersion           2.0.0.0
    AppTimeStamp         66568d1b
    ModuleName           ucrtbase.dll
    ModuleVersion        10.0.26217.5000
    ModuleTimeStamp      53f1888b
    ExceptionCode        c0000409
    FaultingOffset       00000000000a514e
    ProcessId            0x48f4
    ProcessCreationTime  0x1dab33021395435
    AppPath              F:\Downloads\AI\vibe_2.0.0_x64-setup_nvidia\vibe.exe
    ModulePath           C:\WINDOWS\System32\ucrtbase.dll
    IntegratorReportId   585e91ac-adfa-4a77-a141-9b761057ed9a
    PackageFullName
    PackageRelativeAppId
```

Steps to reproduce

  1. Using default settings (Vibe freshly installed)
  2. Select MP3
  3. Press "Transcribe" button
  4. Crashes within 5 seconds, with no log file created under '%appdata%\github.com.thewh1teagle.vibe' (in fact, this folder is not even created)

What OS are you seeing the problem on?

Windows

Relevant log output

log

```shell
App Version: 2.0.0
Commit Hash: 4a44b1c79f287a6903c07128b976349839002d01
Arch: x86_64
Platform: windows
Kernel Version: 10.0.26217
OS: windows
OS Version: 10.0.26217
Models: ggml-medium.bin
Default Model: "C:\\Users\\Cedric\\AppData\\Local\\github.com.thewh1teagle.vibe\\ggml-medium.bin"
{
  "avx":  { "enabled": true, "support": true },
  "avx2": { "enabled": true, "support": true },
  "f16c": { "enabled": true, "support": true },
  "fma":  { "enabled": true, "support": true }
}
```

Edit:

  1. My NVIDIA drivers are up to date (560.38) on both systems
  2. It works when using the non-NVIDIA version
thewh1teagle commented 4 months ago

Thanks for reporting this issue!

It's challenging to pinpoint the specific cause from the event log error. However, we can try using the original whisper.cpp and compare the results. The issue might be related to a version mismatch with the NVIDIA libraries.

Try the following:

  1. Download whisper-cublas-12.2.0-bin-x64.zip
  2. Extract the archive, open the folder in Explorer, press Ctrl + L in the address bar, type cmd, and press Enter
  3. Download vibe/samples/single.wav and place it in the same folder (and check that the file is OK)
  4. Try to transcribe by executing:
main.exe -m "%localappdata%\github.com.thewh1teagle.vibe\ggml-medium.bin" -f "samples_single.wav"
  5. Repeat the above with whisper-cublas-11.8.0-bin-x64.zip
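For convenience, the comparison above can also be driven from a small script. A rough Python sketch; the build folder names, model path, and WAV file name are taken from the steps above and are assumptions to adjust for your machine:

```python
import os
import subprocess
from pathlib import Path

# Model path from step 4; %localappdata% expanded the Python way.
MODEL = (
    Path(os.environ.get("LOCALAPPDATA", "."))
    / "github.com.thewh1teagle.vibe"
    / "ggml-medium.bin"
)

def whisper_cmd(build_dir: str, wav: str) -> list:
    """Command line for whisper.cpp's main.exe, as in step 4."""
    return [str(Path(build_dir) / "main.exe"), "-m", str(MODEL), "-f", wav]

# Step 5: repeat the same transcription with both cuBLAS builds.
for build in ("whisper-cublas-12.2.0-bin-x64", "whisper-cublas-11.8.0-bin-x64"):
    cmd = whisper_cmd(build, "samples_single.wav")
    print("run:", " ".join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment on a machine with both builds
```

The actual execution is left commented out since it needs the extracted builds and the downloaded sample in place.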

In addition, to get more insight from Vibe itself, you can run it from cmd.exe with:

set RUST_BACKTRACE=1
set RUST_LOG=vibe=debug,whisper_rs=debug
%localappdata%\vibe\vibe.exe
clem0338 commented 4 months ago

Result from the 12.2.0 version (it seems to work without crashing):

F:\Downloads\AI\whisper-cublas-12.2.0-bin-x64>main.exe -m "%localappdata%\github.com.thewh1teagle.vibe\ggml-medium.bin" -f ..\samples_single.wav
whisper_init_from_file_with_params_no_state: loading model from 'C:\Users\Cedric\AppData\Local\github.com.thewh1teagle.vibe\ggml-medium.bin'
whisper_init_with_params_no_state: use gpu    = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw        = 0
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 1024
whisper_model_load: n_audio_head  = 16
whisper_model_load: n_audio_layer = 24
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 1024
whisper_model_load: n_text_head   = 16
whisper_model_load: n_text_layer  = 24
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 4 (medium)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs       = 99
whisper_backend_init: using CUDA backend
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:   no
ggml_cuda_init: CUDA_USE_TENSOR_CORES: yes
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3070 Ti, compute capability 8.6, VMM: yes
whisper_model_load: model size    = 1533.14 MB
whisper_init_state: kv self size  =  150.99 MB
whisper_init_state: kv cross size =  150.99 MB
whisper_init_state: kv pad  size  =    6.29 MB
whisper_init_state: compute buffer (conv)   =   28.68 MB
whisper_init_state: compute buffer (encode) =  594.22 MB
whisper_init_state: compute buffer (cross)  =    7.85 MB
whisper_init_state: compute buffer (decode) =  142.09 MB

system_info: n_threads = 4 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 1 | COREML = 0 | OPENVINO = 0

main: processing 'F:\Downloads\Electronic\CAN\samples_single.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ...

[00:00:00.000 --> 00:00:11.000]   And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.

whisper_print_timings:     load time =  6719.37 ms
whisper_print_timings:     fallbacks =   0 p /   0 h
whisper_print_timings:      mel time =    11.65 ms
whisper_print_timings:   sample time =    79.58 ms /   143 runs (    0.56 ms per run)
whisper_print_timings:   encode time =  1916.62 ms /     1 runs ( 1916.62 ms per run)
whisper_print_timings:   decode time =     0.00 ms /     1 runs (    0.00 ms per run)
whisper_print_timings:   batchd time =   594.36 ms /   141 runs (    4.22 ms per run)
whisper_print_timings:   prompt time =     0.00 ms /     1 runs (    0.00 ms per run)
whisper_print_timings:    total time =  9356.77 ms

Same for 11.8.0:

F:\Downloads\AI\whisper-cublas-11.8.0-bin-x64>main.exe -m "%localappdata%\github.com.thewh1teagle.vibe\ggml-medium.bin" -f ..\samples_single.wav
whisper_init_from_file_with_params_no_state: loading model from 'C:\Users\Cedric\AppData\Local\github.com.thewh1teagle.vibe\ggml-medium.bin'
whisper_init_with_params_no_state: use gpu    = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw        = 0
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 1024
whisper_model_load: n_audio_head  = 16
whisper_model_load: n_audio_layer = 24
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 1024
whisper_model_load: n_text_head   = 16
whisper_model_load: n_text_layer  = 24
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 4 (medium)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs       = 99
whisper_backend_init: using CUDA backend
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:   no
ggml_cuda_init: CUDA_USE_TENSOR_CORES: yes
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3070 Ti, compute capability 8.6, VMM: yes
whisper_model_load:    CUDA0 total size =  1533.14 MB
whisper_model_load: model size    = 1533.14 MB
whisper_init_state: kv self size  =  150.99 MB
whisper_init_state: kv cross size =  150.99 MB
whisper_init_state: kv pad  size  =    6.29 MB
whisper_init_state: compute buffer (conv)   =   28.68 MB
whisper_init_state: compute buffer (encode) =  594.22 MB
whisper_init_state: compute buffer (cross)  =    7.85 MB
whisper_init_state: compute buffer (decode) =  142.09 MB

system_info: n_threads = 4 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 1 | COREML = 0 | OPENVINO = 0

main: processing '..\samples_single.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ...

[00:00:00.000 --> 00:00:11.000]   And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.

whisper_print_timings:     load time =  1469.72 ms
whisper_print_timings:     fallbacks =   0 p /   0 h
whisper_print_timings:      mel time =    12.90 ms
whisper_print_timings:   sample time =    65.29 ms /   143 runs (    0.46 ms per run)
whisper_print_timings:   encode time =  1544.33 ms /     1 runs ( 1544.33 ms per run)
whisper_print_timings:   decode time =     0.00 ms /     1 runs (    0.00 ms per run)
whisper_print_timings:   batchd time =   751.54 ms /   141 runs (    5.33 ms per run)
whisper_print_timings:   prompt time =     0.00 ms /     1 runs (    0.00 ms per run)
whisper_print_timings:    total time =  3863.29 ms
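As an aside, the printed timings make it easy to compute how far from real time each build ran (176000 samples at 16 kHz is 11.0 s of audio). A throwaway calculation on the two totals above:

```python
SAMPLES, SAMPLE_RATE = 176_000, 16_000   # from the "main: processing" line
audio_seconds = SAMPLES / SAMPLE_RATE    # 11.0 s of audio

def realtime_factor(total_ms: float) -> float:
    """Seconds of audio processed per wall-clock second."""
    return audio_seconds / (total_ms / 1000.0)

print(round(realtime_factor(9356.77), 2))  # 12.2.0 build -> 1.18
print(round(realtime_factor(3863.29), 2))  # 11.8.0 build -> 2.85
```

Most of the gap between the two totals is the model load time (6719 ms vs 1470 ms), so the raw transcription speed of the two builds is closer than the totals suggest.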

And Vibe logs using the same WAV file (crash to desktop, same as with MP3s):

[2024-05-31T13:07:25Z DEBUG vibe_desktop] Vibe App Running
[2024-05-31T13:07:25Z DEBUG vibe_desktop::setup] webview version: 119.0.2151.58
[2024-05-31T13:07:25Z DEBUG vibe_desktop::setup] CPU Features
    {"avx":{"enabled":false,"support":true},"avx2":{"enabled":false,"support":true},"f16c":{"enabled":false,"support":true},"fma":{"enabled":false,"support":true}}
[2024-05-31T13:07:25Z DEBUG vibe_desktop::setup] COMMIT_HASH: 4a44b1c79f287a6903c07128b976349839002d01
[2024-05-31T13:07:31Z DEBUG vibe::model] Transcribe called with {
      "path": "F:\\Downloads\\AI\\samples_single.wav",
      "model_path": "C:\\Users\\Cedric\\AppData\\Local\\github.com.thewh1teagle.vibe\\ggml-medium.bin",
      "lang": "en",
      "verbose": false,
      "n_threads": 4,
      "init_prompt": "",
      "temperature": 0.4,
      "translate": true
    }
[2024-05-31T13:07:31Z DEBUG vibe::audio] input is F:\Downloads\AI\samples_single.wav and output is C:\Users\Cedric\AppData\Local\Temp\.tmpxEIQYG.wav
[2024-05-31T13:07:31Z DEBUG vibe::audio::encoder] decoder channel layout is 0
[2024-05-31T13:07:31Z DEBUG vibe::audio::encoder] +-----------+
    |    in     |default--[16000Hz s16:mono]--Parsed_anull_0:default
    | (abuffer) |
    +-----------+

                                                       +---------------+
    Parsed_anull_0:default--[16000Hz s16:mono]--default|      out      |
                                                       | (abuffersink) |
                                                       +---------------+

                                           +----------------+
    in:default--[16000Hz s16:mono]--default| Parsed_anull_0 |default--[16000Hz s16:mono]--out:default
                                           |    (anull)     |
                                           +----------------+

[2024-05-31T13:07:31Z DEBUG vibe::audio] wav reader read from "C:\\Users\\Cedric\\AppData\\Local\\Temp\\.tmpxEIQYG.wav"
[2024-05-31T13:07:31Z DEBUG vibe::audio] parsing C:\Users\Cedric\AppData\Local\Temp\.tmpxEIQYG.wav
[2024-05-31T13:07:31Z DEBUG vibe::model] open model...
[2024-05-31T13:07:31Z INFO  whisper_rs::whisper_sys_log] whisper_init_from_file_with_params_no_state: loading model from 'C:\Users\Cedric\AppData\Local\github.com.thewh1teagle.vibe\ggml-medium.bin'
[2024-05-31T13:07:31Z INFO  whisper_rs::whisper_sys_log] whisper_init_with_params_no_state: use gpu    = 1
[2024-05-31T13:07:31Z INFO  whisper_rs::whisper_sys_log] whisper_init_with_params_no_state: flash attn = 0
[2024-05-31T13:07:31Z INFO  whisper_rs::whisper_sys_log] whisper_init_with_params_no_state: gpu_device = 0
[2024-05-31T13:07:31Z INFO  whisper_rs::whisper_sys_log] whisper_init_with_params_no_state: dtw        = 0
[2024-05-31T13:07:31Z INFO  whisper_rs::whisper_sys_log] whisper_model_load: loading model
[2024-05-31T13:07:31Z INFO  whisper_rs::whisper_sys_log] whisper_model_load: n_vocab       = 51865
[2024-05-31T13:07:31Z INFO  whisper_rs::whisper_sys_log] whisper_model_load: n_audio_ctx   = 1500
[2024-05-31T13:07:31Z INFO  whisper_rs::whisper_sys_log] whisper_model_load: n_audio_state = 1024
[2024-05-31T13:07:31Z INFO  whisper_rs::whisper_sys_log] whisper_model_load: n_audio_head  = 16
[2024-05-31T13:07:31Z INFO  whisper_rs::whisper_sys_log] whisper_model_load: n_audio_layer = 24
[2024-05-31T13:07:31Z INFO  whisper_rs::whisper_sys_log] whisper_model_load: n_text_ctx    = 448
[2024-05-31T13:07:31Z INFO  whisper_rs::whisper_sys_log] whisper_model_load: n_text_state  = 1024
[2024-05-31T13:07:31Z INFO  whisper_rs::whisper_sys_log] whisper_model_load: n_text_head   = 16
[2024-05-31T13:07:31Z INFO  whisper_rs::whisper_sys_log] whisper_model_load: n_text_layer  = 24
[2024-05-31T13:07:31Z INFO  whisper_rs::whisper_sys_log] whisper_model_load: n_mels        = 80
[2024-05-31T13:07:31Z INFO  whisper_rs::whisper_sys_log] whisper_model_load: ftype         = 1
[2024-05-31T13:07:31Z INFO  whisper_rs::whisper_sys_log] whisper_model_load: qntvr         = 0
[2024-05-31T13:07:31Z INFO  whisper_rs::whisper_sys_log] whisper_model_load: type          = 4 (medium)
[2024-05-31T13:07:31Z INFO  whisper_rs::whisper_sys_log] whisper_model_load: adding 1608 extra tokens
[2024-05-31T13:07:31Z INFO  whisper_rs::whisper_sys_log] whisper_model_load: n_langs       = 99
[2024-05-31T13:07:31Z INFO  whisper_rs::whisper_sys_log] whisper_backend_init: using CUDA backend
[2024-05-31T13:07:31Z INFO  whisper_rs::whisper_sys_log] whisper_model_load:    CUDA0 total size =  1533.14 MB
[2024-05-31T13:07:32Z INFO  whisper_rs::whisper_sys_log] whisper_model_load: model size    = 1533.14 MB
[2024-05-31T13:07:32Z INFO  whisper_rs::whisper_sys_log] whisper_backend_init: using CUDA backend
[2024-05-31T13:07:32Z INFO  whisper_rs::whisper_sys_log] whisper_init_state: kv self size  =  150.99 MB
[2024-05-31T13:07:32Z INFO  whisper_rs::whisper_sys_log] whisper_init_state: kv cross size =  150.99 MB
[2024-05-31T13:07:32Z INFO  whisper_rs::whisper_sys_log] whisper_init_state: kv pad  size  =    6.29 MB
[2024-05-31T13:07:32Z INFO  whisper_rs::whisper_sys_log] whisper_init_state: compute buffer (conv)   =   28.68 MB
[2024-05-31T13:07:32Z INFO  whisper_rs::whisper_sys_log] whisper_init_state: compute buffer (encode) =  594.22 MB
[2024-05-31T13:07:32Z INFO  whisper_rs::whisper_sys_log] whisper_init_state: compute buffer (cross)  =    7.85 MB
[2024-05-31T13:07:32Z INFO  whisper_rs::whisper_sys_log] whisper_init_state: compute buffer (decode) =  142.09 MB
[2024-05-31T13:07:32Z DEBUG vibe::model] set language to Some("en")
[2024-05-31T13:07:32Z DEBUG vibe::model] setting temperature to 0.4
[2024-05-31T13:07:32Z DEBUG vibe::model] setting init prompt to
[2024-05-31T13:07:32Z DEBUG vibe::model] setting n threads to 4
[2024-05-31T13:07:32Z DEBUG vibe::model] set start time...
[2024-05-31T13:07:32Z DEBUG vibe::model] setting state full...
[2024-05-31T13:07:32Z DEBUG vibe::model] progress callback 0
[2024-05-31T13:07:32Z DEBUG vibe_desktop::cmd] set_progress_bar 0
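One detail that stands out when comparing this log with the earlier "Relevant log output": here every CPU feature reports enabled: false even though support is true. Whether that is a symptom or just how the GPU build configures whisper.cpp is unclear, but it is easy to check mechanically; a small sketch with the JSON copied from the CPU Features line above:

```python
import json

# CPU-features line from the crashing run's debug log, copied verbatim:
cpu_features = json.loads(
    '{"avx":{"enabled":false,"support":true},'
    '"avx2":{"enabled":false,"support":true},'
    '"f16c":{"enabled":false,"support":true},'
    '"fma":{"enabled":false,"support":true}}'
)

# Features the CPU supports but the build did not enable:
supported_but_disabled = [
    name for name, flags in cpu_features.items()
    if flags["support"] and not flags["enabled"]
]
print(supported_but_disabled)  # ['avx', 'avx2', 'f16c', 'fma']
```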

[Edit] Fixed code formatting. [Edit 2] Just realized I did not thank you for your answer, so here it is: Merci !!!

thewh1teagle commented 4 months ago

@clem0338

Looks good. If it works with the original whisper.cpp, the issue is likely within Vibe. It appears your current Vibe version is slightly outdated, even though it shows as the latest. Try installing vibe_2.0.0_x64-setup_nvidia.exe. Additionally, before starting the transcription in Vibe, open the advanced options (in the main window they might be collapsed), set the temperature to 0, and disable translation.

clem0338 commented 4 months ago

> @clem0338
>
> Looks good. If it works with the original whisper.cpp, the issue is likely within Vibe. It appears your current Vibe version is slightly outdated, even though it shows as the latest. Try installing vibe_2.0.0_x64-setup_nvidia.exe. Additionally, before starting the transcription in Vibe, open the advanced options (in the main window they might be collapsed), set the temperature to 0, and disable translation.

First of all, I'm already using the latest version, installed from the exact same link you sent me.

I also tested with temperature = 0 and it crashes the same way. One thing to mention: if I take the WAV file created by the app (in the %temp% folder) and run it through the whisper CLI, it works like a charm.

thewh1teagle commented 4 months ago

> First of all, I'm already using the latest version, installed from the exact same link you sent me.

I've updated the NVIDIA setup file at that link.

Maybe it's something related to the desktop app; meanwhile, I added the option to use Vibe as a CLI so we can tell whether the issue is UI-related.

You can download it from vibe_2.0.0_x64-setup_nvidia.exe and then execute:

cd %userprofile%\desktop
mkdir test
cd test
curl -L "https://github.com/thewh1teagle/vibe/raw/main/samples/short.wav" -o short.wav
set RUST_LOG=vibe=debug,whisper_rs=debug
%localappdata%\vibe\vibe.exe --model ggml-medium.bin --file short.wav

This will navigate to the Desktop folder, download an example WAV file, and try to transcribe it.

thewh1teagle commented 4 months ago

In addition to adding the CLI, in the latest release I provide two NVIDIA versions: one for CUDA 11 and one for CUDA 12. Maybe you need the older one.

vibe/releases/latest

https://github.com/thewh1teagle/vibe/issues/87#issuecomment-2147205324

clem0338 commented 4 months ago

Sorry about the delay, I was away for a few days.

I confirm the latest version (vibe_2.0.1_x64-setup_nvidia_v12.2.exe) works on both computers. I did not even have to try the CLI mode.

Thanks for this nice tool.

thewh1teagle commented 4 months ago

Great to hear that it works fine with the latest release! It should also work with vibe_2.0.1_x64-setup_nvidia_v12.5.exe, so I'll close this issue; but if you get a chance to test that one too, it would be great! Thanks :)