Closed: clem0338 closed this issue 4 months ago
Thanks for reporting this issue!
It's challenging to pinpoint the specific cause from the event log error. However, we can try using the original whisper.cpp and compare the results. The issue might be related to a version mismatch with the NVIDIA libraries.
Try the following: press Ctrl + L in Explorer, type `cmd`, press Enter, and then run:

```shell
main.exe -m "%localappdata%\github.com.thewh1teagle.vibe\ggml-medium.bin" -f "samples_single.wav"
```
In addition, to get more insight from Vibe itself, you can run it from cmd.exe with:

```shell
set RUST_BACKTRACE=1
set RUST_LOG=vibe=debug,whisper_rs=debug
%localappdata%\vibe\vibe.exe
```
Result from the 12.2.0 version (it looks like it works without crashing):
F:\Downloads\AI\whisper-cublas-12.2.0-bin-x64>main.exe -m "%localappdata%\github.com.thewh1teagle.vibe\ggml-medium.bin" -f ..\samples_single.wav
whisper_init_from_file_with_params_no_state: loading model from 'C:\Users\Cedric\AppData\Local\github.com.thewh1teagle.vibe\ggml-medium.bin'
whisper_init_with_params_no_state: use gpu = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw = 0
whisper_model_load: loading model
whisper_model_load: n_vocab = 51865
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 1024
whisper_model_load: n_audio_head = 16
whisper_model_load: n_audio_layer = 24
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 1024
whisper_model_load: n_text_head = 16
whisper_model_load: n_text_layer = 24
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 4 (medium)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs = 99
whisper_backend_init: using CUDA backend
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: CUDA_USE_TENSOR_CORES: yes
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3070 Ti, compute capability 8.6, VMM: yes
whisper_model_load: CUDA0 total size = 1533.14 MB
whisper_model_load: model size = 1533.14 MB
whisper_init_state: kv self size = 150.99 MB
whisper_init_state: kv cross size = 150.99 MB
whisper_init_state: kv pad size = 6.29 MB
whisper_init_state: compute buffer (conv) = 28.68 MB
whisper_init_state: compute buffer (encode) = 594.22 MB
whisper_init_state: compute buffer (cross) = 7.85 MB
whisper_init_state: compute buffer (decode) = 142.09 MB
system_info: n_threads = 4 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 1 | COREML = 0 | OPENVINO = 0
main: processing 'F:\Downloads\Electronic\CAN\samples_single.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ...
[00:00:00.000 --> 00:00:11.000] And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.
whisper_print_timings: load time = 6719.37 ms
whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: mel time = 11.65 ms
whisper_print_timings: sample time = 79.58 ms / 143 runs ( 0.56 ms per run)
whisper_print_timings: encode time = 1916.62 ms / 1 runs ( 1916.62 ms per run)
whisper_print_timings: decode time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: batchd time = 594.36 ms / 141 runs ( 4.22 ms per run)
whisper_print_timings: prompt time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: total time = 9356.77 ms
Same for the 11.8.0 version:
F:\Downloads\AI\whisper-cublas-11.8.0-bin-x64>main.exe -m "%localappdata%\github.com.thewh1teagle.vibe\ggml-medium.bin" -f ..\samples_single.wav
whisper_init_from_file_with_params_no_state: loading model from 'C:\Users\Cedric\AppData\Local\github.com.thewh1teagle.vibe\ggml-medium.bin'
whisper_init_with_params_no_state: use gpu = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw = 0
whisper_model_load: loading model
whisper_model_load: n_vocab = 51865
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 1024
whisper_model_load: n_audio_head = 16
whisper_model_load: n_audio_layer = 24
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 1024
whisper_model_load: n_text_head = 16
whisper_model_load: n_text_layer = 24
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 4 (medium)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs = 99
whisper_backend_init: using CUDA backend
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: CUDA_USE_TENSOR_CORES: yes
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3070 Ti, compute capability 8.6, VMM: yes
whisper_model_load: CUDA0 total size = 1533.14 MB
whisper_model_load: model size = 1533.14 MB
whisper_init_state: kv self size = 150.99 MB
whisper_init_state: kv cross size = 150.99 MB
whisper_init_state: kv pad size = 6.29 MB
whisper_init_state: compute buffer (conv) = 28.68 MB
whisper_init_state: compute buffer (encode) = 594.22 MB
whisper_init_state: compute buffer (cross) = 7.85 MB
whisper_init_state: compute buffer (decode) = 142.09 MB
system_info: n_threads = 4 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 1 | COREML = 0 | OPENVINO = 0
main: processing '..\samples_single.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ...
[00:00:00.000 --> 00:00:11.000] And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.
whisper_print_timings: load time = 1469.72 ms
whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: mel time = 12.90 ms
whisper_print_timings: sample time = 65.29 ms / 143 runs ( 0.46 ms per run)
whisper_print_timings: encode time = 1544.33 ms / 1 runs ( 1544.33 ms per run)
whisper_print_timings: decode time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: batchd time = 751.54 ms / 141 runs ( 5.33 ms per run)
whisper_print_timings: prompt time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: total time = 3863.29 ms
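The two CLI runs succeed and differ mainly in load time, so a quick way to compare them side by side is to parse the `whisper_print_timings` lines. A throwaway sketch (not part of Vibe or whisper.cpp, just a comparison helper; the sample strings are copied from the two runs above):

```python
import re

def parse_timings(log_text):
    """Extract stage -> milliseconds from whisper_print_timings lines."""
    timings = {}
    for line in log_text.splitlines():
        m = re.match(r"whisper_print_timings:\s+(\w+) time\s*=\s*([\d.]+) ms", line)
        if m:
            timings[m.group(1)] = float(m.group(2))
    return timings

# values copied from the 12.2.0 and 11.8.0 runs above
run_12_2 = parse_timings(
    "whisper_print_timings: load time = 6719.37 ms\n"
    "whisper_print_timings: total time = 9356.77 ms"
)
run_11_8 = parse_timings(
    "whisper_print_timings: load time = 1469.72 ms\n"
    "whisper_print_timings: total time = 3863.29 ms"
)
# per-stage difference (12.2 minus 11.8), in ms
delta = {k: run_12_2[k] - run_11_8.get(k, 0.0) for k in run_12_2}
```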
And the Vibe logs using the same WAV file (crash to desktop, same as with MP3s):
[2024-05-31T13:07:25Z DEBUG vibe_desktop] Vibe App Running
[2024-05-31T13:07:25Z DEBUG vibe_desktop::setup] webview version: 119.0.2151.58
[2024-05-31T13:07:25Z DEBUG vibe_desktop::setup] CPU Features
{"avx":{"enabled":false,"support":true},"avx2":{"enabled":false,"support":true},"f16c":{"enabled":false,"support":true},"fma":{"enabled":false,"support":true}}
[2024-05-31T13:07:25Z DEBUG vibe_desktop::setup] COMMIT_HASH: 4a44b1c79f287a6903c07128b976349839002d01
[2024-05-31T13:07:31Z DEBUG vibe::model] Transcribe called with {
"path": "F:\\Downloads\\AI\\samples_single.wav",
"model_path": "C:\\Users\\Cedric\\AppData\\Local\\github.com.thewh1teagle.vibe\\ggml-medium.bin",
"lang": "en",
"verbose": false,
"n_threads": 4,
"init_prompt": "",
"temperature": 0.4,
"translate": true
}
[2024-05-31T13:07:31Z DEBUG vibe::audio] input is F:\Downloads\AI\samples_single.wav and output is C:\Users\Cedric\AppData\Local\Temp\.tmpxEIQYG.wav
[2024-05-31T13:07:31Z DEBUG vibe::audio::encoder] decoder channel layout is 0
[2024-05-31T13:07:31Z DEBUG vibe::audio::encoder] +-----------+
| in |default--[16000Hz s16:mono]--Parsed_anull_0:default
| (abuffer) |
+-----------+
+---------------+
Parsed_anull_0:default--[16000Hz s16:mono]--default| out |
| (abuffersink) |
+---------------+
+----------------+
in:default--[16000Hz s16:mono]--default| Parsed_anull_0 |default--[16000Hz s16:mono]--out:default
| (anull) |
+----------------+
[2024-05-31T13:07:31Z DEBUG vibe::audio] wav reader read from "C:\\Users\\Cedric\\AppData\\Local\\Temp\\.tmpxEIQYG.wav"
[2024-05-31T13:07:31Z DEBUG vibe::audio] parsing C:\Users\Cedric\AppData\Local\Temp\.tmpxEIQYG.wav
[2024-05-31T13:07:31Z DEBUG vibe::model] open model...
[2024-05-31T13:07:31Z INFO whisper_rs::whisper_sys_log] whisper_init_from_file_with_params_no_state: loading model from 'C:\Users\Cedric\AppData\Local\github.com.thewh1teagle.vibe\ggml-medium.bin'
[2024-05-31T13:07:31Z INFO whisper_rs::whisper_sys_log] whisper_init_with_params_no_state: use gpu = 1
[2024-05-31T13:07:31Z INFO whisper_rs::whisper_sys_log] whisper_init_with_params_no_state: flash attn = 0
[2024-05-31T13:07:31Z INFO whisper_rs::whisper_sys_log] whisper_init_with_params_no_state: gpu_device = 0
[2024-05-31T13:07:31Z INFO whisper_rs::whisper_sys_log] whisper_init_with_params_no_state: dtw = 0
[2024-05-31T13:07:31Z INFO whisper_rs::whisper_sys_log] whisper_model_load: loading model
[2024-05-31T13:07:31Z INFO whisper_rs::whisper_sys_log] whisper_model_load: n_vocab = 51865
[2024-05-31T13:07:31Z INFO whisper_rs::whisper_sys_log] whisper_model_load: n_audio_ctx = 1500
[2024-05-31T13:07:31Z INFO whisper_rs::whisper_sys_log] whisper_model_load: n_audio_state = 1024
[2024-05-31T13:07:31Z INFO whisper_rs::whisper_sys_log] whisper_model_load: n_audio_head = 16
[2024-05-31T13:07:31Z INFO whisper_rs::whisper_sys_log] whisper_model_load: n_audio_layer = 24
[2024-05-31T13:07:31Z INFO whisper_rs::whisper_sys_log] whisper_model_load: n_text_ctx = 448
[2024-05-31T13:07:31Z INFO whisper_rs::whisper_sys_log] whisper_model_load: n_text_state = 1024
[2024-05-31T13:07:31Z INFO whisper_rs::whisper_sys_log] whisper_model_load: n_text_head = 16
[2024-05-31T13:07:31Z INFO whisper_rs::whisper_sys_log] whisper_model_load: n_text_layer = 24
[2024-05-31T13:07:31Z INFO whisper_rs::whisper_sys_log] whisper_model_load: n_mels = 80
[2024-05-31T13:07:31Z INFO whisper_rs::whisper_sys_log] whisper_model_load: ftype = 1
[2024-05-31T13:07:31Z INFO whisper_rs::whisper_sys_log] whisper_model_load: qntvr = 0
[2024-05-31T13:07:31Z INFO whisper_rs::whisper_sys_log] whisper_model_load: type = 4 (medium)
[2024-05-31T13:07:31Z INFO whisper_rs::whisper_sys_log] whisper_model_load: adding 1608 extra tokens
[2024-05-31T13:07:31Z INFO whisper_rs::whisper_sys_log] whisper_model_load: n_langs = 99
[2024-05-31T13:07:31Z INFO whisper_rs::whisper_sys_log] whisper_backend_init: using CUDA backend
[2024-05-31T13:07:31Z INFO whisper_rs::whisper_sys_log] whisper_model_load: CUDA0 total size = 1533.14 MB
[2024-05-31T13:07:32Z INFO whisper_rs::whisper_sys_log] whisper_model_load: model size = 1533.14 MB
[2024-05-31T13:07:32Z INFO whisper_rs::whisper_sys_log] whisper_backend_init: using CUDA backend
[2024-05-31T13:07:32Z INFO whisper_rs::whisper_sys_log] whisper_init_state: kv self size = 150.99 MB
[2024-05-31T13:07:32Z INFO whisper_rs::whisper_sys_log] whisper_init_state: kv cross size = 150.99 MB
[2024-05-31T13:07:32Z INFO whisper_rs::whisper_sys_log] whisper_init_state: kv pad size = 6.29 MB
[2024-05-31T13:07:32Z INFO whisper_rs::whisper_sys_log] whisper_init_state: compute buffer (conv) = 28.68 MB
[2024-05-31T13:07:32Z INFO whisper_rs::whisper_sys_log] whisper_init_state: compute buffer (encode) = 594.22 MB
[2024-05-31T13:07:32Z INFO whisper_rs::whisper_sys_log] whisper_init_state: compute buffer (cross) = 7.85 MB
[2024-05-31T13:07:32Z INFO whisper_rs::whisper_sys_log] whisper_init_state: compute buffer (decode) = 142.09 MB
[2024-05-31T13:07:32Z DEBUG vibe::model] set language to Some("en")
[2024-05-31T13:07:32Z DEBUG vibe::model] setting temperature to 0.4
[2024-05-31T13:07:32Z DEBUG vibe::model] setting init prompt to
[2024-05-31T13:07:32Z DEBUG vibe::model] setting n threads to 4
[2024-05-31T13:07:32Z DEBUG vibe::model] set start time...
[2024-05-31T13:07:32Z DEBUG vibe::model] setting state full...
[2024-05-31T13:07:32Z DEBUG vibe::model] progress callback 0
[2024-05-31T13:07:32Z DEBUG vibe_desktop::cmd] set_progress_bar 0
[Edit] Fixed the code formatting. [Edit 2] Just realized I did not thank you for your answer, so here it is: Thanks (Merci)!!!
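An aside on the options: the Vibe debug log shows `translate: true` and `temperature: 0.4`, while the working CLI runs reported `task = transcribe`. A throwaway sketch to make the differing options explicit (the `cli_opts` values are my assumption about the CLI runs, not read from whisper.cpp):

```python
# options copied from the Vibe debug log in this thread
vibe_opts = {"lang": "en", "n_threads": 4, "temperature": 0.4, "translate": True}

# assumed settings of the working whisper.cpp CLI runs: task = transcribe
# (no translation); temperature 0.0 is an assumption, not shown in the CLI log
cli_opts = {"lang": "en", "n_threads": 4, "temperature": 0.0, "translate": False}

# options where the crashing and the working invocations differ
diff = {k: {"vibe": vibe_opts[k], "cli": cli_opts[k]}
        for k in vibe_opts if vibe_opts[k] != cli_opts[k]}
```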
@clem0338
Looks good. If it works with the original whisper.cpp, the issue is likely within Vibe. It appears your current Vibe version is a bit outdated, even though it shows as the latest version. Try installing vibe_2.0.0_x64-setup_nvidia.exe. Additionally, before starting the transcription in Vibe, open the advanced options (in the main window they might be collapsed), set the temperature to 0, and disable translate.
First of all, I'm already using the latest version, installed from the exact same link you sent me.
I also tested with temperature = 0 and it crashes the same. One thing to mention: if I take the WAV file created by the app (in the %temp% folder) and use it with the whisper CLI, it works like a charm.
> First of all, I'm already using the latest version, installed from the exact same link you sent me
I've updated the nvidia setup file in that link.
Maybe it's something related to the desktop app; meanwhile, I added the option to use Vibe as a CLI so we can tell whether it's something related to the UI.
You can download it from vibe_2.0.0_x64-setup_nvidia.exe and then execute:

```shell
cd %userprofile%\desktop
mkdir test
cd test
curl -L "https://github.com/thewh1teagle/vibe/raw/main/samples/short.wav" -o short.wav
set RUST_LOG=vibe=debug,whisper_rs=debug
%localappdata%\vibe\vibe.exe --model ggml-medium.bin --file short.wav
```
This will create a test folder on your desktop, download an example WAV file there, and try to transcribe it.
In addition to adding the CLI, in the latest release I provide two NVIDIA versions, one for CUDA 11 and one for CUDA 12. Maybe you need the older one:
https://github.com/thewh1teagle/vibe/issues/87#issuecomment-2147205324
Sorry about the delay, I was away for a few days.
I confirm the latest version (vibe_2.0.1_x64-setup_nvidia_v12.2.exe) is working on both computers. I did not even have to try the CLI mode.
Thanks for this nice tool.
Great to hear that it works fine with the latest release! It should also work with vibe_2.0.1_x64-setup_nvidia_v12.5.exe, so I'll close this issue, but if you get a chance to test that one too, it would be great! Thanks :)
What happened?
On the NVIDIA version, an MP3 transcription crashes the app to desktop on my workstation (Win11, RTX 3070 with 8 GB, 32 GB RAM) but works flawlessly on my laptop (Win11, GTX 1650 with 4 GB, 16 GB RAM).
Here is the EventViewer crash log:

```
- System
  - Provider [Name] Application Error [Guid] {a0e9b465-b939-57d7-b27d-95d8e925ff57}
  EventID 1000
  Version 0
  Level 2
  Task 100
  Opcode 0
  Keywords 0x8000000000000000
  - TimeCreated [SystemTime] 2024-05-31T08:00:15.3838683Z
  EventRecordID 3506
  Correlation
  - Execution [ProcessID] 3204 [ThreadID] 3352
  Channel Application
  Computer DESKTOP-0D5MR4P
  - Security [UserID] S-1-5-21-3684703487-1378115604-541973499-1001
- EventData
  AppName vibe.exe
  AppVersion 2.0.0.0
  AppTimeStamp 66568d1b
  ModuleName ucrtbase.dll
  ModuleVersion 10.0.26217.5000
  ModuleTimeStamp 53f1888b
  ExceptionCode c0000409
  FaultingOffset 00000000000a514e
  ProcessId 0x48f4
  ProcessCreationTime 0x1dab33021395435
  AppPath F:\Downloads\AI\vibe_2.0.0_x64-setup_nvidia\vibe.exe
  ModulePath C:\WINDOWS\System32\ucrtbase.dll
  IntegratorReportId 585e91ac-adfa-4a77-a141-9b761057ed9a
  PackageFullName
  PackageRelativeAppId
```

Steps to reproduce
What OS are you seeing the problem on?
Windows
Relevant log output
```shell
App Version: 2.0.0
Commit Hash: 4a44b1c79f287a6903c07128b976349839002d01
Arch: x86_64
Platform: windows
Kernel Version: 10.0.26217
OS: windows
OS Version: 10.0.26217
Models: ggml-medium.bin
Default Model: "C:\\Users\\Cedric\\AppData\\Local\\github.com.thewh1teagle.vibe\\ggml-medium.bin"
{
  "avx": { "enabled": true, "support": true },
  "avx2": { "enabled": true, "support": true },
  "f16c": { "enabled": true, "support": true },
  "fma": { "enabled": true, "support": true }
}
```

Edit:
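Worth noting: the CPU-features JSON differs between the two reports in this thread (the issue-template log shows the features enabled, while the earlier vibe debug log shows them supported but not enabled). A quick sketch to flag such mismatches (field names taken verbatim from the logs):

```python
import json

def supported_but_disabled(report_json):
    """Return CPU features the host supports but the build did not enable."""
    feats = json.loads(report_json)
    return sorted(k for k, v in feats.items() if v["support"] and not v["enabled"])

# CPU-features snippet copied from the vibe debug log earlier in the thread
debug_report = (
    '{"avx":{"enabled":false,"support":true},'
    '"avx2":{"enabled":false,"support":true},'
    '"f16c":{"enabled":false,"support":true},'
    '"fma":{"enabled":false,"support":true}}'
)
```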