When transcribing with NVIDIA 12.5 on Windows 11, the NVIDIA GPU is used for only a few seconds at 1-2% utilization; after that, only the CPU / Intel GPU is used.
Laptop model: C340-15IM
Microsoft Windows [Version 10.0.22631.3672]
(c) Microsoft Corporation. All rights reserved.
C:\Users\משפחה\Desktop\test>set RUST_LOG=vibe=debug,whisper_rs=debug
C:\Users\משפחה\Desktop\test>%localappdata%\vibe\vibe.exe --model ggml-medium.bin --file short.wav --language english
C:\Users\משפחה\Desktop\test>[2024-06-05T05:29:50Z DEBUG vibe_desktop] Vibe App Running
[2024-06-05T05:29:50Z DEBUG vibe_desktop::setup] webview version: 125.0.2535.79
[2024-06-05T05:29:50Z DEBUG vibe_desktop::setup] CPU Features
{"avx":{"enabled":true,"support":true},"avx2":{"enabled":true,"support":true},"f16c":{"enabled":true,"support":true},"fma":{"enabled":true,"support":true}}
[2024-06-05T05:29:50Z DEBUG vibe_desktop::setup] COMMIT_HASH: bbbb0d102f86e71b7e4304a67798024a712ea63e
Transcribe... 🔄
[2024-06-05T05:29:50Z DEBUG vibe::model] Transcribe called with {
"path": "short.wav",
"model_path": "C:\\Users\\משפחה\\AppData\\Local\\github.com.thewh1teagle.vibe\\ggml-medium.bin",
"lang": "en",
"verbose": false,
"n_threads": 4,
"init_prompt": null,
"temperature": 0.4,
"translate": null
}
[2024-06-05T05:29:50Z DEBUG vibe::audio] input is short.wav and output is C:\Users\F3F6~1\AppData\Local\Temp\.tmpTwC2sv.wav
[2024-06-05T05:29:50Z DEBUG vibe::audio::encoder] decoder channel layout is 0
[2024-06-05T05:29:50Z DEBUG vibe::audio] wav reader read from "C:\\Users\\F3F6~1\\AppData\\Local\\Temp\\.tmpTwC2sv.wav"
[2024-06-05T05:29:50Z DEBUG vibe::audio] parsing C:\Users\F3F6~1\AppData\Local\Temp\.tmpTwC2sv.wav
[2024-06-05T05:29:50Z DEBUG vibe::model] open model...
[2024-06-05T05:29:50Z INFO whisper_rs::whisper_sys_log] whisper_init_from_file_with_params_no_state: loading model from 'C:\Users\משפחה\AppData\Local\github.com.thewh1teagle.vibe\ggml-medium.bin'
[2024-06-05T05:29:50Z INFO whisper_rs::whisper_sys_log] whisper_init_with_params_no_state: use gpu = 1
[2024-06-05T05:29:50Z INFO whisper_rs::whisper_sys_log] whisper_init_with_params_no_state: flash attn = 0
[2024-06-05T05:29:50Z INFO whisper_rs::whisper_sys_log] whisper_init_with_params_no_state: gpu_device = 0
[2024-06-05T05:29:50Z INFO whisper_rs::whisper_sys_log] whisper_init_with_params_no_state: dtw = 0
[2024-06-05T05:29:50Z INFO whisper_rs::whisper_sys_log] whisper_model_load: loading model
[2024-06-05T05:29:50Z INFO whisper_rs::whisper_sys_log] whisper_model_load: n_vocab = 51865
[2024-06-05T05:29:50Z INFO whisper_rs::whisper_sys_log] whisper_model_load: n_audio_ctx = 1500
[2024-06-05T05:29:50Z INFO whisper_rs::whisper_sys_log] whisper_model_load: n_audio_state = 1024
[2024-06-05T05:29:50Z INFO whisper_rs::whisper_sys_log] whisper_model_load: n_audio_head = 16
[2024-06-05T05:29:50Z INFO whisper_rs::whisper_sys_log] whisper_model_load: n_audio_layer = 24
[2024-06-05T05:29:50Z INFO whisper_rs::whisper_sys_log] whisper_model_load: n_text_ctx = 448
[2024-06-05T05:29:50Z INFO whisper_rs::whisper_sys_log] whisper_model_load: n_text_state = 1024
[2024-06-05T05:29:50Z INFO whisper_rs::whisper_sys_log] whisper_model_load: n_text_head = 16
[2024-06-05T05:29:50Z INFO whisper_rs::whisper_sys_log] whisper_model_load: n_text_layer = 24
[2024-06-05T05:29:50Z INFO whisper_rs::whisper_sys_log] whisper_model_load: n_mels = 80
[2024-06-05T05:29:50Z INFO whisper_rs::whisper_sys_log] whisper_model_load: ftype = 1
[2024-06-05T05:29:50Z INFO whisper_rs::whisper_sys_log] whisper_model_load: qntvr = 0
[2024-06-05T05:29:50Z INFO whisper_rs::whisper_sys_log] whisper_model_load: type = 4 (medium)
[2024-06-05T05:29:51Z INFO whisper_rs::whisper_sys_log] whisper_model_load: adding 1608 extra tokens
[2024-06-05T05:29:51Z INFO whisper_rs::whisper_sys_log] whisper_model_load: n_langs = 99
[2024-06-05T05:29:51Z INFO whisper_rs::whisper_sys_log] whisper_backend_init: using CUDA backend
[2024-06-05T05:29:51Z INFO whisper_rs::whisper_sys_log] whisper_model_load: CUDA0 total size = 1533.14 MB
[2024-06-05T05:29:57Z INFO whisper_rs::whisper_sys_log] whisper_model_load: model size = 1533.14 MB
[2024-06-05T05:29:57Z INFO whisper_rs::whisper_sys_log] whisper_backend_init: using CUDA backend
[2024-06-05T05:29:57Z INFO whisper_rs::whisper_sys_log] whisper_init_state: kv self size = 150.99 MB
[2024-06-05T05:29:57Z INFO whisper_rs::whisper_sys_log] whisper_init_state: kv cross size = 150.99 MB
[2024-06-05T05:29:57Z INFO whisper_rs::whisper_sys_log] whisper_init_state: kv pad size = 6.29 MB
[2024-06-05T05:29:57Z INFO whisper_rs::whisper_sys_log] whisper_init_state: compute buffer (conv) = 28.68 MB
[2024-06-05T05:29:57Z INFO whisper_rs::whisper_sys_log] whisper_init_state: compute buffer (encode) = 594.22 MB
[2024-06-05T05:29:57Z INFO whisper_rs::whisper_sys_log] whisper_init_state: compute buffer (cross) = 7.85 MB
[2024-06-05T05:29:57Z INFO whisper_rs::whisper_sys_log] whisper_init_state: compute buffer (decode) = 142.09 MB
[2024-06-05T05:29:57Z DEBUG vibe::model] set language to Some("en")
[2024-06-05T05:29:58Z DEBUG vibe::model] setting temperature to 0.4
[2024-06-05T05:29:58Z DEBUG vibe::model] setting n threads to 4
[2024-06-05T05:29:58Z DEBUG vibe::model] set start time...
[2024-06-05T05:29:58Z DEBUG vibe::model] setting state full...
[2024-06-05T05:30:15Z DEBUG vibe::model] getting segments count...
[2024-06-05T05:30:15Z DEBUG vibe::model] found 1 segments
[2024-06-05T05:30:15Z DEBUG vibe::model] looping segments...
1
0:00:00,000 --> 0:00:01,600
Experience proves this.
Transcription completed in 25.1s ⏱️
Done ✅
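One way to double-check whether the transcription is actually running on the NVIDIA GPU is to sample utilization from a second terminal while vibe.exe is working. This is a minimal sketch using nvidia-smi (shipped with the NVIDIA driver); the query fields are standard nvidia-smi options, not anything vibe-specific:

```shell
# Sample NVIDIA GPU utilization and memory use once, while vibe.exe is transcribing.
# Run repeatedly (or with `-l 1` for a 1-second loop) in a second terminal.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv,noheader
else
    echo "nvidia-smi not found on PATH"
fi
```

If memory.used stays around the ~1.5 GB the model load reports but utilization.gpu stays near 0% during the long "setting state full" phase, that would match the behavior described above.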
Related: https://github.com/thewh1teagle/vibe/issues/87