sandrohanea / whisper.net

Whisper.net. Speech to text made simple using Whisper Models
MIT License

Is there a way to suppress Console logging? #129

Closed: danroot closed this issue 7 months ago

danroot commented 8 months ago

I have a working example of transcribing mic input in real time, but I see a lot of whisper logging in the console that I would rather not see, like this:

    whisper_init_state: kv self size = 5.25 MB
    whisper_init_state: kv cross size = 17.58 MB
    whisper_init_state: loading Core ML model from 'ggml-base-encoder.mlmodelc'
    whisper_init_state: first run on a device may take a while ...
    whisper_init_state: Core ML model loaded

Is there some way to suppress this logging?

sandrohanea commented 8 months ago

Those logs are written by the whisper.cpp library (https://github.com/ggerganov/whisper.cpp), and there is currently no way to suppress them.

I recommend opening an issue there. If a config option is added, it will be added to this library as well.

danroot commented 8 months ago

Opened this issue: https://github.com/ggerganov/whisper.cpp/issues/1448
My P/Invoke-fu is pretty weak, but I'm thinking of something like whisper_set_log_callback(nil). Alternatively, if there is some way to set the callback to a .NET method, we could intercept the whisper.cpp output to parse and debug it, without showing it unless we wanted to.

sandrohanea commented 7 months ago

Logging is now suppressed by default. If you want to show those logs or use them in your app, you can register a log handler with the LogProvider.
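The following sketch shows the general "silent by default, opt in with a handler" behaviour described above, in Python. `LogDispatcher`, `LogLevel` and `add_handler` are hypothetical names. This is not Whisper.net's actual LogProvider API. The sketch also shows one reason to register a handler instead of simply turning logging on: the handler can filter by severity.

```python
from enum import IntEnum
from typing import Callable, List


class LogLevel(IntEnum):
    # Hypothetical severity levels; IntEnum lets them be compared with >=.
    DEBUG = 0
    INFO = 1
    WARNING = 2
    ERROR = 3


LogHandler = Callable[[LogLevel, str], None]


class LogDispatcher:
    """Hypothetical dispatcher that stays silent until a handler is registered."""

    def __init__(self) -> None:
        self._handlers: List[LogHandler] = []

    def add_handler(self, handler: LogHandler) -> None:
        self._handlers.append(handler)

    def log(self, level: LogLevel, message: str) -> None:
        # With no handlers registered, the message is simply dropped.
        for handler in self._handlers:
            handler(level, message)


dispatcher = LogDispatcher()
dispatcher.log(LogLevel.INFO, "whisper_init_state: kv self size = 5.25 MB")  # dropped

kept: List[str] = []


def keep_warnings(level: LogLevel, message: str) -> None:
    # Opt-in handler: ignore informational noise, keep warnings and errors.
    if level >= LogLevel.WARNING:
        kept.append(message)


dispatcher.add_handler(keep_warnings)
dispatcher.log(LogLevel.INFO, "whisper_init_state: Core ML model loaded")
dispatcher.log(LogLevel.ERROR, "example error message")  # made-up message
print(kept)
```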

contractorwolf commented 6 months ago

Is that true? I just downloaded it today and I get a ton of logs, which is what I came here looking to stop:

    MacBook-Pro-3:whisper.cpp jameswolf$ ./main -m models/ggml-base.en.bin -f /Users/jameswolf/Desktop/redteam/audio/wake_word_detected16k.wav
    whisper_init_from_file_with_params_no_state: loading model from 'models/ggml-base.en.bin'
    whisper_model_load: loading model
    whisper_model_load: n_vocab = 51864
    whisper_model_load: n_audio_ctx = 1500
    whisper_model_load: n_audio_state = 512
    whisper_model_load: n_audio_head = 8
    whisper_model_load: n_audio_layer = 6
    whisper_model_load: n_text_ctx = 448
    whisper_model_load: n_text_state = 512
    whisper_model_load: n_text_head = 8
    whisper_model_load: n_text_layer = 6
    whisper_model_load: n_mels = 80
    whisper_model_load: ftype = 1
    whisper_model_load: qntvr = 0
    whisper_model_load: type = 2 (base)
    whisper_model_load: adding 1607 extra tokens
    whisper_model_load: n_langs = 99
    whisper_backend_init: using Metal backend
    ggml_metal_init: allocating
    ggml_metal_init: found device: Apple M1 Pro
    ggml_metal_init: picking default device: Apple M1 Pro
    ggml_metal_init: default.metallib not found, loading from source
    ggml_metal_init: GGML_METAL_PATH_RESOURCES = nil
    ggml_metal_init: loading '/Users/jameswolf/Desktop/redteam/whisper.cpp/ggml-metal.metal'
    ggml_metal_init: GPU name: Apple M1 Pro
    ggml_metal_init: GPU family: MTLGPUFamilyApple7 (1007)
    ggml_metal_init: hasUnifiedMemory = true
    ggml_metal_init: recommendedMaxWorkingSetSize = 22906.50 MB
    ggml_metal_init: maxTransferRate = built-in GPU
    ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size = 140.64 MiB, ( 142.27 / 21845.34)
    whisper_model_load: Metal buffer size = 147.46 MB
    whisper_model_load: model size = 147.37 MB
    whisper_backend_init: using Metal backend
    ggml_metal_init: allocating
    ggml_metal_init: found device: Apple M1 Pro
    ggml_metal_init: picking default device: Apple M1 Pro
    ggml_metal_init: default.metallib not found, loading from source
    ggml_metal_init: GGML_METAL_PATH_RESOURCES = nil
    ggml_metal_init: loading '/Users/jameswolf/Desktop/redteam/whisper.cpp/ggml-metal.metal'
    ggml_metal_init: GPU name: Apple M1 Pro
    ggml_metal_init: GPU family: MTLGPUFamilyApple7 (1007)
    ggml_metal_init: hasUnifiedMemory = true
    ggml_metal_init: recommendedMaxWorkingSetSize = 22906.50 MB
    ggml_metal_init: maxTransferRate = built-in GPU
    ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size = 15.75 MiB, ( 158.02 / 21845.34)
    whisper_init_state: kv self size = 16.52 MB
    ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size = 17.58 MiB, ( 175.59 / 21845.34)
    whisper_init_state: kv cross size = 18.43 MB
    ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size = 0.02 MiB, ( 175.61 / 21845.34)
    whisper_init_state: compute buffer (conv) = 14.86 MB
    ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size = 0.02 MiB, ( 175.62 / 21845.34)
    whisper_init_state: compute buffer (encode) = 85.99 MB
    ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size = 0.02 MiB, ( 175.64 / 21845.34)
    whisper_init_state: compute buffer (cross) = 4.78 MB
    ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size = 0.02 MiB, ( 175.66 / 21845.34)
    whisper_init_state: compute buffer (decode) = 96.48 MB
    ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size = 12.55 MiB, ( 188.19 / 21845.34)
    ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size = 80.39 MiB, ( 268.56 / 21845.34)
    ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size = 2.94 MiB, ( 271.48 / 21845.34)
    ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size = 90.39 MiB, ( 361.86 / 21845.34)

    system_info: n_threads = 4 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | METAL = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | CUDA = 0 | COREML = 0 | OPENVINO = 0 |

    main: processing '/Users/jameswolf/Desktop/redteam/audio/wake_word_detected16k.wav' (191700 samples, 12.0 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ...

    [00:00:00.000 --> 00:00:09.620] Hello Red Team, can you hear me?
    [00:00:09.620 --> 00:00:17.820] [BLANK_AUDIO]

    whisper_print_timings: load time = 131.77 ms
    whisper_print_timings: fallbacks = 0 p / 0 h
    whisper_print_timings: mel time = 8.45 ms
    whisper_print_timings: sample time = 26.64 ms / 88 runs ( 0.30 ms per run)
    whisper_print_timings: encode time = 199.98 ms / 2 runs ( 99.99 ms per run)
    whisper_print_timings: decode time = 22.69 ms / 5 runs ( 4.54 ms per run)
    whisper_print_timings: batchd time = 83.17 ms / 75 runs ( 1.11 ms per run)
    whisper_print_timings: prompt time = 0.00 ms / 1 runs ( 0.00 ms per run)
    whisper_print_timings: total time = 477.88 ms
    ggml_metal_free: deallocating
    ggml_metal_free: deallocating

I would prefer it to output just the transcribed text:

    [00:00:00.000 --> 00:00:09.620] Hello Red Team, can you hear me?
    [00:00:09.620 --> 00:00:17.820] [BLANK_AUDIO]
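If no switch is available, you can filter the combined output and keep only the timestamped segment lines. Below is a minimal Python sketch. It assumes each segment is printed on its own line in the `[HH:MM:SS.mmm --> HH:MM:SS.mmm]` format shown above. `transcript_only` is a hypothetical helper, not part of whisper.cpp.

```python
import re
from typing import Iterable, List

# Matches a segment line such as "[00:00:00.000 --> 00:00:09.620] Hello ..."
SEGMENT_RE = re.compile(r"^\[\d{2}:\d{2}:\d{2}\.\d{3} --> \d{2}:\d{2}:\d{2}\.\d{3}\]")


def transcript_only(lines: Iterable[str]) -> List[str]:
    """Keep only timestamped transcript lines; drop diagnostic log lines."""
    return [line for line in lines if SEGMENT_RE.match(line)]


output = [
    "whisper_model_load: n_vocab = 51864",
    "[00:00:00.000 --> 00:00:09.620] Hello Red Team, can you hear me?",
    "whisper_print_timings: total time = 477.88 ms",
    "[00:00:09.620 --> 00:00:17.820] [BLANK_AUDIO]",
]
for line in transcript_only(output):
    print(line)
```

On the command line there may be a simpler option. In whisper.cpp's `main` example, the diagnostics appear to go to stderr and the transcript to stdout. If that holds for your build, appending `2>/dev/null` to the command hides the logs. Check this against the version you are running.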

It also seems to imagine about 5 extra seconds: the audio is only 12 seconds long, yet the [BLANK_AUDIO] segment runs out to 17.82 seconds.
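You can check the mismatch against the numbers in the log. The sketch assumes 16 kHz audio, which is the input rate Whisper models expect and matches the `16k` in the filename. On that basis, 191700 samples is just under 12 seconds, which agrees with the `12.0 sec` reported on the `main: processing` line. The final segment therefore ends roughly 5.8 seconds past the end of the audio.

```python
SAMPLE_RATE_HZ = 16_000   # assumed: Whisper models take 16 kHz input
num_samples = 191_700     # from the "main: processing" line above
last_segment_end_s = 17.820

duration_s = num_samples / SAMPLE_RATE_HZ
overshoot_s = last_segment_end_s - duration_s

print(f"audio duration: {duration_s:.2f} s")          # audio duration: 11.98 s
print(f"last segment overshoot: {overshoot_s:.2f} s")  # last segment overshoot: 5.84 s
```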