thewh1teagle / vibe

Transcribe on your own!
https://thewh1teagle.github.io/vibe/
MIT License
1.05k stars, 65 forks

v2.5.0 - feedback #241

Closed thewh1teagle closed 1 month ago

thewh1teagle commented 2 months ago

v2.5.0-beta.0 Feedback

Please share your thoughts and experiences with the new Vulkan support! Your feedback will help us optimize performance across different systems.

Release details: v2.5.0-beta.0

Y-PLONI commented 2 months ago

Would it also help if I tested on a simple Intel processor? [On Windows, with a UHD Graphics 620 graphics processor].

thewh1teagle commented 2 months ago

> Would it also help if I tested on a simple Intel processor? [On Windows, with a UHD Graphics 620 graphics processor].

Yes, it will definitely help. I have uploaded a version that is suitable for old processors, if needed.

altunenes commented 2 months ago

Which one is faster in your own setup? CUDA or Vulkan? Maybe we should create a test (https://github.com/thewh1teagle/pyannote) here or somewhere else to make a better comparison.

Vulkan is really good. Maybe CUDA is a bit ahead in terms of speed, but I like this simplicity more. And I wonder how much it affects the quality. I didn't notice any difference in my tests.

altunenes commented 2 months ago

Also, I tried Vulkan with my onboard AMD GPU and it works really nicely. Imagine if we had ONNX Runtime with Vulkan support... :)

thewh1teagle commented 2 months ago

> Which one is faster in your own setup? CUDA or Vulkan?

I didn't compare, but from what I remember they're similar: about 2 minutes for 1 hour of audio.

> Vulkan is really good. Maybe CUDA is a bit ahead in terms of speed, but I like this simplicity more. And I wonder how much it affects the quality. I didn't notice any difference in my tests.

CUDA was a bit faster, but it's quite inconvenient to use. The setup involves a massive amount of configuration on GitHub Actions and results in a 300 MB binary. Moreover, it only works with NVIDIA GPUs.

Now, with the new approach, we have a regular 25 MB executable on Windows and a single lightweight .deb package on Linux, which is automatically installed via apt. This new method supports almost every GPU, which is a significant advantage for running AI models. CoreML for macOS and Vulkan for other platforms offer great potential for a wide range of applications.

> Also, I tried Vulkan with my onboard AMD GPU and it works really nicely. Imagine if we had ONNX Runtime with Vulkan support... :)

That's one reason I prefer to stick with ggml instead of onnxruntime: fewer headaches. By the way, onnxruntime has some mysterious bugs and still seems a bit unpolished.

altunenes commented 2 months ago

Here is the simple test code for the comparison that I wrote some time ago. It gives me poor transcription quality (I don't know why; I probably made some errors in the resampling step or elsewhere, so feel free to comment or give feedback), but it at least gives me an impression of the speed.

Test audio: a 01:46 min English recording of two non-native English speakers (unfortunately I can't share it because it does not belong to me). GPU: RTX 3060 Notebook.

Vulkan

Medium model:

Processing time: 25 seconds

Small model:

Processing time: 12 seconds

CUDA

Medium model:

Processing time: 13 seconds

Small model:

Processing time: 8 seconds

While CUDA is really fast, it's limited to NVIDIA graphics cards, so I think it makes more sense to stick with Vulkan. 🙂

test code:

```rust
use eyre::{bail, Context, OptionExt, Result};
use pyannote_rs::{EmbeddingExtractor, EmbeddingManager};
use rubato::{
    Resampler, SincFixedIn, SincInterpolationParameters, SincInterpolationType, WindowFunction,
};
use std::fs::File;
use std::io::Write;
use std::panic::{catch_unwind, AssertUnwindSafe};
use std::path::Path;
use std::time::Instant;
use whisper_rs::{FullParams, SamplingStrategy, WhisperContext, WhisperContextParameters};

/// Resample mono `f32` audio from `input_sample_rate` to `output_sample_rate`.
fn resample(input: &[f32], input_sample_rate: usize, output_sample_rate: usize) -> Result<Vec<f32>> {
    let params = SincInterpolationParameters {
        sinc_len: 256,
        f_cutoff: 0.95,
        interpolation: SincInterpolationType::Linear,
        oversampling_factor: 160,
        window: WindowFunction::BlackmanHarris2,
    };
    // The whole input is processed as a single chunk of one channel.
    let mut resampler = SincFixedIn::<f32>::new(
        output_sample_rate as f64 / input_sample_rate as f64,
        2.0,
        params,
        input.len(),
        1,
    )?;
    let waves_in: Vec<Vec<f32>> = vec![input.to_vec()];
    let waves_out = resampler.process(&waves_in, None)?;
    Ok(waves_out[0].clone())
}

/// Read a WAV file and return its sample rate and mono `f32` samples.
fn read_audio_file(path: &str) -> Result<(i32, Vec<f32>)> {
    let mut reader = hound::WavReader::open(path)?;
    let spec = reader.spec();
    let sample_rate = spec.sample_rate as i32;
    let channels = spec.channels as usize;

    if channels > 1 {
        println!("Non-mono audio detected. Converting to mono...");
    }

    // Normalize integer samples to floats by dividing by the format's full scale.
    let samples: Vec<f32> = match spec.sample_format {
        hound::SampleFormat::Int => match spec.bits_per_sample {
            16 => reader.samples::<i16>().map(|s| s.unwrap() as f32 / i16::MAX as f32).collect(),
            24 => reader.samples::<i32>().map(|s| s.unwrap() as f32 / 8388608.0).collect(),
            32 => reader.samples::<i32>().map(|s| s.unwrap() as f32 / i32::MAX as f32).collect(),
            _ => bail!("Unsupported bits per sample: {}", spec.bits_per_sample),
        },
        hound::SampleFormat::Float => reader.samples::<f32>().map(|s| s.unwrap()).collect(),
    };

    // Downmix interleaved channels to mono by averaging each frame.
    let mono_samples = if channels > 1 {
        samples
            .chunks(channels)
            .map(|chunk| chunk.iter().sum::<f32>() / channels as f32)
            .collect()
    } else {
        samples
    };

    Ok((sample_rate, mono_samples))
}

/// Load the Whisper model, turning a panic during loading into an error.
fn create_whisper_context(model_path: &str) -> Result<WhisperContext> {
    println!("Opening model...");
    let model_path = Path::new(model_path);
    if !model_path.exists() {
        bail!("Whisper model file doesn't exist");
    }

    let mut ctx_params = WhisperContextParameters::default();
    if std::env::var("CUDA_VERSION").is_ok() || std::env::var("ROCM_VERSION").is_ok() {
        ctx_params.use_gpu = true;
    }
    println!("Use GPU: {:?}", ctx_params.use_gpu);

    let model_path = model_path
        .to_str()
        .ok_or_eyre("Can't convert model path to string")?;
    println!("Creating Whisper context with model path {}", model_path);

    let ctx_result = catch_unwind(AssertUnwindSafe(|| {
        WhisperContext::new_with_params(model_path, ctx_params)
    }));
    match ctx_result {
        Err(error) => {
            bail!("Create Whisper context crashed: {:?}", error)
        }
        Ok(ctx) => {
            println!("Created context successfully");
            Ok(ctx?)
        }
    }
}

/// Decoding parameters shared by both transcription paths.
fn setup_whisper_params() -> FullParams<'static, 'static> {
    let mut params = FullParams::new(SamplingStrategy::default());
    params.set_print_special(false);
    params.set_print_progress(true);
    params.set_print_realtime(false);
    params.set_print_timestamps(true);
    params.set_suppress_blank(true);
    params.set_token_timestamps(true);
    params.set_language(Some("en"));
    params.set_n_threads(4);
    params.set_translate(false);
    params.set_no_context(false);
    params.set_single_segment(false);
    params.set_split_on_word(true);
    params.set_max_tokens(0);
    params.set_temperature(0.0);
    params
}

/// Transcribe `audio_path` into `output_path`, optionally with speaker diarization.
fn transcribe(
    ctx: &WhisperContext,
    audio_path: &str,
    output_path: &str,
    diarize_options: Option<DiarizeOptions>,
) -> Result<()> {
    println!("Transcribe called for {}", audio_path);
    let (sample_rate, original_samples) = read_audio_file(audio_path)?;

    // Whisper expects 16 kHz input.
    let whisper_samples = if sample_rate != 16000 {
        println!("Resampling audio from {} Hz to 16000 Hz", sample_rate);
        resample(&original_samples, sample_rate as usize, 16000)?
    } else {
        original_samples
    };

    let mut state = ctx.create_state().context("Failed to create state")?;
    let params = setup_whisper_params();
    let mut output_file = File::create(output_path)?;
    let start_time = Instant::now();

    if let Some(diarize_options) = diarize_options {
        // pyannote-rs takes 16-bit integer samples.
        let i16_samples: Vec<i16> = whisper_samples
            .iter()
            .map(|&x| (x * i16::MAX as f32) as i16)
            .collect();
        let segments =
            pyannote_rs::segment(&i16_samples, 16000, &diarize_options.segment_model_path)?;
        let mut embedding_extractor =
            EmbeddingExtractor::new(&diarize_options.embedding_model_path)?;
        let mut embedding_manager = EmbeddingManager::new(diarize_options.max_speakers);

        // Merge segments shorter than `min_segment_duration` seconds into the next one.
        let min_segment_duration = 1.0;
        let mut combined_segments = Vec::new();
        let mut current_segment = segments[0].clone();
        for segment in segments.iter().skip(1) {
            if current_segment.end - current_segment.start < min_segment_duration {
                // Combine with the next segment
                current_segment.end = segment.end;
                current_segment.samples.extend_from_slice(&segment.samples);
            } else {
                combined_segments.push(current_segment);
                current_segment = segment.clone();
            }
        }
        combined_segments.push(current_segment);

        for (i, segment) in combined_segments.iter().enumerate() {
            let start_sample = (segment.start * 16000.0) as usize;
            let end_sample = (segment.end * 16000.0) as usize;
            let mut segment_samples = whisper_samples[start_sample..end_sample].to_vec();
            // Pad segments shorter than one second with trailing zeros.
            if segment_samples.len() < 16000 {
                segment_samples.extend(vec![0.0; 16000 - segment_samples.len()]);
            }

            state.full(params.clone(), &segment_samples)?;
            let text = state.full_get_segment_text(0)?;

            let embedding_result: Vec<f32> =
                embedding_extractor.compute(&segment.samples)?.collect();
            let speaker = if embedding_manager.get_all_speakers().len()
                == diarize_options.max_speakers
            {
                embedding_manager
                    .get_best_speaker_match(embedding_result)
                    .map(|r| r.to_string())
                    .unwrap_or_else(|_| "Unknown".to_string())
            } else {
                embedding_manager
                    .search_speaker(embedding_result, diarize_options.threshold)
                    .map(|r| r.to_string())
                    .unwrap_or_else(|| "Unknown".to_string())
            };

            writeln!(
                output_file,
                "{};{};{:.2};{:.2};{:.2}",
                speaker,
                text,
                segment.start,
                segment.end,
                segment.end - segment.start
            )?;
            println!(
                "Segment {}: start = {:.2}, end = {:.2}, speaker = {}, text = {}",
                i + 1,
                segment.start,
                segment.end,
                speaker,
                text
            );
        }
    } else {
        state.full(params, &whisper_samples)?;
        let num_segments = state.full_n_segments()?;
        for s in 0..num_segments {
            let text = state.full_get_segment_text(s)?;
            // Segment timestamps are in units of 10 ms, hence the division by 100.
            let start = state.full_get_segment_t0(s)?;
            let stop = state.full_get_segment_t1(s)?;
            writeln!(
                output_file,
                "{};{:.2};{:.2};{:.2}",
                text,
                start as f64 / 100.0,
                stop as f64 / 100.0,
                (stop - start) as f64 / 100.0
            )?;
            println!(
                "Segment {}: start = {:.2}, end = {:.2}, text = {}",
                s,
                start as f64 / 100.0,
                stop as f64 / 100.0,
                text
            );
        }
    }

    // Whole seconds only; fractions are truncated.
    let processing_time = start_time.elapsed().as_secs();
    println!("Processing time: {} seconds", processing_time);
    println!("Output written to: {}", output_path);
    Ok(())
}

#[derive(Debug)]
struct DiarizeOptions {
    segment_model_path: String,
    embedding_model_path: String,
    threshold: f32,
    max_speakers: usize,
}

fn main() -> Result<()> {
    let audio_path = "samples/nmj.wav";
    let output_path = "transcription_output_vulkan.txt";
    let diarize_options = Some(DiarizeOptions {
        segment_model_path: "segmentation-3.0.onnx".to_string(),
        embedding_model_path: "wespeaker_en_voxceleb_CAM++.onnx".to_string(),
        threshold: 0.5,
        max_speakers: 2,
    });
    let whisper_model_path = "ggml-medium.bin";
    let ctx = create_whisper_context(whisper_model_path)?;
    transcribe(&ctx, audio_path, output_path, diarize_options)?;
    Ok(())
}
```

| Model | Processing method | Processing time (for 01:46 min audio) |
| --- | --- | --- |
| Medium | Vulkan | 25 seconds |
| Small | Vulkan | 12 seconds |
| Medium | CUDA | 13 seconds |
| Small | CUDA | 8 seconds |

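
To compare these runs independently of clip length, the timings can be expressed as a speed-up over real time (audio duration divided by processing time). The following is a minimal sketch, assuming the 01:46 clip is 106 seconds long; `speedup`, `reported_runs`, and `summarize` are hypothetical helpers for illustration and are not part of the test program. Since the test program reports whole seconds, the ratios are only approximate.

```rust
/// Speed-up over real time: seconds of audio processed per second of wall-clock time.
/// Hypothetical helper for illustration; not part of the test program.
fn speedup(audio_secs: f64, processing_secs: f64) -> f64 {
    audio_secs / processing_secs
}

/// The four timings reported in this thread: (backend, model, processing seconds).
fn reported_runs() -> Vec<(&'static str, &'static str, f64)> {
    vec![
        ("Vulkan", "medium", 25.0),
        ("Vulkan", "small", 12.0),
        ("CUDA", "medium", 13.0),
        ("CUDA", "small", 8.0),
    ]
}

/// Build one summary line per run for a clip of `audio_secs` seconds.
fn summarize(audio_secs: f64) -> Vec<String> {
    reported_runs()
        .into_iter()
        .map(|(backend, model, secs)| {
            format!("{backend} {model}: {:.1}x real time", speedup(audio_secs, secs))
        })
        .collect()
}
```

Calling `summarize(106.0)` and printing each line gives roughly 4.2x and 8.8x real time for Vulkan (medium, small) and roughly 8.2x and 13x for CUDA, under the assumed clip length.
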
Danthig commented 2 months ago

> v2.5.0-beta.0 Feedback
>
> Please share your thoughts and experiences with the new Vulkan support! Your feedback will help us optimize performance across different systems.
>
> Release details: v2.5.0-beta.0

Thank you very much for this special software! The best software for transcription! Clear, beautiful, and extremely fast.

For some reason, neither the beta version nor the released version https://github.com/thewh1teagle/vibe/releases/tag/v2.5.0 works on my computer! After I choose a file to transcribe and start, the software crashes without any error message!

Y-PLONI commented 2 months ago

> The software crashes without any error message!

Try running the version for older processors.

altunenes commented 2 months ago

additional discussion about parallel speech if anyone is interested 😊:

https://github.com/thewh1teagle/vibe/discussions/233#discussioncomment-10444719

adlihm commented 2 months ago

Hm, I don't know if it's just me, but the latest version couldn't transcribe any of the files I use, even after I tinkered with the settings on my second try. I usually use Vibe to transcribe my podcast to Indonesian, yet tonight both attempts (3 files, 2 times) didn't produce the result I was expecting. It's so bad, and I have no idea why.

Danthig commented 2 months ago

> > The software crashes without any error message!
>
> Try running the version for older processors.

@thewh1teagle That didn't help either! You can see in the attached GIF that Vibe doesn't work on the CPU, only on the GPU, whether I set the GPU device number to 1 or 2, or turned it off completely. When I launch Vibe for the first time with the high-performance graphics option turned off, the software crashes without an error message. Only when I select GPU device 0 or 2 does the software work, on the GPU.

Processor: i7-1165G7; graphics processor: Intel Iris Xe. The software doesn't work on the CPU at all.

altunenes commented 2 months ago

> The software crashes without any error message!

Which version did you download? Have you tried the VULKAN version?

altunenes commented 2 months ago

As a result of my experiments, I now have a better understanding of the behavior of these models. The beginning of the audio file is so critical that if something like an unrelated noise is introduced at the beginning of the video, the “identification” drops significantly.

For example, when I upload an audio file where two people are speaking with a small echo (indistinct due to human voice) at the beginning, the accuracy of the identification drops to around 50% (transcription is still fine). However, when I trim this part (removing the initial 1 second), the accuracy reaches up to 90%.
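
The trimming step described here can be sketched as follows. This is a minimal illustration, not the code used in the experiment: `trim_leading` is a hypothetical helper, and the 16 kHz sample rate in the usage note is an assumption taken from the rate the test program resamples to.

```rust
/// Return the samples that remain after dropping the first `seconds` of mono audio.
/// Hypothetical helper for illustration; if the clip is shorter than `seconds`,
/// the result is empty.
fn trim_leading(samples: &[f32], sample_rate: usize, seconds: f32) -> &[f32] {
    // Number of samples to skip, rounded to the nearest whole sample.
    let skip = (sample_rate as f32 * seconds).round() as usize;
    &samples[skip.min(samples.len())..]
}
```

For 16 kHz audio, `trim_leading(&samples, 16000, 1.0)` skips the first 16000 samples. Any timestamps computed on the trimmed audio are then 1 second earlier than in the original file.
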

thewh1teagle commented 2 months ago

> As a result of my experiments, I now have a better understanding of the behavior of these models. The beginning of the audio file is so critical that if something like an unrelated noise is introduced at the beginning of the video, the “identification” drops significantly.
>
> For example, when I upload an audio file where two people are speaking with a small echo (indistinct due to human voice) at the beginning, the accuracy of the identification drops to around 50% (transcription is still fine). However, when I trim this part (removing the initial 1 second), the accuracy reaches up to 90%.

Interesting. You're talking about the diarization models right? (the segmentation and embedding) If so, maybe we can always add 1s of silent padding? (zeros)
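
The padding idea could look like the sketch below. It is only an illustration under stated assumptions: `pad_leading_silence` is a hypothetical helper, not something Vibe or pyannote-rs provides, and the 16 kHz rate in the usage note is taken from the test program posted earlier in the thread.

```rust
/// Prepend `seconds` of digital silence (zeros) to a mono sample buffer.
/// Hypothetical helper for illustration.
fn pad_leading_silence(samples: &[f32], sample_rate: usize, seconds: f32) -> Vec<f32> {
    // Number of zero samples to insert, rounded to the nearest whole sample.
    let pad = (sample_rate as f32 * seconds).round() as usize;
    let mut out = Vec::with_capacity(pad + samples.len());
    out.resize(pad, 0.0);
    out.extend_from_slice(samples);
    out
}
```

With 16 kHz audio, `pad_leading_silence(&samples, 16000, 1.0)` inserts 16000 zeros before the first real sample. Every timestamp computed on the padded buffer is then offset by the padding length.
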

thewh1teagle commented 2 months ago

> Hm, I don't know if it's just me, but the latest version couldn't transcribe any of the files I use, even after I tinkered with the settings on my second try. I usually use Vibe to transcribe my podcast to Indonesian, yet tonight both attempts (3 files, 2 times) didn't produce the result I was expecting. It's so bad, and I have no idea why.

Thanks for letting us know. Please upload the audio file and the transcription to Google Drive so we can understand and potentially solve this issue. I'm transcribing non-English audio as well on the latest version, and it works fine on my end.

altunenes commented 2 months ago

> > As a result of my experiments, I now have a better understanding of the behavior of these models. The beginning of the audio file is so critical that if something like an unrelated noise is introduced at the beginning of the video, the “identification” drops significantly. For example, when I upload an audio file where two people are speaking with a small echo (indistinct due to human voice) at the beginning, the accuracy of the identification drops to around 50% (transcription is still fine). However, when I trim this part (removing the initial 1 second), the accuracy reaches up to 90%.
>
> Interesting. You're talking about the diarization models right? (the segmentation and embedding) If so, maybe we can always add 1s of silent padding? (zeros)

Let me run some additional experiments to make sure it's indeed about the diarization models. This was just a couple of examples, and I wanted to share my results :-) I need more specific audio files. But apart from that, isn't it always good to use padding for general usage?

thewh1teagle commented 2 months ago

> After I choose a file to transcribe and start, the software crashes without any error message!

You can run the software from cmd.exe with logging enabled, and then we'll be able to see why it crashes.

https://github.com/thewh1teagle/vibe/blob/main/DEBUG.md

In addition, I improved the logs in the beta version:

https://github.com/thewh1teagle/vibe/releases/tag/v2.5.1-beta.0

I also uploaded two different builds for older processors. The first, older-cpu, should work on most existing computers. The second, older-cpu-vulkan, is faster and uses the GPU, but won't work on every computer.
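
Vibe's debug log, as posted in this thread, reports four CPU features at startup: avx, avx2, f16c, and fma. The sketch below checks the same four on the running machine using Rust's standard `is_x86_feature_detected!` macro. It is an assumption, not something confirmed here, that missing features are what make an older-cpu build necessary; `cpu_features` and `missing_features` are hypothetical helpers and not Vibe's own code.

```rust
/// Runtime support for the four CPU features named in Vibe's debug log.
/// Hypothetical helper; on x86/x86_64 it queries the CPU at runtime.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn cpu_features() -> [(&'static str, bool); 4] {
    [
        ("avx", std::is_x86_feature_detected!("avx")),
        ("avx2", std::is_x86_feature_detected!("avx2")),
        ("f16c", std::is_x86_feature_detected!("f16c")),
        ("fma", std::is_x86_feature_detected!("fma")),
    ]
}

/// On other architectures these x86 features do not exist, so report them as absent.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn cpu_features() -> [(&'static str, bool); 4] {
    [("avx", false), ("avx2", false), ("f16c", false), ("fma", false)]
}

/// Names of the listed features that the running CPU does not support.
/// Assumption: a non-empty result hints that an older-cpu build may be needed.
fn missing_features() -> Vec<&'static str> {
    cpu_features()
        .iter()
        .filter(|(_, supported)| !supported)
        .map(|(name, _)| *name)
        .collect()
}
```

Printing `missing_features()` on the affected machine gives a quick hint about which build to try first. It says nothing about Vulkan driver support, which is a separate question.
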

thewh1teagle commented 2 months ago

> But apart from that, isn't it always good to use padding for general usage?

I think it can make the first timestamp incorrect, and I don't think it's going to improve things that much. We use pyannote segmentation, so the first segment should not start from 0 if there's silence anyway.
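
If leading padding were added anyway, the timestamp shift could be undone by subtracting the padding length. The sketch below is a hypothetical illustration: `unpadded_seconds` is not part of Vibe or whisper-rs, and it assumes Whisper segment timestamps in units of 10 ms, which matches the division by 100 in the test program posted earlier.

```rust
/// Convert a segment timestamp (assumed to be in units of 10 ms) into seconds
/// relative to the original audio, given `pad_seconds` of silence prepended
/// before transcription. Hypothetical helper for illustration.
fn unpadded_seconds(t_centis: i64, pad_seconds: f64) -> f64 {
    // Clamp at zero so a segment that starts inside the padding maps to 0.0.
    (t_centis as f64 / 100.0 - pad_seconds).max(0.0)
}
```

For example, with 1 second of padding, a raw timestamp of 250 corresponds to 1.5 seconds in the original audio.
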

Danthig commented 2 months ago

> You can run the software from cmd.exe with logging enabled, and then we'll be able to see why it crashes.

2024-08-26T18:52:49.962287Z DEBUG vibe::setup: Vibe App Running
2024-08-26T18:52:49.963132Z DEBUG vibe::setup: webview version: 126.0.2592.113
2024-08-26T18:52:49.963401Z DEBUG vibe::custom_protocol: Protocol handler registered successfully.
2024-08-26T18:52:49.963533Z DEBUG vibe::setup: Cargo features: vulkan
2024-08-26T18:52:49.963758Z DEBUG vibe::setup: CPU Features {"avx":{"enabled":true,"support":true},"avx2":{"enabled":true,"support":true},"f16c":{"enabled":true,"support":true},"fma":{"enabled":true,"support":true}}
2024-08-26T18:52:49.963940Z DEBUG vibe::setup: APP VERSION: 2.5.1-beta.0
2024-08-26T18:52:49.964123Z DEBUG vibe::setup: COMMIT HASH: 255ec8be8ee17692d272f1485e7b2ba224276d30
2024-08-26T18:52:49.964252Z DEBUG vibe::setup: Non CLI mode
2024-08-26T18:52:50.651141Z DEBUG vibe::cmd: None
2024-08-26T18:52:50.661231Z DEBUG vibe::cmd::audio: Default Input Device: Ok("Microphone Array (טכנולוגיית Intel® Smart Sound למיקרופונים דיגיטליים)")
2024-08-26T18:52:50.661553Z DEBUG vibe::cmd::audio: Default Output Device: Ok("Speaker (Realtek(R) Audio)")
2024-08-26T18:52:50.663536Z DEBUG vibe::cmd::audio: Devices:
2024-08-26T18:52:52.629091Z DEBUG vibe::cmd: None
2024-08-26T18:53:02.660910Z DEBUG vibe::cmd: loading model first time
2024-08-26T18:53:02.661216Z DEBUG vibe_core::transcribe: open model...
2024-08-26T18:53:02.661382Z DEBUG vibe_core::transcribe: gpu device: 0
2024-08-26T18:53:02.661446Z DEBUG vibe_core::transcribe: use gpu: true
2024-08-26T18:53:02.661568Z DEBUG vibe_core::transcribe: creating whisper context with model path C:\Users\1234\AppData\Local\github.com.thewh1teagle.vibe\ggml-medium.bin
2024-08-26T18:53:02.661817Z INFO whisper_rs::whisper_sys_tracing: whisper_init_from_file_with_params_no_state: loading model from 'C:\Users\1234\AppData\Local\github.com.thewh1teagle.vibe\ggml-medium.bin'
2024-08-26T18:53:02.662212Z INFO whisper_rs::whisper_sys_tracing: whisper_init_with_params_no_state: use gpu = 1
2024-08-26T18:53:02.662363Z INFO whisper_rs::whisper_sys_tracing: whisper_init_with_params_no_state: flash attn = 0
2024-08-26T18:53:02.662488Z INFO whisper_rs::whisper_sys_tracing: whisper_init_with_params_no_state: gpu_device = 0
2024-08-26T18:53:02.662606Z INFO whisper_rs::whisper_sys_tracing: whisper_init_with_params_no_state: dtw = 0
2024-08-26T18:53:02.662750Z INFO whisper_rs::whisper_sys_tracing: whisper_model_load: loading model
2024-08-26T18:53:02.662911Z INFO whisper_rs::whisper_sys_tracing: whisper_model_load: n_vocab = 51865
2024-08-26T18:53:02.662993Z INFO whisper_rs::whisper_sys_tracing: whisper_model_load: n_audio_ctx = 1500
2024-08-26T18:53:02.663066Z INFO whisper_rs::whisper_sys_tracing: whisper_model_load: n_audio_state = 1024
2024-08-26T18:53:02.663187Z INFO whisper_rs::whisper_sys_tracing: whisper_model_load: n_audio_head = 16
2024-08-26T18:53:02.663292Z INFO whisper_rs::whisper_sys_tracing: whisper_model_load: n_audio_layer = 24
2024-08-26T18:53:02.663403Z INFO whisper_rs::whisper_sys_tracing: whisper_model_load: n_text_ctx = 448
2024-08-26T18:53:02.663555Z INFO whisper_rs::whisper_sys_tracing: whisper_model_load: n_text_state = 1024
2024-08-26T18:53:02.663674Z INFO whisper_rs::whisper_sys_tracing: whisper_model_load: n_text_head = 16
2024-08-26T18:53:02.663908Z INFO whisper_rs::whisper_sys_tracing: whisper_model_load: n_text_layer = 24
2024-08-26T18:53:02.664011Z INFO whisper_rs::whisper_sys_tracing: whisper_model_load: n_mels = 80
2024-08-26T18:53:02.664110Z INFO whisper_rs::whisper_sys_tracing: whisper_model_load: ftype = 1
2024-08-26T18:53:02.664218Z INFO whisper_rs::whisper_sys_tracing: whisper_model_load: qntvr = 0
2024-08-26T18:53:02.664408Z INFO whisper_rs::whisper_sys_tracing: whisper_model_load: type = 4 (medium)
2024-08-26T18:53:02.845908Z INFO whisper_rs::whisper_sys_tracing: whisper_model_load: adding 1608 extra tokens
2024-08-26T18:53:02.849950Z INFO whisper_rs::whisper_sys_tracing: whisper_model_load: n_langs = 99
ggml_vulkan: Found 1 Vulkan devices:
Vulkan0: Intel(R) Iris(R) Xe Graphics (Intel Corporation) | uma: 1 | fp16: 1 | warp size: 32
2024-08-26T18:53:03.028527Z INFO whisper_rs::whisper_sys_tracing: whisper_model_load: Intel(R) Iris(R) Xe Graphics total size = 1533.14 MB
2024-08-26T18:53:05.447749Z INFO whisper_rs::whisper_sys_tracing: whisper_model_load: model size = 1533.14 MB
2024-08-26T18:53:05.453531Z DEBUG vibe_core::transcribe: created context successfuly
2024-08-26T18:53:05.457033Z DEBUG vibe_core::transcribe: Transcribe called with { "path": "C:\Users\1234\Desktop\samples_single.wav", "lang": "en", "verbose": false, "n_threads": 4, "init_prompt": "", "temperature": 0.4, "translate": null, "max_text_ctx": null, "word_timestamps": false, "max_sentence_len": 100 }
2024-08-26T18:53:05.459061Z DEBUG vibe_core::audio: ffmpeg path is C:\Users\1234\AppData\Local\vibe\ffmpeg.exe
2024-08-26T18:53:05.523005Z DEBUG vibe_core::transcribe: out path is C:\Users\1234\AppData\Local\Temp.tmp1EwhXs.wav
2024-08-26T18:53:05.523300Z DEBUG vibe_core::audio: wav reader read from "C:\Users\1234\AppData\Local\Temp\.tmp1EwhXs.wav"
2024-08-26T18:53:05.523594Z DEBUG vibe_core::audio: parsing C:\Users\1234\AppData\Local\Temp.tmp1EwhXs.wav
2024-08-26T18:53:05.528046Z INFO whisper_rs::whisper_sys_tracing: whisper_backend_init_gpu: using Vulkan backend
2024-08-26T18:53:05.563390Z INFO whisper_rs::whisper_sys_tracing: whisper_init_state: kv self size = 150.99 MB
2024-08-26T18:53:05.600476Z INFO whisper_rs::whisper_sys_tracing: whisper_init_state: kv cross size = 150.99 MB
2024-08-26T18:53:05.604020Z INFO whisper_rs::whisper_sys_tracing: whisper_init_state: kv pad size = 6.29 MB
ggml_gallocr_needs_realloc: graph has different number of nodes
ggml_gallocr_alloc_graph: cannot reallocate multi buffer graph automatically, call reserve
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 0)
ggml_gallocr_reserve_n: reallocating Intel(R) Iris(R) Xe Graphics buffer from size 0.00 MiB to 25.73 MiB
ggml_gallocr_reserve_n: reallocating CPU buffer from size 0.00 MiB to 0.00 MiB
2024-08-26T18:53:05.605921Z INFO whisper_rs::whisper_sys_tracing: whisper_init_state: compute buffer (conv) = 28.55 MB
ggml_gallocr_needs_realloc: graph has different number of nodes
ggml_gallocr_alloc_graph: cannot reallocate multi buffer graph automatically, call reserve
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 0)
ggml_gallocr_reserve_n: reallocating Intel(R) Iris(R) Xe Graphics buffer from size 0.00 MiB to 565.06 MiB
ggml_gallocr_reserve_n: reallocating CPU buffer from size 0.00 MiB to 0.00 MiB
2024-08-26T18:53:05.611910Z INFO whisper_rs::whisper_sys_tracing: whisper_init_state: compute buffer (encode) = 594.09 MB
ggml_gallocr_needs_realloc: graph has different number of nodes
ggml_gallocr_alloc_graph: cannot reallocate multi buffer graph automatically, call reserve
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 0)
ggml_gallocr_reserve_n: reallocating Intel(R) Iris(R) Xe Graphics buffer from size 0.00 MiB to 5.86 MiB
ggml_gallocr_reserve_n: reallocating CPU buffer from size 0.00 MiB to 0.00 MiB
2024-08-26T18:53:05.683214Z INFO whisper_rs::whisper_sys_tracing: whisper_init_state: compute buffer (cross) = 7.72 MB
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 1)
ggml_gallocr_reserve_n: reallocating Intel(R) Iris(R) Xe Graphics buffer from size 0.00 MiB to 133.88 MiB
ggml_gallocr_reserve_n: reallocating CPU buffer from size 0.00 MiB to 2.63 MiB
2024-08-26T18:53:05.688240Z INFO whisper_rs::whisper_sys_tracing: whisper_init_state: compute buffer (decode) = 144.71 MB
2024-08-26T18:53:05.688481Z DEBUG vibe_core::transcribe: set language to Some("en")
2024-08-26T18:53:05.688642Z DEBUG vibe_core::transcribe: setting temperature to 0.4
2024-08-26T18:53:05.688768Z DEBUG vibe_core::transcribe: setting init prompt to
2024-08-26T18:53:05.688886Z DEBUG vibe_core::transcribe: setting n threads to 4
2024-08-26T18:53:05.689458Z DEBUG vibe_core::transcribe: set start time...
2024-08-26T18:53:05.689598Z DEBUG vibe_core::transcribe: setting state full...
2024-08-26T18:53:05.878391Z DEBUG vibe_core::transcribe: progress callback 0
2024-08-26T18:53:05.878757Z DEBUG vibe::cmd: set_progress_bar 0
2024-08-26T18:53:12.824830Z DEBUG vibe_core::transcribe: progress callback 100
2024-08-26T18:53:12.825332Z DEBUG vibe::cmd: set_progress_bar 100
2024-08-26T18:53:12.826618Z DEBUG vibe_core::transcribe: getting segments count...
2024-08-26T18:53:12.826907Z DEBUG vibe_core::transcribe: found 1 sentence segments
2024-08-26T18:53:12.827174Z DEBUG vibe_core::transcribe: looping segments...
2024-08-26T18:54:14.075808Z DEBUG vibe::cmd: None
2024-08-26T18:54:17.550558Z DEBUG vibe::gpu_preference: GPU preference removed successfully for the current executable (C:\Users\1234\AppData\Local\vibe\vibe.exe).
2024-08-26T18:54:26.814772Z DEBUG vibe::cmd: None
2024-08-26T18:54:30.007525Z DEBUG vibe::cmd: None
2024-08-26T18:54:36.472702Z DEBUG vibe::cmd: model path or gpu device changed. reloading
2024-08-26T18:54:36.473165Z DEBUG vibe_core::transcribe: open model...
2024-08-26T18:54:36.473569Z DEBUG vibe_core::transcribe: gpu device: 0
2024-08-26T18:54:36.473892Z DEBUG vibe_core::transcribe: use gpu: true
2024-08-26T18:54:36.474048Z DEBUG vibe_core::transcribe: creating whisper context with model path C:\Users\1234\AppData\Local\github.com.thewh1teagle.vibe\ivrit-ai--whisper-large-v2-tuned-ggml-model.bin
2024-08-26T18:54:36.474268Z INFO whisper_rs::whisper_sys_tracing: whisper_init_from_file_with_params_no_state: loading model from 'C:\Users\1234\AppData\Local\github.com.thewh1teagle.vibe\ivrit-ai--whisper-large-v2-tuned-ggml-model.bin'
2024-08-26T18:54:36.475113Z INFO whisper_rs::whisper_sys_tracing: whisper_init_with_params_no_state: use gpu = 1
2024-08-26T18:54:36.475369Z INFO whisper_rs::whisper_sys_tracing: whisper_init_with_params_no_state: flash attn = 0
2024-08-26T18:54:36.475565Z INFO whisper_rs::whisper_sys_tracing: whisper_init_with_params_no_state: gpu_device = 0
2024-08-26T18:54:36.475749Z INFO whisper_rs::whisper_sys_tracing: whisper_init_with_params_no_state: dtw = 0
2024-08-26T18:54:36.475964Z INFO whisper_rs::whisper_sys_tracing: whisper_model_load: loading model
2024-08-26T18:54:36.476152Z INFO whisper_rs::whisper_sys_tracing: whisper_model_load: n_vocab = 51865
2024-08-26T18:54:36.476368Z INFO whisper_rs::whisper_sys_tracing: whisper_model_load: n_audio_ctx = 1500
2024-08-26T18:54:36.476522Z INFO whisper_rs::whisper_sys_tracing: whisper_model_load: n_audio_state = 1280
2024-08-26T18:54:36.476648Z INFO whisper_rs::whisper_sys_tracing: whisper_model_load: n_audio_head = 20
2024-08-26T18:54:36.476766Z INFO whisper_rs::whisper_sys_tracing: whisper_model_load: n_audio_layer = 32
2024-08-26T18:54:36.476877Z INFO whisper_rs::whisper_sys_tracing: whisper_model_load: n_text_ctx = 448
2024-08-26T18:54:36.476994Z INFO whisper_rs::whisper_sys_tracing: whisper_model_load: n_text_state = 1280
2024-08-26T18:54:36.477160Z INFO whisper_rs::whisper_sys_tracing: whisper_model_load: n_text_head = 20
2024-08-26T18:54:36.477350Z INFO whisper_rs::whisper_sys_tracing: whisper_model_load: n_text_layer = 32
2024-08-26T18:54:36.477530Z INFO whisper_rs::whisper_sys_tracing: whisper_model_load: n_mels = 80
2024-08-26T18:54:36.477765Z INFO whisper_rs::whisper_sys_tracing: whisper_model_load: ftype = 1
2024-08-26T18:54:36.477955Z INFO whisper_rs::whisper_sys_tracing: whisper_model_load: qntvr = 0
2024-08-26T18:54:36.478146Z INFO whisper_rs::whisper_sys_tracing: whisper_model_load: type = 5 (large)
2024-08-26T18:54:36.566583Z INFO whisper_rs::whisper_sys_tracing: whisper_model_load: adding 1607 extra tokens
2024-08-26T18:54:36.570661Z INFO whisper_rs::whisper_sys_tracing: whisper_model_load: n_langs = 99
2024-08-26T18:54:36.576142Z INFO whisper_rs::whisper_sys_tracing: whisper_model_load: Intel(R) Iris(R) Xe Graphics total size = 3093.99 MB
2024-08-26T18:54:46.901323Z INFO whisper_rs::whisper_sys_tracing: whisper_model_load: model size = 3093.99 MB
2024-08-26T18:54:46.925822Z DEBUG vibe_core::transcribe: created context successfuly
2024-08-26T18:54:47.400926Z DEBUG vibe_core::transcribe: Transcribe called with { "path": "C:\Users\1234\Desktop\samples_single.wav", "lang": "he", "verbose": false, "n_threads": 4, "init_prompt": "", "temperature": 0.4, "translate": null, "max_text_ctx": null, "word_timestamps": false, "max_sentence_len": 100 }
2024-08-26T18:54:47.404387Z DEBUG vibe_core::audio: ffmpeg path is C:\Users\1234\AppData\Local\vibe\ffmpeg.exe
2024-08-26T18:54:47.471936Z DEBUG vibe_core::transcribe: out path is C:\Users\1234\AppData\Local\Temp.tmpHwEmtu.wav
2024-08-26T18:54:47.472223Z DEBUG vibe_core::audio: wav reader read from "C:\Users\1234\AppData\Local\Temp\.tmpHwEmtu.wav"
2024-08-26T18:54:47.472608Z DEBUG vibe_core::audio: parsing C:\Users\1234\AppData\Local\Temp.tmpHwEmtu.wav
2024-08-26T18:54:47.474937Z INFO whisper_rs::whisper_sys_tracing: whisper_backend_init_gpu: using Vulkan backend
2024-08-26T18:54:47.531754Z INFO whisper_rs::whisper_sys_tracing: whisper_init_state: kv self size = 251.66 MB
2024-08-26T18:54:47.599489Z INFO whisper_rs::whisper_sys_tracing: whisper_init_state: kv cross size = 251.66 MB
2024-08-26T18:54:47.610910Z INFO whisper_rs::whisper_sys_tracing: whisper_init_state: kv pad size = 7.86 MB
ggml_gallocr_needs_realloc: graph has different number of nodes
ggml_gallocr_alloc_graph: cannot reallocate multi buffer graph automatically, call reserve
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 0)
ggml_gallocr_reserve_n: reallocating Intel(R) Iris(R) Xe Graphics buffer from size 0.00 MiB to 31.59 MiB
ggml_gallocr_reserve_n: reallocating CPU buffer from size 0.00 MiB to 0.00 MiB
2024-08-26T18:54:47.614714Z INFO whisper_rs::whisper_sys_tracing: whisper_init_state: compute buffer (conv) = 34.69 MB
ggml_gallocr_needs_realloc: graph has different number of nodes
ggml_gallocr_alloc_graph: cannot reallocate multi buffer graph automatically, call reserve
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 0)
ggml_gallocr_reserve_n: reallocating Intel(R) Iris(R) Xe Graphics buffer from size 0.00 MiB to 882.11 MiB
ggml_gallocr_reserve_n: reallocating CPU buffer from size 0.00 MiB to 0.00 MiB
2024-08-26T18:54:47.626700Z INFO whisper_rs::whisper_sys_tracing: whisper_init_state: compute buffer (encode) = 926.53 MB
ggml_gallocr_needs_realloc: graph has different number of nodes
ggml_gallocr_alloc_graph: cannot reallocate multi buffer graph automatically, call reserve
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 0)
ggml_gallocr_reserve_n: reallocating Intel(R) Iris(R) Xe Graphics buffer from size 0.00 MiB to 7.32 MiB
ggml_gallocr_reserve_n: reallocating CPU buffer from size 0.00 MiB to 0.00 MiB
2024-08-26T18:54:47.680958Z INFO whisper_rs::whisper_sys_tracing: whisper_init_state: compute buffer (cross) = 9.25 MB
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 1)
ggml_gallocr_reserve_n: reallocating Intel(R) Iris(R) Xe Graphics buffer from size 0.00 MiB to 201.69 MiB
ggml_gallocr_reserve_n: reallocating CPU buffer from size 0.00 MiB to 2.63 MiB
2024-08-26T18:54:47.706290Z INFO whisper_rs::whisper_sys_tracing: whisper_init_state: compute buffer (decode) = 215.82 MB
2024-08-26T18:54:47.706524Z DEBUG vibe_core::transcribe: set language to Some("he")
2024-08-26T18:54:47.706673Z DEBUG vibe_core::transcribe: setting temperature to 0.4
2024-08-26T18:54:47.706794Z DEBUG vibe_core::transcribe: setting init prompt to
2024-08-26T18:54:47.706936Z DEBUG vibe_core::transcribe: setting n threads to 4
2024-08-26T18:54:47.707750Z DEBUG vibe_core::transcribe: set start time...
2024-08-26T18:54:47.707862Z DEBUG vibe_core::transcribe: setting state full...
2024-08-26T18:54:47.860686Z DEBUG vibe_core::transcribe: progress callback 0
2024-08-26T18:54:47.861042Z DEBUG vibe::cmd: set_progress_bar 0
2024-08-26T18:55:03.886449Z DEBUG vibe_core::transcribe: progress callback 272
2024-08-26T18:55:03.888674Z DEBUG vibe::cmd: set_progress_bar 272
2024-08-26T18:55:03.889100Z DEBUG vibe_core::transcribe: getting segments count...
2024-08-26T18:55:03.889746Z DEBUG vibe_core::transcribe: found 1 sentence segments
2024-08-26T18:55:03.889909Z DEBUG vibe_core::transcribe: looping segments...


> 
> https://github.com/thewh1teagle/vibe/blob/main/DEBUG.md
> 
> In addition, I improved the logs in the beta version:
> 
> https://github.com/thewh1teagle/vibe/releases/tag/v2.5.1-beta.0
> 
> I also uploaded two different builds for older processors. The first, older-cpu, should work on most existing computers. The second, older-cpu-vulkan, is faster and uses the GPU, but won't work on every computer.

older-cpu looked as if it was transcribing for a very long time, and it even used the CPU and not only the GPU, but it didn't transcribe even the sample file samples_single!

older-cpu-vulkan used only the GPU, and when I wanted it to use the CPU as well, the software crashed (you can see the logs above).
The logs are from running it through CMD; the beta version didn't manage to detect anything.
@thewh1teagle Should I open a new issue?
Thank you very much!!!

thewh1teagle commented 2 months ago

@Danthig Great, so it looks like it works with the CPU. Unfortunately, there is currently no solution for this: on some computers you have to install the CPU build, and it won't work with Vulkan. On most newer computers, however, it will work with Vulkan, and much faster.