thewh1teagle / vibe

Transcribe on your own!
https://thewh1teagle.github.io/vibe/
MIT License

Bug: failed to create whisper context #42

Closed NHLOCAL closed 2 months ago

NHLOCAL commented 3 months ago

What happened?

A bug happened!

Steps to reproduce

  1. I added a file
  2. I clicked on "Transcribe"
  3. In practice it threw an error right at the start

What OS are you seeing the problem on?

Windows

Relevant log output

"Error in desktop\\src-tauri\\src\\main.rs at line 60: failed to open model\n\nCaused by:\n    Failed to create a new whisper context."
options: ModelArgs { path: "C:\\Users\\משתמש\\Music\\שירי יוסי גרין\\song_04.mp3", model: "C:\\Users\\משתמש\\AppData\\Local\\github.com.thewh1teagle.vibe\\ivrit-ai--whisper-large-v2-tuned-ggml-model.bin", lang: Some("auto"), verbose: false, n_threads: Some(4), init_prompt: Some(""), temperature: Some(0.4) }

App Version: 0.0.8
Arch: x86_64
Platform: windows
Kernel Version: 10.0.26100
OS: windows
OS Version: 10.0.26100
Models: ggml-medium.bin, ivrit-ai--whisper-large-v2-tuned-ggml-model.bin
Default Mode: "C:\\Users\\משתמש\\AppData\\Local\\github.com.thewh1teagle.vibe\\ivrit-ai--whisper-large-v2-tuned-ggml-model.bin"

thewh1teagle commented 3 months ago

Thanks for reporting! Can you share the URL you downloaded the model from? Also, if possible, check the hash of the file:

  1. Open PowerShell from the search bar
  2. Type
    Get-FileHash "C:\Users\משתמש\AppData\Local\github.com.thewh1teagle.vibe\ivrit-ai--whisper-large-v2-tuned-ggml-model.bin"
  3. Press enter

Then paste the command and the resulting hash here, so I can verify that the model file is not damaged.
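
For anyone who prefers a script over PowerShell, the same check can be sketched in Python. This is a minimal sketch, not part of vibe: the function names `sha256_of_file` and `matches_expected` are made up for illustration, and it assumes the default SHA-256 algorithm that `Get-FileHash` uses.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 digest of a file as an uppercase hex string."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so a multi-gigabyte model file is never fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    # Get-FileHash prints uppercase hex, so match that for easy comparison.
    return digest.hexdigest().upper()


def matches_expected(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest with a known one, ignoring case and whitespace."""
    return sha256_of_file(path) == expected_hex.strip().upper()
```

Calling `matches_expected(Path(model_path), known_hash)` with the path of the downloaded model and a hash obtained from a trusted copy of the file returns `True` only when the file is byte-for-byte identical; `model_path` and `known_hash` are placeholders here.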

NHLOCAL commented 3 months ago

This is the result from PowerShell: SHA256 A40C1FCBB91BD7EFAC5AA9054089BB52CB4F62C5E71C12298F77FD9A03D07387

I downloaded the file from here; it is a Whisper model fine-tuned for Hebrew: https://huggingface.co/ivrit-ai/whisper-large-v2-tuned

thewh1teagle commented 3 months ago

The hash is correct, so this is the right model and it's not corrupt. I also tried transcribing with the exact same model file on both Windows and macOS, and it worked. I suspect the issue is specific to your PC. Do you have antivirus software? If so, try excluding the models directory, and maybe the vibe app too, from scanning, and then try again.

NHLOCAL commented 3 months ago

I only have the built-in Windows antivirus

What do you mean by "exclude the models directory"?

thewh1teagle commented 3 months ago

I'm not sure why it's crashing, but I'm pretty sure the failure happens inside whisper.cpp itself, so you can try running whisper.cpp directly on a WAV file:

  1. Download whisper-bin-x64.zip from whisper.cpp/releases and unzip it
  2. Download vibe/samples/single_speaker.wav and place it in the same folder
  3. Open terminal there and execute
    main.exe -f single_speaker.wav

    Check whether it works, and compare the result with what you get in the vibe app.

NHLOCAL commented 3 months ago

This is the result:

whisper_init_from_file_with_params_no_state: loading model from 'models/ggml-base.en.bin'
whisper_init_from_file_with_params_no_state: failed to open 'models/ggml-base.en.bin'
error: failed to initialize whisper context

The name of my user folder is in Hebrew ("C:\Users\משתמש"), could it be related?

thewh1teagle commented 3 months ago

I forgot the model flag, which is required for this test. Here are the corrected steps:

I'm not sure why it's crashing, but I'm pretty sure the failure happens inside whisper.cpp itself, so you can try running whisper.cpp directly on a WAV file:

  1. Download whisper-bin-x64.zip from whisper.cpp/releases and unzip it
  2. Download vibe/samples/single_speaker.wav and place it in the same folder
  3. Open terminal there and execute
    main.exe -m "<path to model>" -f single_speaker.wav

    Replace "<path to model>" with the real path of the model (you can drag and drop the file into the terminal). Check whether it works, and compare the result with what you get in the vibe app.

thewh1teagle commented 3 months ago

Update: It turns out that whisper.cpp fails to load the model if the path contains Hebrew characters! So weird! I'll report it and hope it gets fixed soon.

NHLOCAL commented 3 months ago

I just tried your latest instructions (adding the model path), and it loads the model fine, despite the Hebrew letters in the model path! (It displays the Hebrew characters in a distorted way, but it actually interprets them correctly.)

Here is the relevant I/O:

input:

main.exe -m "C:\Users\משתמש\AppData\Local\github.com.thewh1teagle.vibe\ivrit-ai--whisper-large-v2-tuned-ggml-model.bin" -f again1.wav

output:

whisper_init_from_file_with_params_no_state: loading model from 'C:\Users\ε∙·ε∙\AppData\Local\github.com.thewh1teagle.vibe\ivrit-ai--whisper-large-v2-tuned-ggml-model.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 1280
whisper_model_load: n_audio_head  = 20
whisper_model_load: n_audio_layer = 32
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 1280
whisper_model_load: n_text_head   = 20
whisper_model_load: n_text_layer  = 32
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 5 (large)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: n_langs       = 99
whisper_model_load:      CPU buffer size =  3094.49 MB
whisper_model_load: model size    = 3093.99 MB
whisper_init_state: kv self size  =  220.20 MB
whisper_init_state: kv cross size =  245.76 MB
whisper_init_state: compute buffer (conv)   =   30.98 MB
whisper_init_state: compute buffer (encode) =  212.42 MB
whisper_init_state: compute buffer (cross)  =    9.38 MB
whisper_init_state: compute buffer (decode) =   99.23 MB

Note: the problem probably exists only in vibe's adapted code, not in the original whisper.cpp.
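
One plausible explanation for the distorted folder name in the output above is a code page mismatch: the path bytes are in one legacy encoding and the console renders them in another. The sketch below assumes Windows-1255 (Hebrew) for the path bytes and code page 437 for the console; neither code page is stated anywhere in this thread, and `as_seen_in_console` is a hypothetical helper.

```python
def as_seen_in_console(
    text: str, ansi_codec: str = "cp1255", console_codec: str = "cp437"
) -> str:
    """Encode text with one legacy code page and decode the bytes with another."""
    return text.encode(ansi_codec).decode(console_codec)


user_folder = "משתמש"
print(as_seen_in_console(user_folder))  # prints: ε∙·ε∙
```

Under these assumptions the result matches the garbled folder name in the log, which would mean only the display is wrong while the bytes passed to the file API are still the right ones.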

SBW88 commented 3 months ago

I just checked the new version, but it still doesn't work if the username is in Hebrew. (I already tested the previous version on the same computer with an English username and an English audio file name, and it works great.)

thewh1teagle commented 2 months ago

Fixed in the latest version (1.0.1). You can update through the settings in vibe or from https://thewh1teagle.github.io/vibe/