sandrohanea / whisper.net

Whisper.net. Speech to text made simple using Whisper Models
MIT License

Unable to find an entry point named 'whisper_full_default_params_by_ref' in shared library 'whisper'. #88

Closed by philippjbauer 1 year ago

philippjbauer commented 1 year ago

Hi y'all, thank you for working on this .NET implementation for Whisper!

I'm trying to run the "Simple" example from the repo but run into issues on macOS Ventura (ARM, M1 Pro). It appears to find the native library but can't call it correctly.

Older Whisper.net versions (1.4.4, 1.4.3) are exhibiting the same behavior.

whisper.net/examples/Simple on  main [✘] via .NET 7.0.101 
➜ dotnet run --framework net6.0
Downloading Model ggml-base.bin
whisper_init_from_file_no_state: loading model from 'ggml-base.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head  = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 512
whisper_model_load: n_text_head   = 8
whisper_model_load: n_text_layer  = 6
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 2
whisper_model_load: mem required  =  310.00 MB (+    6.00 MB per decoder)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: model ctx     =  140.66 MB
whisper_model_load: model size    =  140.54 MB
Unhandled exception. System.EntryPointNotFoundException: Unable to find an entry point named 'whisper_full_default_params_by_ref' in shared library 'whisper'.
   at Whisper.net.Native.NativeMethods.whisper_full_default_params_by_ref(WhisperSamplingStrategy strategy)
   at Whisper.net.WhisperProcessor.GetWhisperParams()
   at Whisper.net.WhisperProcessor..ctor(WhisperProcessorOptions options)
   at Whisper.net.WhisperProcessorBuilder.Build()
   at Program.Main(String[] args) in /Users/philippbauer/Learning/whisper.net/examples/Simple/Program.cs:line 29
   at Program.<Main>(String[] args)

sandrohanea commented 1 year ago

Hello @philippjbauer,

Thanks for your interest in whisper.net and for reporting the issue.

It seems that I made a mistake when I built the latest version and didn't include the osx build: https://github.com/sandrohanea/whisper.net/tree/main/Whisper.net.Runtime

All runtime builds there are 2 weeks old, except osx-x64 and osx-arm64, which are 2 months old. That would explain why the macOS native library is missing the 'whisper_full_default_params_by_ref' entry point that the managed code now calls.

I plan to release a new version tomorrow (which will also include the latest changes from whisper.cpp) with the correct osx build.

Until then, you can try the CoreML example, which is optimised for macOS and has the latest library build.
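
For anyone who wants to confirm this kind of mismatch locally, a minimal sketch follows. It uses the standard .NET `System.Runtime.InteropServices.NativeLibrary` API to check whether a native library exports a given symbol. The class name `NativeExportProbe` and the library path in the usage note are hypothetical and not part of Whisper.net; the actual location of the native library depends on how the runtime package is laid out in your build output.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper, not part of Whisper.net.
public static class NativeExportProbe
{
    // Returns true only if the library at libraryPath can be loaded
    // and exports a symbol named symbolName.
    public static bool HasExport(string libraryPath, string symbolName)
    {
        // TryLoad returns false (instead of throwing) when the file is
        // missing or cannot be loaded on this platform.
        if (!NativeLibrary.TryLoad(libraryPath, out IntPtr handle))
        {
            return false;
        }

        try
        {
            // TryGetExport returns false when the symbol is absent, which is
            // the situation that surfaces as EntryPointNotFoundException.
            return NativeLibrary.TryGetExport(handle, symbolName, out _);
        }
        finally
        {
            NativeLibrary.Free(handle);
        }
    }
}
```

Calling `NativeExportProbe.HasExport(pathToLibwhisper, "whisper_full_default_params_by_ref")`, where `pathToLibwhisper` is an assumed path to the osx native library in your output folder, should return false for a stale build and true once the library matches the managed package.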

philippjbauer commented 1 year ago

Thank you! I was able to get it to work on macOS with the CoreML package.

I tried to use the CoreML model that I can download with the project's downloader class, but it can't be loaded.

Downloading Model ggml-tiny.en.bin ... 74.1 MB downloaded
done
Downloading Model ggml-tiny.en-encoder.mlmodelc ... 14.34 MB downloaded
done
Processing test-video.mp4 -> test-video.mp3 ...
Processing test-video.mp3 -> test-video.wav ...
whisper_init_from_file_no_state: loading model from '/Users/philippbauer/Work/Projects/Transcriber/ggml-tiny.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 384
whisper_model_load: n_audio_head  = 6
whisper_model_load: n_audio_layer = 4
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 384
whisper_model_load: n_text_head   = 6
whisper_model_load: n_text_layer  = 4
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 1
whisper_model_load: mem required  =  201.00 MB (+    3.00 MB per decoder)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: model ctx     =   73.62 MB
whisper_model_load: model size    =   73.54 MB
whisper_init_state: kv self size  =    2.62 MB
whisper_init_state: kv cross size =    8.79 MB
whisper_init_state: loading Core ML model from '/Users/philippbauer/Work/Projects/Transcriber/ggml-tiny.en-encoder.mlmodelc'
whisper_init_state: first run on a device may take a while ...
whisper_init_state: failed to load Core ML model from '/Users/philippbauer/Work/Projects/Transcriber/ggml-tiny.en-encoder.mlmodelc'

It is still transcribing the audio I give it though. Any idea why that might be?

sandrohanea commented 1 year ago

In your case, it sounds like you didn't download the mlmodelc encoder. The example code at https://github.com/sandrohanea/whisper.net/blob/441433d590e974ed04b85d5aab49bb38032874d8/examples/CoreML/Program.cs#L30C8-L40C10 shows how that model is downloaded.

Without that part, transcription will run slower.

However, the example will also be updated to use WhisperGgmlDownloader.GetEncoderCoreMLModelAsync.

sandrohanea commented 1 year ago

Released 1.4.6, which contains the fix for the osx build.

philippjbauer commented 1 year ago

I can confirm that the mlmodelc file is in the same directory as the GGML model, at the path indicated above, but according to the output it can't be loaded.

I'm calling the factory with the full path to the model, as indicated by the docs:

using var whisperFactory = WhisperFactory.FromPath(Path.GetFullPath(modelName));

philippjbauer commented 1 year ago

I can also confirm that the transcription works as expected with the runtime version 1.4.6 without the use of the CoreML package!

philippjbauer commented 1 year ago

I figured out the problem: I had assumed the downloaded encoder file was ready to use, but it is actually a zip file that needs to be decompressed into a folder whose name ends with the .mlmodelc extension. Now it works as expected.
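
For future readers, here is a rough sketch of that unpacking step, using only the standard `System.IO.Compression.ZipFile` API. The class and method names are hypothetical and not part of Whisper.net. The target folder name follows the pattern visible in the log above (`ggml-tiny.en.bin` next to `ggml-tiny.en-encoder.mlmodelc`). The internal layout of the archive is an assumption, so the sketch accepts either a single wrapping folder or the contents at the archive root.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper, not part of Whisper.net.
public static class CoreMlEncoderUnpacker
{
    // Derives "<dir>/<model name>-encoder.mlmodelc" from the GGML model path,
    // matching the path whisper_init_state printed in the log above.
    public static string EncoderDirectoryFor(string ggmlModelPath)
    {
        string fullPath = Path.GetFullPath(ggmlModelPath);
        string modelDir = Path.GetDirectoryName(fullPath) ?? ".";
        string baseName = Path.GetFileNameWithoutExtension(fullPath);
        return Path.Combine(modelDir, baseName + "-encoder.mlmodelc");
    }

    // Extracts the downloaded zip so that encoderDir ends up as a folder.
    public static string Unpack(string zipPath, string encoderDir)
    {
        if (Directory.Exists(encoderDir))
        {
            return encoderDir; // already unpacked
        }

        // Stage next to the target so the final move stays on one volume.
        string staging = encoderDir + ".extracting";
        if (Directory.Exists(staging))
        {
            Directory.Delete(staging, recursive: true);
        }
        ZipFile.ExtractToDirectory(zipPath, staging);

        // Assumption: the archive holds either one wrapping folder or the
        // model contents directly at its root.
        string[] dirs = Directory.GetDirectories(staging);
        string[] files = Directory.GetFiles(staging);
        string source = (dirs.Length == 1 && files.Length == 0) ? dirs[0] : staging;

        Directory.Move(source, encoderDir);
        if (source != staging)
        {
            Directory.Delete(staging, recursive: true);
        }
        return encoderDir;
    }
}
```

If the encoder was saved under the final `.mlmodelc` name as a plain file, as happened here, it would need to be renamed (for example to a `.zip` name) before calling `Unpack`, since a file and a folder cannot share the same path.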

Maybe the documentation could be a bit clearer on this point.

Thank you!