philippjbauer closed this issue 1 year ago
Hello @philippjbauer,
Thanks for your interest in Whisper.net, and for reporting the issue.
It seems I made a mistake when I built the latest version and didn't include the osx build: https://github.com/sandrohanea/whisper.net/tree/main/Whisper.net.Runtime
All runtimes there are 2 weeks old, except osx-x64 and osx-arm64, which are 2 months old.
I plan to release a new version tomorrow (also including the latest changes from whisper.cpp), which will have the correct osx version.
Until then, you can give the CoreML example a try (it is optimized for macOS and has the latest library build).
Thank you! I was able to get it to work on macOS with the CoreML package.
I have tried to use the CoreML model that I can download with the project's downloader class, but it can't be loaded.
Downloading Model ggml-tiny.en.bin ... 74.1 MB downloaded
done
Downloading Model ggml-tiny.en-encoder.mlmodelc ... 14.34 MB downloaded
done
Processing test-video.mp4 -> test-video.mp3 ...
Processing test-video.mp3 -> test-video.wav ...
whisper_init_from_file_no_state: loading model from '/Users/philippbauer/Work/Projects/Transcriber/ggml-tiny.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51864
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 384
whisper_model_load: n_audio_head = 6
whisper_model_load: n_audio_layer = 4
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 384
whisper_model_load: n_text_head = 6
whisper_model_load: n_text_layer = 4
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 1
whisper_model_load: mem required = 201.00 MB (+ 3.00 MB per decoder)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: model ctx = 73.62 MB
whisper_model_load: model size = 73.54 MB
whisper_init_state: kv self size = 2.62 MB
whisper_init_state: kv cross size = 8.79 MB
whisper_init_state: loading Core ML model from '/Users/philippbauer/Work/Projects/Transcriber/ggml-tiny.en-encoder.mlmodelc'
whisper_init_state: first run on a device may take a while ...
whisper_init_state: failed to load Core ML model from '/Users/philippbauer/Work/Projects/Transcriber/ggml-tiny.en-encoder.mlmodelc'
It is still transcribing the audio I give it though. Any idea why that might be?
In your case, it sounds like you didn't download the mlmodelc encoder. You can see in the example (https://github.com/sandrohanea/whisper.net/blob/441433d590e974ed04b85d5aab49bb38032874d8/examples/CoreML/Program.cs#L30C8-L40C10) how that model is downloaded.
Without it, transcription still works but runs slower, since the Core ML encoder is not used.
The example will also be updated to make use of WhisperGgmlDownloader.GetEncoderCoreMLModelAsync.
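For reference, the download step from the linked example looks roughly like this. This is a sketch, not the exact example code; `GgmlType.TinyEn` and the output file name are assumptions based on the model used in this thread:

```csharp
using Whisper.net.Ggml;

// Download the ggml model if it is not already present.
var ggmlPath = "ggml-tiny.en.bin"; // assumed file name, matching the log above
if (!File.Exists(ggmlPath))
{
    await using var modelStream = await WhisperGgmlDownloader.GetGgmlModelAsync(GgmlType.TinyEn);
    await using var writer = File.OpenWrite(ggmlPath);
    await modelStream.CopyToAsync(writer);
}

// The Core ML encoder is downloaded separately via the helper mentioned above.
// Note: the returned stream is an archive, not a ready-to-use .mlmodelc folder
// (see the follow-up later in this thread).
await using var encoderStream = await WhisperGgmlDownloader.GetEncoderCoreMLModelAsync(GgmlType.TinyEn);
```

The encoder must end up next to the ggml model, named `<model>-encoder.mlmodelc`, for whisper.cpp to pick it up.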
Released 1.4.6, which contains the fix for the osx build.
I can confirm that the mlmodelc file is in the same directory as the GGML model at the path indicated above, but according to the output it still can't be loaded.
I'm calling the factory with the full path to the model, as indicated by the docs:
using var whisperFactory = WhisperFactory.FromPath(Path.GetFullPath(modelName));
I can also confirm that the transcription works as expected with the runtime version 1.4.6 without the use of the CoreML package!
I figured out the problem: I had assumed the encoder file was ready to use, but it is actually a zip archive that needs to be decompressed into a folder whose name ends with the .mlmodelc extension. Now I got it to work as expected.
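For anyone hitting the same issue, here is a minimal sketch of that decompression step. It assumes the downloaded encoder stream was saved to `ggml-tiny.en-encoder.zip` (file names taken from the log above) and that the archive contains the `.mlmodelc` directory at its root:

```csharp
using System.IO.Compression;

// The downloaded Core ML encoder is a zip archive, not a ready-to-use model.
// Extract it so that a "ggml-tiny.en-encoder.mlmodelc" folder ends up next to
// the ggml model; whisper.cpp looks for exactly that folder name.
var zipPath = "ggml-tiny.en-encoder.zip"; // assumed save location of the download
ZipFile.ExtractToDirectory(zipPath, ".", overwriteFiles: true);
```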
Maybe the documentation can be a bit clearer on this point.
Thank you!
Hi y'all, thank you for working on this .NET implementation for Whisper!
I'm trying to run the "Simple" example from the repo but run into issues on macOS Ventura (ARM, M1 Pro). It appears to find the native library but can't call it correctly.
Older Whisper.net versions (1.4.4, 1.4.3) are exhibiting the same behavior.