roymacdonald / ofxWhisper

openFrameworks implementation of whisper.cpp

download the coreML model problem #5

Open stephanschulz opened 1 month ago

stephanschulz commented 1 month ago

I am trying to download the Core ML model, but it seems like Hugging Face might not have it?

(base) stephanschulz@Stephans-Komputer ofxWhisper % sh libs/whisper_cpp/models/download-coreml-model.sh medium.en
Downloading Core ML model medium.en from 'https://huggingface.co/datasets/ggerganov/whisper.cpp-coreml' ...
Failed to download Core ML model medium.en
Please try again later or download the original Whisper model files and convert them yourself.

Do you think I can just download it directly from here: https://huggingface.co/ggerganov/whisper.cpp/tree/90a64d80ea254cf67575b41a5971f972c79f7b45 ? The file name there is a bit different, though: ggml-base-encoder.mlmodelc
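To make the file-name question concrete, here is a minimal Python sketch of the naming pattern that appears in this thread (ggml-base-encoder.mlmodelc above, and models/ggml-base.en-encoder.mlmodelc in the conversion log further down). The helper names `coreml_encoder_name` and `find_coreml_encoder` are hypothetical and not part of whisper.cpp or ofxWhisper. The sketch also assumes the compiled .mlmodelc is a directory placed in the models folder, which should be checked against the whisper.cpp README.

```python
from pathlib import Path
from typing import Optional


def coreml_encoder_name(model: str) -> str:
    """Build the encoder name for a model such as 'base' or 'base.en'.

    The pattern is inferred from the file names quoted in this thread;
    it is an assumption, not a documented API.
    """
    return f"ggml-{model}-encoder.mlmodelc"


def find_coreml_encoder(models_dir: str, model: str) -> Optional[Path]:
    """Return the encoder path if it exists inside models_dir, else None."""
    candidate = Path(models_dir) / coreml_encoder_name(model)
    # Assumption: a compiled Core ML model (.mlmodelc) is a directory bundle.
    return candidate if candidate.is_dir() else None


if __name__ == "__main__":
    print(coreml_encoder_name("medium.en"))
    print(find_coreml_encoder("libs/whisper_cpp/models", "medium.en"))
```

If the second call prints None, the encoder for that model is not in the folder under the expected name, whichever way it was obtained.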

roymacdonald commented 1 month ago

Ahh, I think you might need to build it yourself. The Core ML model is the one optimized for Apple's processors, and there are a bunch of scripts there that let you generate it. The big difference from the "normal" models is that it runs faster, so if you just want to check whether the model yields better results, just try the normal model. See the Core ML section of the whisper.cpp README for how to build it: https://github.com/ggerganov/whisper.cpp/?tab=readme-ov-file#core-ml-support

Also, whisper.cpp is under very active development, so there are probably some improvements over the version I have packed in this addon. Maybe I shouldn't pack it and should just have a git submodule and some setup/update script instead.

stephanschulz commented 1 month ago

Hmm. I tried it quickly but did not succeed.

(base) stephanschulz@Stephans-Komputer whisper.cpp-master % ./models/generate-coreml-model.sh base.en
ModelDimensions(n_mels=80, n_audio_ctx=1500, n_audio_state=512, n_audio_head=8, n_audio_layer=6, n_vocab=51864, n_text_ctx=448, n_text_state=512, n_text_head=8, n_text_layer=6)
/Applications/of_v0.12.0_osx_release/addons/ofxWhisper/libs/whisper.cpp-master/models/convert-whisper-to-coreml.py:137: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert x.shape[1:] == self.positional_embedding.shape[::-1], "incorrect audio shape"
/opt/homebrew/Caskroom/miniconda/base/lib/python3.9/site-packages/ane_transformers/reference/layer_norm.py:60: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert inputs.size(1) == self.num_channels
/Applications/of_v0.12.0_osx_release/addons/ofxWhisper/libs/whisper.cpp-master/models/convert-whisper-to-coreml.py:77: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  dim_per_head = dim // self.n_head
/Applications/of_v0.12.0_osx_release/addons/ofxWhisper/libs/whisper.cpp-master/models/convert-whisper-to-coreml.py:79: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  scale = float(dim_per_head)**-0.5
Converting PyTorch Frontend ==> MIL Ops: 100%|█████████▊| 821/822 [00:00<00:00, 8618.89 ops/s]
Running MIL frontend_pytorch pipeline: 100%|██████████| 5/5 [00:00<00:00, 119.94 passes/s]
Running MIL default pipeline: 100%|██████████| 78/78 [00:01<00:00, 52.04 passes/s]
Running MIL backend_mlprogram pipeline: 100%|██████████| 12/12 [00:00<00:00, 175.08 passes/s]
done converting
2024-07-17 14:22:41.519 xcodebuild[43791:13388926] Requested but did not find extension point with identifier Xcode.IDEKit.ExtensionSentinelHostApplications for extension Xcode.DebuggerFoundation.AppExtensionHosts.watchOS of plug-in com.apple.dt.IDEWatchSupportCore
2024-07-17 14:22:41.520 xcodebuild[43791:13388926] Requested but did not find extension point with identifier Xcode.IDEKit.ExtensionPointIdentifierToBundleIdentifier for extension Xcode.DebuggerFoundation.AppExtensionToBundleIdentifierMap.watchOS of plug-in com.apple.dt.IDEWatchSupportCore
coremlc: error: compiler error: Encountered an error while compiling a neural network model: in operation op_20: The operator (const) referenced by this operation does not match the operator defined by expected opset ios15. Is it an operator of the same name described by a different opset?
mv: rename models/coreml-encoder-base.en.mlmodelc to models/ggml-base.en-encoder.mlmodelc: No such file or directory
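
In a log like this, most lines are warnings or progress output. The line that matters is the coremlc error; the final mv failure looks like a consequence of it, since the compiled model was never produced. A small, hedged Python sketch for pulling the first real error out of captured output follows. The function name `first_error_line` is hypothetical, and the sample text is an abridged copy of the lines above.

```python
from typing import Optional


def first_error_line(log_text: str) -> Optional[str]:
    """Return the first line containing 'error:' (case-insensitive), or None.

    Warning lines such as TracerWarning or UserWarning do not match,
    because they contain 'Warning:' rather than 'error:'.
    """
    for line in log_text.splitlines():
        if "error:" in line.lower():
            return line.strip()
    return None


# Abridged sample taken from the output above.
SAMPLE_LOG = """done converting
coremlc: error: compiler error: Encountered an error while compiling a neural network model
mv: rename models/coreml-encoder-base.en.mlmodelc to models/ggml-base.en-encoder.mlmodelc: No such file or directory"""

if __name__ == "__main__":
    print(first_error_line(SAMPLE_LOG))
```

Quoting only that first error line, together with the tool versions in use, makes a report in another issue tracker easier to read than the full dump.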

roymacdonald commented 1 month ago

No idea. It is probably better to post this in the whisper.cpp issues.