Closed hlhr202 closed 6 months ago
@tazz4843 reformatted code
See review comments
@tazz4843 Hey, I have found a possible new way here.
whisper.cpp now supports the WHISPER_METAL_EMBED_LIBRARY
build option, which lets us embed the Metal library source as a string in the build output. However, we need to upgrade the whisper.cpp branch to a newer version.
See the CMakeLists.txt in whisper.cpp here:
https://github.com/ggerganov/whisper.cpp/blob/08981d1bacbe494ff1c943af6c577c669a2d9f4d/CMakeLists.txt#L78C12-L78C39
https://github.com/ggerganov/whisper.cpp/issues/2110
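For reference, the option could be forwarded from a `build.rs` in a whisper-rs-sys-style crate. This is a minimal sketch, assuming the `cmake` crate drives the whisper.cpp CMake build; the `metal` feature name matches whisper-rs, but the rest is illustrative, not the actual whisper-rs build script:

```rust
// build.rs sketch: forward WHISPER_METAL_EMBED_LIBRARY to CMake when the
// `metal` cargo feature is enabled. Assumes the `cmake` crate is a build
// dependency and whisper.cpp is vendored in ./whisper.cpp.
fn main() {
    let mut config = cmake::Config::new("whisper.cpp");

    if cfg!(feature = "metal") {
        // Enable the Metal backend and embed the .metal shader source
        // directly in the library, so no external default.metallib file
        // needs to be located at runtime.
        config.define("WHISPER_METAL", "ON");
        config.define("WHISPER_METAL_EMBED_LIBRARY", "ON");
    }

    let dst = config.build();
    println!("cargo:rustc-link-search=native={}", dst.join("lib").display());
}
```

With the library embedded, the Metal shader source travels inside the compiled artifact instead of being loaded from a separate `default.metallib` at runtime, which is what makes the crate usable from arbitrary working directories.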
Maybe we should consider finalizing https://github.com/tazz4843/whisper-rs/pull/142 first, and then I can start implementing a new build option here?
I tried it with WHISPER_METAL_EMBED_LIBRARY=ON. It works: I can see in the logs that the Metal framework is loaded (without the option it isn't), and inference is much faster.
I believe we should enable WHISPER_METAL_EMBED_LIBRARY by default when the metal feature is enabled.
Updated. Please help check the new build config. I have also updated the Metal log callback setup function. @tazz4843
I don't have macOS to test on, will wait for a positive test from someone with macOS before merging
I have self-tested it, since I am using it for a private project. But it is okay to wait until one more tester confirms it.
I have tested @hlhr202's latest changes by adding whisper-rs = { git = "https://github.com/hlhr202/whisper-rs.git", branch = "feature/fix-metal", features = ["metal"] } to my Cargo.toml, and it works on my M1 Pro.
whisper_init_with_params_no_state: use gpu = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw = 0
whisper_model_load: loading model
whisper_model_load: n_vocab = 51864
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 768
whisper_model_load: n_audio_head = 12
whisper_model_load: n_audio_layer = 12
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 768
whisper_model_load: n_text_head = 12
whisper_model_load: n_text_layer = 12
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 3 (small)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: n_langs = 99
whisper_backend_init: using Metal backend
whisper_model_load: Metal total size = 487.00 MB
whisper_model_load: model size = 487.00 MB
whisper_backend_init: using Metal backend
whisper_init_state: kv self size = 56.62 MB
whisper_init_state: kv cross size = 56.62 MB
whisper_init_state: kv pad size = 4.72 MB
whisper_init_state: compute buffer (conv) = 22.54 MB
whisper_init_state: compute buffer (encode) = 284.81 MB
whisper_init_state: compute buffer (cross) = 6.31 MB
whisper_init_state: compute buffer (decode) = 97.40 MB
It would be great if we could merge this. Thanks @hlhr202 for your fix!
This is a naive fix for Metal inference and the log trampoline.