tazz4843 / whisper-rs

Rust bindings to https://github.com/ggerganov/whisper.cpp
The Unlicense

Feature/fix metal #148

Closed hlhr202 closed 3 weeks ago

hlhr202 commented 1 month ago

This is a naive fix for Metal inference and the log trampoline.
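
For context on the "log trampoline" part: it is a small extern "C" function that whisper.cpp can call with each log line, which then forwards the text to the Rust side. Below is a minimal sketch, assuming ggml's C callback signature (level, text, user_data); the function and binding names are illustrative, not the exact code in this PR.

```rust
use std::ffi::{c_char, c_int, c_void, CStr};

/// Trampoline matching ggml's C log callback shape:
///   void (*)(enum ggml_log_level level, const char *text, void *user_data)
/// whisper.cpp invokes this for every log line; we forward it to Rust.
unsafe extern "C" fn whisper_log_trampoline(
    level: c_int,
    text: *const c_char,
    _user_data: *mut c_void,
) {
    if text.is_null() {
        return;
    }
    let msg = CStr::from_ptr(text).to_string_lossy();
    eprintln!("[whisper.cpp level {level}] {}", msg.trim_end());
}

// Installation (illustrative only; whisper.cpp's C API exposes
// whisper_log_set(callback, user_data), but the exact generated Rust
// binding signature depends on the bindgen output):
//
//   unsafe { whisper_rs_sys::whisper_log_set(Some(whisper_log_trampoline), std::ptr::null_mut()) }
```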

hlhr202 commented 1 month ago

@tazz4843 I've reformatted the code.

tazz4843 commented 1 month ago

See review comments

hlhr202 commented 1 month ago

@tazz4843 Hey, I have found a new possible approach here.

whisper.cpp now supports the WHISPER_METAL_EMBED_LIBRARY build option, which lets us embed the Metal library string directly into the build output, but we need to upgrade the whisper.cpp branch to a newer version. See the CMakeLists.txt in whisper.cpp here: https://github.com/ggerganov/whisper.cpp/blob/08981d1bacbe494ff1c943af6c577c669a2d9f4d/CMakeLists.txt#L78C12-L78C39 and https://github.com/ggerganov/whisper.cpp/issues/2110

Maybe we should consider finalizing https://github.com/tazz4843/whisper-rs/pull/142 first, and then I can start implementing a new build option here?
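
As a rough illustration of what such a build option could look like on the whisper-rs side, here is a build.rs sketch, assuming the -sys crate drives whisper.cpp's CMake build through the `cmake` crate; the option names come from the CMakeLists.txt linked above, and the linked framework set is an assumption for Metal builds.

```rust
// build.rs (sketch); requires `cmake` as a build-dependency.
fn main() {
    let mut cfg = cmake::Config::new("whisper.cpp");

    // Embed the Metal shader library into the built artifact so no
    // separate .metallib/.metal file has to be located at runtime.
    cfg.define("WHISPER_METAL", "ON")
        .define("WHISPER_METAL_EMBED_LIBRARY", "ON");

    let dst = cfg.build();
    println!("cargo:rustc-link-search=native={}", dst.join("lib").display());
    println!("cargo:rustc-link-lib=static=whisper");

    // Metal builds also need the Apple frameworks at link time
    // (exact set may vary with the whisper.cpp version).
    println!("cargo:rustc-link-lib=framework=Metal");
    println!("cargo:rustc-link-lib=framework=MetalKit");
    println!("cargo:rustc-link-lib=framework=Foundation");
}
```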

tazz4843 commented 1 month ago

142 has been merged; sorry it took me so long. Embedding the library is a much better idea IMO, and we should favour that.

thewh1teagle commented 1 month ago

I tried it with WHISPER_METAL_EMBED_LIBRARY=ON. It works: I can see in the logs that the Metal framework loaded (without the option it doesn't), and it runs much faster. I believe we should enable WHISPER_METAL_EMBED_LIBRARY by default if the metal feature is enabled.
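
If it should indeed default to on whenever the metal feature is active, the build script can key off Cargo's feature environment variables. This fragment continues the build.rs sketch above and assumes the same `cfg` builder:

```rust
// Inside build.rs: Cargo exposes enabled features as CARGO_FEATURE_<NAME>.
if std::env::var_os("CARGO_FEATURE_METAL").is_some() {
    cfg.define("WHISPER_METAL", "ON")
        .define("WHISPER_METAL_EMBED_LIBRARY", "ON");
}
```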

hlhr202 commented 1 month ago

Updated. Please help check the new build config. I've also updated the Metal log callback setup function. @tazz4843

tazz4843 commented 4 weeks ago

I don't have macOS to test on, will wait for a positive test from someone with macOS before merging

hlhr202 commented 4 weeks ago

> I don't have macOS to test on, will wait for a positive test from someone with macOS before merging

I have self-tested it, since I'm using it in a private project. But it's okay to wait for one more tester to confirm.

uohzxela commented 3 weeks ago

I have tested @hlhr202's latest changes by adding whisper-rs = { git = "https://github.com/hlhr202/whisper-rs.git", branch = "feature/fix-metal", features = ["metal"] } to my Cargo.toml and it works on my M1 Pro.

whisper_init_with_params_no_state: use gpu    = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw        = 0
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 768
whisper_model_load: n_audio_head  = 12
whisper_model_load: n_audio_layer = 12
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 768
whisper_model_load: n_text_head   = 12
whisper_model_load: n_text_layer  = 12
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 3 (small)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: n_langs       = 99
whisper_backend_init: using Metal backend
whisper_model_load:    Metal total size =   487.00 MB
whisper_model_load: model size    =  487.00 MB
whisper_backend_init: using Metal backend
whisper_init_state: kv self size  =   56.62 MB
whisper_init_state: kv cross size =   56.62 MB
whisper_init_state: kv pad  size  =    4.72 MB
whisper_init_state: compute buffer (conv)   =   22.54 MB
whisper_init_state: compute buffer (encode) =  284.81 MB
whisper_init_state: compute buffer (cross)  =    6.31 MB
whisper_init_state: compute buffer (decode) =   97.40 MB

It would be great if we could merge this. Thanks @hlhr202 for your fix!