mybigday / whisper.rn

React Native binding of whisper.cpp.

Option to use CoreML only on iOS without CPU fallback #93

Open · jobpaardekooper opened this issue 1 year ago

Is there a reason why we cannot use Core ML exclusively on iOS, for example? Right now we still need to bundle a regular CPU model, but it is unclear whether it would ever be used.

The documentation states that the library might fall back to using the CPU when you try to use Core ML. It would be nice if the docs also explained why this happens. When would it fall back to CPU mode? Or does that only happen on Android? It is not really clear to me from the current documentation.

Thanks for the great work on this library!

UchennaOkafor commented 1 year ago

I agree with this. I noticed the Core ML .mlmodelc files are smaller than the ggml .bin files, so if we only used the Core ML files on iOS we could ship a smaller app.

jhen0409 commented 1 year ago

At this time Core ML only handles the encoder; the ggml model is still used as the decoder. This is not ideal as it takes up more memory, so we can look at how to improve this.

> The documentation states that the library might fall back to using the CPU when you try to use Core ML. It would be nice if the docs also explained why this happens.

By default we build with the WHISPER_COREML_ALLOW_FALLBACK compiler flag, so it will fall back to the CPU if the Core ML model fails to load (see this code).
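
For reference, here is a minimal, self-contained sketch of how that flag changes behavior. The hypothetical load_coreml_encoder() stands in for whisper.cpp's whisper_coreml_init; the real check lives inside whisper.cpp's state initialization, so treat this as an illustration of the pattern, not the upstream source:

```cpp
// Illustrative sketch of the WHISPER_COREML_ALLOW_FALLBACK pattern.
#include <cstdio>

#define WHISPER_USE_COREML
#define WHISPER_COREML_ALLOW_FALLBACK // whisper.rn defines this by default

// Hypothetical stand-in for whisper_coreml_init: pretend the
// Core ML model (.mlmodelc) failed to load.
static bool load_coreml_encoder() { return false; }

int main() {
    bool used_coreml = false;
#ifdef WHISPER_USE_COREML
    used_coreml = load_coreml_encoder();
    if (!used_coreml) {
#ifndef WHISPER_COREML_ALLOW_FALLBACK
        // Without the fallback flag, initialization fails outright.
        std::fprintf(stderr, "failed to load Core ML model\n");
        return 1;
#endif
        // With the flag defined, execution continues and the encoder
        // runs on the CPU via the bundled ggml model instead.
        std::fprintf(stderr, "Core ML load failed, falling back to CPU\n");
    }
#endif
    // A field like the usedCoreML idea below could surface this value.
    std::printf("usedCoreML = %s\n", used_coreml ? "true" : "false");
    return 0;
}
```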

For easier debugging, we may want to have a field like usedCoreML in the Context instance.

jobpaardekooper commented 1 year ago

What does PR #123 add? I am not very familiar with all this ML stuff. I thought it might be related to this issue, but now I think maybe not.

Is it just a different (faster) way to allocate memory on Apple devices so that context init will be faster? Or is it something different? Sorry for not understanding, but I want to learn.

jhen0409 commented 1 year ago

> What does PR #123 add? I am not very familiar with all this ML stuff. I thought it might be related to this issue, but now I think maybe not.
>
> Is it just a different (faster) way to allocate memory on Apple devices so that context init will be faster? Or is it something different? Sorry for not understanding, but I want to learn.

ggml-alloc basically reduces the model's memory usage compared to before.

ggml-metal allows GGML to access GPU resources on Apple devices. If you use it, you don't need to load the Core ML model separately and you can get similar performance, but this depends on the relative performance of the GPU and the Neural Engine on the device; the results on Mac and iPhone will differ. Currently it's not enabled yet, as I mentioned in https://github.com/mybigday/whisper.rn/pull/123#issuecomment-1745925698.
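
For context, in whisper.cpp versions newer than the code discussed here, Metal is toggled at context creation through whisper_context_params rather than a separate model file. A minimal sketch, assuming one of those later releases (where whisper_context_default_params, use_gpu, and whisper_init_from_file_with_params exist) and a local "ggml-base.bin":

```cpp
#include "whisper.h"
#include <cstdio>

int main() {
    struct whisper_context_params cparams = whisper_context_default_params();

    // With a ggml-metal build, use_gpu runs the model on the Apple GPU;
    // no separate Core ML .mlmodelc is needed. Whether this beats the
    // Core ML / Neural Engine path depends on the device.
    cparams.use_gpu = true;

    struct whisper_context * ctx =
        whisper_init_from_file_with_params("ggml-base.bin", cparams);
    if (ctx == NULL) {
        std::fprintf(stderr, "failed to init whisper context\n");
        return 1;
    }

    whisper_free(ctx);
    return 0;
}
```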

jobpaardekooper commented 1 year ago

Thanks for explaining! Do you know any good or interesting resources to read and learn more about this stuff?

jhen0409 commented 1 year ago

> Thanks for explaining! Do you know any good or interesting resources to read and learn more about this stuff?

Do you mean ML? I'm not an expert, so I'm afraid I can't provide helpful resources.

If you want to learn about GGML, I would recommend watching the ggml / llama.cpp / whisper.cpp repositories and community projects that use GGML. Especially llama.cpp, as most things happen in that repo.