stlukey / whispercpp.py

Python bindings for whisper.cpp

CoreML support? #18

Open ArtemBernatskyy opened 1 year ago

ArtemBernatskyy commented 1 year ago

How can we add CoreML support? Thx!

stlukey commented 1 year ago

Whisper.cpp now has CoreML support:

https://github.com/ggerganov/whisper.cpp/pull/566

Using just whisper.cpp, it should be as simple as compiling with the appropriate flag:

mkdir build && cd build
cmake -DWHISPER_COREML=1 ..
make -j
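Note that the Core ML encoder model has to be generated once beforehand (see the whisper.cpp README): running ./models/generate-coreml-model.sh base.en from the repo root produces models/ggml-base.en-encoder.mlmodelc, which whisper.cpp then picks up automatically when loading models/ggml-base.en.bin.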

Check by running:

./main -m models/ggml-base.en.bin -f samples/gb0.wav

...

whisper_init_state: loading Core ML model from 'models/ggml-base.en-encoder.mlmodelc'
whisper_init_state: first run on a device may take a while ...
whisper_init_state: Core ML model loaded

system_info: n_threads = 4 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 | COREML = 1 | 

...

Note COREML = 1.
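(If the system_info line shows COREML = 0 instead, the flag was not picked up at compile time. Also note that whisper.cpp derives the Core ML model path from the ggml model path, models/ggml-base.en.bin -> models/ggml-base.en-encoder.mlmodelc, so the .mlmodelc bundle has to sit next to the .bin file.)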

For whispercpp.py, we can add the CoreML flag inside setup.py:

if sys.platform == 'darwin':
    # Pass the Core ML and Accelerate flags through to the whisper.cpp build
    os.environ['CFLAGS']   = '-DWHISPER_COREML=1 -DGGML_USE_ACCELERATE -O3 -std=gnu11'
    os.environ['CXXFLAGS'] = '-DWHISPER_COREML=1 -DGGML_USE_ACCELERATE -O3 -std=c++11'
    os.environ['LDFLAGS']  = '-framework Accelerate'

First, update the whisper.cpp submodule inside whispercpp.py. Check that it still runs; it might need some changes if the API has changed. If it still works, add the flag inside setup.py.
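After changing setup.py, the extension has to be rebuilt for the new flags to take effect, e.g. with python setup.py build_ext --inplace or a fresh pip install . (pip may reuse a cached wheel, so --no-cache-dir can help).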

I can't test this at the moment, but feel free to make the pull request, and we can get this feature added.

ArtemBernatskyy commented 1 year ago

Thx! I decided to use OpenAI's Whisper API for now; in my tests it beats local Whisper with CoreML by 3-4x (compared on a MacBook M1 with 32 GB RAM).

diaojunxian commented 11 months ago

(quoting @stlukey's reply above in full)

@stlukey

I have verified this on my machine, an M2, and CoreML does not seem to get enabled this way. I added the compile flags as suggested, but that also does not seem to work:

if sys.platform == 'darwin':
    print("run here.....")
    os.environ['CFLAGS'] = '-DWHISPER_COREML=1 -DGGML_USE_ACCELERATE -O3 -std=gnu11'
    os.environ['CXXFLAGS'] = '-DWHISPER_COREML=1 -DGGML_USE_ACCELERATE -O3 -std=c++11'
    os.environ['LDFLAGS'] = '-framework Accelerate'

That is, I added -DWHISPER_COREML=1 and used the latest whisper.cpp code. When I run the generated whisper.xxxx.so to transcribe an audio file, it takes 12 minutes. But if I compile whisper.cpp at the same commit with cmake -DWHISPER_COREML=1 and run ./main on the same audio, it takes only 7 minutes, and I can see this in the loading output:

whisper_init_state: loading Core ML model from 'models/ggml-large-encoder.mlmodelc'
whisper_init_state: first run on a device may take a while ...
whisper_init_state: Core ML model loaded
system_info: n_threads = 4 / 12 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 | COREML = 1 | OPENVINO = 0 |

It loads the Core ML model, and it takes only 8 minutes. I expected the .so build to do the same. How can I modify it?
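A plausible culprit (not verified here): in whisper.cpp, WHISPER_COREML is only the name of the CMake/Makefile option; the macro the sources actually test with #ifdef is WHISPER_USE_COREML, and the CMake build additionally compiles the Objective-C glue under coreml/ and links the CoreML and Foundation frameworks. Setting just -DWHISPER_COREML=1 in CFLAGS would therefore still produce a CPU-only extension. A minimal sketch of what setup.py might need instead (untested; ext is a hypothetical stand-in for the Extension object setup.py defines, and the paths assume the whisper.cpp submodule sits at the repo root):

if sys.platform == 'darwin':
    # WHISPER_USE_COREML is the macro the whisper.cpp sources #ifdef on
    os.environ['CFLAGS']   = '-DWHISPER_USE_COREML -DGGML_USE_ACCELERATE -O3 -std=gnu11'
    os.environ['CXXFLAGS'] = '-DWHISPER_USE_COREML -DGGML_USE_ACCELERATE -O3 -std=c++11'
    # The Core ML path also needs these frameworks at link time
    os.environ['LDFLAGS']  = '-framework Accelerate -framework Foundation -framework CoreML'
    # Hypothetical: the Objective-C(++) glue must be compiled into the
    # extension as well; the exact attribute depends on how setup.py
    # declares the extension.
    ext.sources += [
        'whisper.cpp/coreml/whisper-encoder.mm',
        'whisper.cpp/coreml/whisper-encoder-impl.m',
    ]

Note that setuptools does not necessarily compile .mm (Objective-C++) sources out of the box, so a custom build_ext step may also be needed; treat this as a starting point rather than a drop-in fix.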