mustafaaljadery / lightning-whisper-mlx

An extremely fast implementation of whisper optimized for Apple Silicon using MLX.
https://mustafaaljadery.github.io/lightning-whisper-mlx/

Simple Performance Report (language-zh, large-v3) #2

Open · songlairui opened this issue 7 months ago

songlairui commented 7 months ago

Device:

MacBook Pro 14" (M1 Pro, 16 GB RAM)

Input:

A 408 s WAV file, language: zh (unable to use the distil model)

Test

whisper.cpp

Model: models/ggml-large.bin (the default "large" is large-v3), run with -l zh

Result: 100.37 s


lightning-whisper-mlx

model="large-v3", batch_size=4, quant="4bit" (a larger batch_size freezes my device)

Result: 58.83 s

Here is the RAM usage during the run:

[screenshot: RAM usage]
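For reference, a minimal sketch of how this run could be reproduced, assuming the constructor and transcribe() call shown in the lightning-whisper-mlx README; the audio path is a hypothetical stand-in for the 408 s test file:

```python
from lightning_whisper_mlx import LightningWhisperMLX

# Same configuration as the run above; "input.wav" is a stand-in
# for the 408 s Chinese test file.
whisper = LightningWhisperMLX(model="large-v3", batch_size=4, quant="4bit")
result = whisper.transcribe(audio_path="input.wav")
print(result["text"])
```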

Conclusion

(The conclusion below is LLM-generated.)

Based on the test results, lightning-whisper-mlx with the "large-v3" model has a clear speed advantage, completing the transcription in roughly half the time of whisper.cpp (58.83 s vs. 100.37 s). However, that speed comes at the cost of higher RAM usage, which could be a limiting factor on systems with less available memory. For users with constrained RAM, whisper.cpp may be the more viable option; for those with ample RAM who want faster processing, lightning-whisper-mlx is the better choice.

mustafaaljadery commented 7 months ago

Thank you for the review. Yep, lightning-whisper-mlx is batching multiple chunks of audio at once. The more memory you have, the faster it will go!
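To illustrate that trade-off, here is a hedged sketch (not from the thread) of timing several batch sizes, again assuming the README's LightningWhisperMLX API. Larger batches decode more audio chunks per pass, which is faster but holds more data in unified memory, consistent with the freeze songlairui saw at large batch_size on 16 GB:

```python
import time
from lightning_whisper_mlx import LightningWhisperMLX

AUDIO = "input.wav"  # hypothetical stand-in for the test file

# Probe how batch_size trades speed against memory pressure.
for batch_size in (1, 2, 4, 8):
    whisper = LightningWhisperMLX(model="large-v3", batch_size=batch_size, quant="4bit")
    start = time.perf_counter()
    whisper.transcribe(audio_path=AUDIO)
    elapsed = time.perf_counter() - start
    print(f"batch_size={batch_size}: {elapsed:.2f} s")
```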

libratiger commented 7 months ago

> Thank you for the review. Yep, lightning-whisper-mlx is batching multiple chunks of audio at once. The more memory you have, the faster it will go!

Is there any relationship with Core ML? I once tested whisper.cpp with Core ML, and it consumed more RAM than plain whisper.cpp.