mustafaaljadery / lightning-whisper-mlx

An extremely fast implementation of whisper optimized for Apple Silicon using MLX.
https://mustafaaljadery.github.io/lightning-whisper-mlx/

Simple Performance Report (language-zh, large-v3) #2

Open songlairui opened 6 months ago

songlairui commented 6 months ago

Device:

14-inch MacBook Pro, M1 Pro, 16 GB RAM

Input:

408 s WAV file, language: zh (unable to use a distil model)

Tests

whisper.cpp

models/ggml-large.bin (the default large model is large-v3), run with -l zh

Result: 100.37 s
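The invocation was presumably along these lines (a sketch wrapping whisper.cpp's standard CLI flags in Python, for reference; the input filename is a stand-in):

```python
import subprocess

# Reconstruction of the reported whisper.cpp run (filename is a placeholder).
subprocess.run([
    "./main",
    "-m", "models/ggml-large.bin",  # "large" resolves to large-v3 by default
    "-l", "zh",                     # force Chinese decoding
    "-f", "input.wav",              # the 408 s test recording
], check=True)
```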


lightning-whisper-mlx

model="large-v3", batch_size=4, quant="4bit" (a larger batch_size freezes my device)

Result: 58.83 s
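For reference, this run corresponds roughly to the following (a sketch following the project README; the filename is a stand-in for the 408 s recording):

```python
from lightning_whisper_mlx import LightningWhisperMLX

# Same settings as reported: 4-bit quantization keeps the weights small,
# and batch_size=4 stays within 16 GB of RAM.
whisper = LightningWhisperMLX(model="large-v3", batch_size=4, quant="4bit")

# "input.wav" is a placeholder for the 408 s Chinese recording.
text = whisper.transcribe(audio_path="input.wav")["text"]
print(text)
```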

Here is the RAM usage during the run:

[screenshot]

Conclusion

(The conclusion below is LLM-generated.)

Based on the test results, lightning-whisper-mlx with the "large-v3" model shows a clear speed advantage, finishing in roughly 60% of whisper.cpp's time (58.83 s vs. 100.37 s). However, that speed comes at the cost of higher RAM usage, which could be a limiting factor on machines with less available memory. For users with constrained RAM, whisper.cpp may be the more viable option; for those with ample RAM who want faster processing, lightning-whisper-mlx is the better choice.

mustafaaljadery commented 6 months ago

Thank you for the review. Yep, lightning-whisper-mlx batches multiple chunks of audio at once. The more memory you have, the faster it will go!
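Conceptually, the batching works something like this sketch (not the library's actual code; it assumes 16 kHz mono audio as a NumPy array):

```python
import numpy as np

SAMPLE_RATE = 16_000   # Whisper expects 16 kHz mono audio
CHUNK_SECONDS = 30     # Whisper's fixed window length
BATCH_SIZE = 4         # the batch_size from the report above

def batched_chunks(audio: np.ndarray):
    """Split audio into 30 s chunks and yield stacked batches,
    so the model processes several chunks per forward pass."""
    chunk_len = SAMPLE_RATE * CHUNK_SECONDS
    chunks = [audio[i:i + chunk_len] for i in range(0, len(audio), chunk_len)]
    # Zero-pad the last chunk so all chunks stack to the same shape.
    chunks[-1] = np.pad(chunks[-1], (0, chunk_len - len(chunks[-1])))
    for i in range(0, len(chunks), BATCH_SIZE):
        yield np.stack(chunks[i:i + BATCH_SIZE])  # shape: (batch, chunk_len)
```

A larger batch means fewer forward passes but a bigger peak memory footprint, which matches the observation above that a large batch_size can freeze a 16 GB machine.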

libratiger commented 6 months ago

> Thank you for the review. Yep, lightning-whisper-mlx batches multiple chunks of audio at once. The more memory you have, the faster it will go!

Is there any relationship with Core ML? I once tested whisper.cpp with Core ML, and it consumed more RAM than plain whisper.cpp.