Using the original ggml-small.en.bin on an M1 Mac, running whisper.cpp on the hp0.wav sample gives me these timings:
whisper_print_timings: load time = 469.52 ms
whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: mel time = 144.11 ms
whisper_print_timings: sample time = 2631.84 ms / 3090 runs ( 0.85 ms per run)
whisper_print_timings: encode time = 7104.00 ms / 13 runs ( 546.46 ms per run)
whisper_print_timings: decode time = 330.89 ms / 25 runs ( 13.24 ms per run)
whisper_print_timings: batchd time = 9753.98 ms / 2982 runs ( 3.27 ms per run)
whisper_print_timings: prompt time = 944.44 ms / 2071 runs ( 0.46 ms per run)
whisper_print_timings: total time = 21415.03 ms
ggml_metal_free: deallocating
ggml_metal_free: deallocating
./main -m models/ggml-small.en.bin -f samples/hp0.wav 7.33s user 0.96s system 38% cpu 21.465 total
Running with ggml-distil-small.en.bin gives the following:
whisper_print_timings: load time = 333.88 ms
whisper_print_timings: fallbacks = 13 p / 5 h
whisper_print_timings: mel time = 155.13 ms
whisper_print_timings: sample time = 7036.76 ms / 9308 runs ( 0.76 ms per run)
whisper_print_timings: encode time = 6004.30 ms / 11 runs ( 545.85 ms per run)
whisper_print_timings: decode time = 548.74 ms / 95 runs ( 5.78 ms per run)
whisper_print_timings: batchd time = 15798.76 ms / 9098 runs ( 1.74 ms per run)
whisper_print_timings: prompt time = 347.48 ms / 1503 runs ( 0.23 ms per run)
whisper_print_timings: total time = 30275.82 ms
ggml_metal_free: deallocating
ggml_metal_free: deallocating
./main -m models/ggml-distil-small.en.bin -f samples/hp0.wav 17.51s user 1.67s system 63% cpu 30.319 total
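Is that expected? Per run, the distilled model looks as fast or faster in every phase (encode is essentially unchanged at about 546 ms per run), but it needs dramatically more sample, decode, and batched-decode runs (for example, 9098 batchd runs versus 2982). It also reports 13 p / 5 h fallbacks, where the original model reports none.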
./main was built here just by calling make, so all config parameters are the defaults.
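To make the comparison concrete, here is a rough sketch that parses the two `whisper_print_timings` blocks above and computes, for each phase, how many more runs the distilled model needed and how its per-run time compares. The helper names (`parse_timings`, `compare`) and the regular expression are my own, written against the log format shown in this post. They are not part of whisper.cpp.

```python
import re

# Matches lines such as:
#   whisper_print_timings: encode time = 7104.00 ms / 13 runs ( 546.46 ms per run)
# Lines without a run count (load, mel, total) are intentionally skipped.
TIMING_RE = re.compile(
    r"whisper_print_timings:\s+(\w+) time\s*=\s*([\d.]+) ms\s*/\s*(\d+) runs"
)

# Timing lines copied from the two runs in this post.
SMALL = """
whisper_print_timings: sample time = 2631.84 ms / 3090 runs ( 0.85 ms per run)
whisper_print_timings: encode time = 7104.00 ms / 13 runs ( 546.46 ms per run)
whisper_print_timings: decode time = 330.89 ms / 25 runs ( 13.24 ms per run)
whisper_print_timings: batchd time = 9753.98 ms / 2982 runs ( 3.27 ms per run)
whisper_print_timings: prompt time = 944.44 ms / 2071 runs ( 0.46 ms per run)
"""

DISTIL = """
whisper_print_timings: sample time = 7036.76 ms / 9308 runs ( 0.76 ms per run)
whisper_print_timings: encode time = 6004.30 ms / 11 runs ( 545.85 ms per run)
whisper_print_timings: decode time = 548.74 ms / 95 runs ( 5.78 ms per run)
whisper_print_timings: batchd time = 15798.76 ms / 9098 runs ( 1.74 ms per run)
whisper_print_timings: prompt time = 347.48 ms / 1503 runs ( 0.23 ms per run)
"""


def parse_timings(log: str) -> dict:
    """Return {phase: (total_ms, runs)} for every phase that reports a run count."""
    return {
        m.group(1): (float(m.group(2)), int(m.group(3)))
        for m in TIMING_RE.finditer(log)
    }


def compare(base: dict, other: dict) -> dict:
    """For each shared phase, return (runs ratio, ms-per-run ratio) as other / base."""
    result = {}
    for phase in base.keys() & other.keys():
        base_ms, base_runs = base[phase]
        other_ms, other_runs = other[phase]
        runs_ratio = other_runs / base_runs
        per_run_ratio = (other_ms / other_runs) / (base_ms / base_runs)
        result[phase] = (runs_ratio, per_run_ratio)
    return result


if __name__ == "__main__":
    ratios = compare(parse_timings(SMALL), parse_timings(DISTIL))
    for phase in sorted(ratios):
        runs_ratio, per_run_ratio = ratios[phase]
        print(f"{phase:>7}: runs x{runs_ratio:.2f}, ms per run x{per_run_ratio:.2f}")
```

A per-run ratio below 1 means the distilled model is cheaper per call in that phase. A runs ratio above 1 means it was called more often, which is where the extra wall-clock time comes from in the sample, decode, and batchd phases.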