huggingface / distil-whisper

Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
MIT License

ggml-distil-small.en.bin slower than ggml-small.en.bin under whisper.cpp #55

Open regularfry opened 6 months ago


Using the original ggml-small.en.bin on an M1 Mac, running whisper.cpp on the hp0.wav sample gives me these timings:

whisper_print_timings:     load time =   469.52 ms
whisper_print_timings:     fallbacks =   0 p /   0 h
whisper_print_timings:      mel time =   144.11 ms
whisper_print_timings:   sample time =  2631.84 ms /  3090 runs (    0.85 ms per run)
whisper_print_timings:   encode time =  7104.00 ms /    13 runs (  546.46 ms per run)
whisper_print_timings:   decode time =   330.89 ms /    25 runs (   13.24 ms per run)
whisper_print_timings:   batchd time =  9753.98 ms /  2982 runs (    3.27 ms per run)
whisper_print_timings:   prompt time =   944.44 ms /  2071 runs (    0.46 ms per run)
whisper_print_timings:    total time = 21415.03 ms
ggml_metal_free: deallocating
ggml_metal_free: deallocating
./main -m models/ggml-small.en.bin -f samples/hp0.wav  7.33s user 0.96s system 38% cpu 21.465 total

Running with ggml-distil-small.en.bin gives the following:

whisper_print_timings:     load time =   333.88 ms
whisper_print_timings:     fallbacks =  13 p /   5 h
whisper_print_timings:      mel time =   155.13 ms
whisper_print_timings:   sample time =  7036.76 ms /  9308 runs (    0.76 ms per run)
whisper_print_timings:   encode time =  6004.30 ms /    11 runs (  545.85 ms per run)
whisper_print_timings:   decode time =   548.74 ms /    95 runs (    5.78 ms per run)
whisper_print_timings:   batchd time = 15798.76 ms /  9098 runs (    1.74 ms per run)
whisper_print_timings:   prompt time =   347.48 ms /  1503 runs (    0.23 ms per run)
whisper_print_timings:    total time = 30275.82 ms
ggml_metal_free: deallocating
ggml_metal_free: deallocating
./main -m models/ggml-distil-small.en.bin -f samples/hp0.wav  17.51s user 1.67s system 63% cpu 30.319 total
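To compare runs like these without reading the columns by eye, the `whisper_print_timings` lines can be pulled into a dictionary. This is a minimal sketch: `parse_timings` is a hypothetical helper, not part of whisper.cpp, and the regular expressions assume the exact line layout shown in the two logs above, so they may need adjusting for other whisper.cpp versions.

```python
import re

# Matches lines such as:
#   whisper_print_timings:   sample time =  2631.84 ms /  3090 runs (    0.85 ms per run)
#   whisper_print_timings:    total time = 21415.03 ms
# The "/ N runs" part is optional because load, mel and total report no run count.
TIMING_RE = re.compile(
    r"whisper_print_timings:\s+(\w+) time =\s+([\d.]+) ms(?:\s*/\s*(\d+) runs)?"
)
# Matches the "fallbacks =  13 p /   5 h" line and captures both counts.
FALLBACK_RE = re.compile(r"whisper_print_timings:\s+fallbacks =\s+(\d+) p /\s+(\d+) h")


def parse_timings(log: str) -> dict:
    """Return {phase: (total_ms, runs or None)} plus a 'fallbacks': (p, h) entry."""
    result = {}
    for line in log.splitlines():
        m = TIMING_RE.search(line)
        if m:
            phase, ms, runs = m.groups()
            result[phase] = (float(ms), int(runs) if runs else None)
            continue
        f = FALLBACK_RE.search(line)
        if f:
            result["fallbacks"] = (int(f.group(1)), int(f.group(2)))
        # Any other line (e.g. ggml_metal_free, the shell's time summary) is ignored.
    return result
```

For example, feeding it the output of `./main -m models/ggml-distil-small.en.bin -f samples/hp0.wav 2>&1` should yield `(7036.76, 9308)` for `sample` and `(13, 5)` for `fallbacks`.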

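Is that expected? Per run, the distilled model is about as fast (encode) or faster (sample, decode, batchd, prompt) in every phase, but it needs dramatically more runs: 9308 sample runs and 9098 batchd runs versus 3090 and 2982, along with 18 fallbacks (13 p / 5 h) where the original model had none.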
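The per-run versus run-count breakdown can be checked with a little arithmetic on the logged numbers. Since each phase's total is its per-run time multiplied by its number of runs, the change in total time equals the run-count ratio divided by the per-run speedup. The sketch below hard-codes the totals and run counts copied from the two logs; the `compare` helper and its table layout are illustrative only, not part of whisper.cpp.

```python
# Timings copied from the two logs above: phase -> (total_ms, runs).
SMALL = {
    "sample": (2631.84, 3090),
    "encode": (7104.00, 13),
    "decode": (330.89, 25),
    "batchd": (9753.98, 2982),
    "prompt": (944.44, 2071),
}
DISTIL = {
    "sample": (7036.76, 9308),
    "encode": (6004.30, 11),
    "decode": (548.74, 95),
    "batchd": (15798.76, 9098),
    "prompt": (347.48, 1503),
}


def compare(base, other):
    """Per phase: (per-run speedup of `other`, run-count ratio, total-time ratio)."""
    rows = {}
    for phase, (base_ms, base_runs) in base.items():
        other_ms, other_runs = other[phase]
        per_run_speedup = (base_ms / base_runs) / (other_ms / other_runs)
        run_ratio = other_runs / base_runs
        # total_ms = per_run_ms * runs, so this equals run_ratio / per_run_speedup.
        total_ratio = other_ms / base_ms
        rows[phase] = (per_run_speedup, run_ratio, total_ratio)
    return rows


if __name__ == "__main__":
    print(f"{'phase':<8}{'per-run speedup':>17}{'runs ratio':>12}{'time ratio':>12}")
    for phase, (speedup, runs, ratio) in compare(SMALL, DISTIL).items():
        print(f"{phase:<8}{speedup:>16.2f}x{runs:>11.2f}x{ratio:>11.2f}x")
```

With these inputs, decode and batchd come out roughly 2.3x and 1.9x faster per run, but about 3x as many sample and batchd runs more than cancel that out: those two phases alone add roughly 10.4 seconds to the distilled model's total.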

./main was built by just running make, so all build options are at their defaults, and no options beyond -m and -f were passed at run time.