usefulsensors / openai-whisper

Robust Speech Recognition via Large-Scale Weak Supervision
MIT License

Benchmark results (kind of) for the Raspberry Pi4 #8

Open j1nx opened 1 year ago

j1nx commented 1 year ago

I have compiled the latest version and am running it with TensorFlow-Lite 3.11 on a Raspberry Pi 4.

Below are the results for the different sample WAV files.

```
mycroft@OpenVoiceOS-e3830c:~/whisper $ minimal whisper.tflite jfk.wav
n_vocab:50257
mel.n_len:3000
mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Inference time 7 seconds
[_SOT_][_NOT_] And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.

mycroft@OpenVoiceOS-e3830c:~/whisper $ minimal whisper.tflite test.wav
n_vocab:50257
mel.n_len:3000
mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Inference time 12 seconds
[_SOT_][_NOT_] Bili always listens to his mother. He always does what she says. If his mother says, brush your teeth,is mother. His mother does not have to ask him again. She asks him to do something one time and she does not ask agai

mycroft@OpenVoiceOS-e3830c:~/whisper $ minimal whisper.tflite test_1.wav
n_vocab:50257
mel.n_len:3000
mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Inference time 11 seconds
[_SOT_][_NOT_] David lost his yellow pencil. He could not find it. Where is my yellow pencil? Yes his sister. His sister did not know. I don't know where your pencil is. She said David thought about it. He thought and thought. He used his yellow pencil before lunch. He used it to write a note to his teacher. The notes said, dear teacher, thank you for helping me, David. He put the note in the envelope where was the envelope?
```

Considering the first WAV file is 11 seconds long and the other two are exactly 30 seconds, it looks like real-time encoding is possible on an RPi4. It also runs on one CPU, so not bad. Not bad at all!
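To make the "faster than realtime" claim above concrete, a quick real-time-factor calculation on the quoted numbers (the `rtf` helper is just illustrative arithmetic, not part of the project):

```python
# Real-time factor = inference time / audio length.
# A value below 1.0 means transcription runs faster than realtime.
def rtf(inference_s, audio_s):
    return inference_s / audio_s

print(round(rtf(7, 11), 2))   # jfk.wav: 7 s inference for 11 s of audio -> 0.64
print(round(rtf(12, 30), 2))  # test.wav: 12 s for 30 s of audio -> 0.4
print(round(rtf(11, 30), 2))  # test_1.wav: 11 s for 30 s of audio -> 0.37
```

All three runs stay well under 1.0, which is what makes single-core realtime use plausible on the Pi 4.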

Great work.

nyadla-sys commented 1 year ago

Please use the patch below; it may enable multithreading:

```diff
diff --git a/tflite_minimal/minimal.cc b/tflite_minimal/minimal.cc
index cd045e0..e059b05 100644
--- a/tflite_minimal/minimal.cc
+++ b/tflite_minimal/minimal.cc
@@ -200,6 +200,10 @@ int main(int argc, char* argv[]) {
   else if (argc == 3) {
     memcpy(input, mel.data.data(), mel.n_mel*mel.n_len*sizeof(float));
   }
+
+  interpreter->SetNumThreads(-1);
+  // For more details refer this link
+
```

From the TFLite documentation for `SetNumThreads`:

> Sets the number of threads used by the interpreter and available to CPU kernels. If not set, the interpreter will use an implementation-dependent default number of threads. Currently, only a subset of kernels, such as conv, support multi-threading. num_threads should be >= -1. Setting num_threads to 0 has the effect to disable multithreading, which is equivalent to setting num_threads to 1. If set to the value -1, the number of threads used will be implementation-defined and platform-dependent.
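The documented semantics of the thread-count argument can be sketched as a small helper. This is an illustration of the rules quoted above, not TFLite API; the function name and the `os.cpu_count()` stand-in for "implementation-defined" are assumptions:

```python
import os

def resolve_num_threads(num_threads):
    """Mirror the documented SetNumThreads semantics:
    0 disables multithreading (equivalent to 1), and -1 means an
    implementation-defined count (os.cpu_count() stands in here)."""
    if num_threads < -1:
        raise ValueError("num_threads should be >= -1")
    if num_threads in (0, 1):
        return 1          # multithreading disabled
    if num_threads == -1:
        return os.cpu_count() or 1  # runtime picks a platform default
    return num_threads    # explicit thread count
```

This is why `-1` behaves differently across builds: the runtime, not the caller, decides the count.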

j1nx commented 1 year ago

Ok, will give -1 a go, and if it does not work, use 4, as that works with the Python wrapper of tflite.

j1nx commented 1 year ago

Btw, as I installed tflite as a system lib, I compiled it by adding a quick and dirty CMake file instead of pulling in the tflite sources again.

https://github.com/OpenVoiceOS/ovos-buildroot/blob/117556f85e79f2f128041a8f603e49df14f67a2d/buildroot-external/package/whisper-tflite/0001-Add-CMakeLists.txt.patch

nyadla-sys commented 1 year ago

You may want to try stream on the Raspberry Pi 4: https://github.com/usefulsensors/openai-whisper/tree/main/stream

j1nx commented 1 year ago

Will look into that next and report back to you.

j1nx commented 1 year ago

With -1 as the thread count, it appears to still use 1 thread, so I recompiled with 4.

Below are the results using 4 threads.

```
mycroft@OpenVoiceOS-e3830c:~/whisper $ minimal whisper.tflite jfk.wav
n_vocab:50257
mel.n_len:3000
mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Inference time 5 seconds
[_SOT_][_NOT_] And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.

mycroft@OpenVoiceOS-e3830c:~/whisper $ minimal whisper.tflite test.wav
n_vocab:50257
mel.n_len:3000
mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Inference time 9 seconds
[_SOT_][_NOT_] Bili always listens to his mother. He always does what she says. If his mother says, brush your teeth, Bili brush his teeth. If his mother says, go to bed, Bili goes to bed. Bili is a very good boy, a good boy listens to his mother. His mother does not have to ask him again. She asks him to do something one time and she does not ask again. Bili is a good boy. He does what his mother asks the first time. She does not have to ask again.

mycroft@OpenVoiceOS-e3830c:~/whisper $ minimal whisper.tflite test_1.wav
n_vocab:50257
mel.n_len:3000
mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Inference time 8 seconds
[_SOT_][_NOT_] David lost his yellow pencil. He could not find it. Where is my yellow pencil? Yes his sister. His sister did not know. I don't know where your pencil is. She said David thought about it. He thought and thought. He used his yellow pencil before lunch. He used it to write a note to his teacher. The notes said, dear teacher, thank you for helping me, David. He put the note in the envelope where was the envelope?
```

Perhaps for the future it would be nice to add a "-t " flag, just as whisper.cpp does.
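A whisper.cpp-style thread flag could look something like the following sketch. Everything here is hypothetical (the `minimal` example is C++ and has no such flag today); only the `-t` name mirrors whisper.cpp:

```python
import argparse

# Hypothetical CLI sketch for a "-t"/"--threads" option on the
# minimal example: model path, WAV path, and a thread count.
def parse_args(argv):
    parser = argparse.ArgumentParser(prog="minimal")
    parser.add_argument("model", help="path to whisper.tflite")
    parser.add_argument("wav", help="path to input WAV file")
    parser.add_argument("-t", "--threads", type=int, default=4,
                        help="number of inference threads (-1 = auto)")
    return parser.parse_args(argv)

args = parse_args(["whisper.tflite", "jfk.wav", "-t", "2"])
print(args.threads)  # -> 2
```

The parsed value would then be passed straight to `interpreter->SetNumThreads(...)` instead of hardcoding it at compile time.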

Will check out the streaming binary and report back.

StuartIanNaylor commented 1 year ago
```python
tf.lite.Interpreter(
    model_path=None,
    model_content=None,
    experimental_delegates=None,
    num_threads=2)
```
As an example. Let me know how you go, as I have found tflite doesn't scale that well (or didn't with the models I tried). My memory stinks, but I settled on 2 threads; I think it scaled, but not at the 2x level, and above that the diminishing returns just didn't seem worth it. Interested how it goes, as I guess it's a fight between process length and inter-process comms time that decides how it scales, and with a bigger model like Whisper maybe it scales much better than the smaller image-based models I have tried.
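The diminishing returns described above are roughly what Amdahl's law predicts: only the parallelizable fraction of the work speeds up with more threads. A toy model (the 0.8 parallel fraction is an illustrative assumption, not a measured property of Whisper):

```python
# Amdahl's law: with parallel fraction p, n threads give
# speedup 1 / ((1 - p) + p / n) over a single thread.
def speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

# If ~80% of inference parallelizes, going 1 -> 2 threads helps a
# lot more than 4 -> 8, matching the "diminishing returns" above.
for n in (1, 2, 4, 8):
    print(n, round(speedup(0.8, n), 2))
```

Under this model, 2 threads give about 1.67x but 8 threads only 3.33x, so the per-thread payoff keeps shrinking.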

j1nx commented 1 year ago

Took a better look at the -1 option for multithreading, and indeed it looks like it uses 2 threads; however, only one of them reaches 100% CPU usage. The other thread doesn't.

So I guess setting it to -1 and letting the system figure it out itself appears to be the better way forward.

Haven't built the streaming binary yet.

j1nx commented 1 year ago

Re-ran the test.wav with both -1 and 4 by timing the command.

With -1 it sometimes uses two threads, sometimes one and then switches to another, and sometimes one at 100% and another at around 40%.

Anyhow, results below:

```
mycroft@OpenVoiceOS-e3830c:~/whisper $ /usr/bin/time minimal whisper.tflite test.wav
n_vocab:50257
mel.n_len:3000
mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Inference time 12 seconds
[_SOT_][_NOT_] Bili always listens to his mother. He always does what she says. If his mother says, brush your teeth, Bili brush his teeth. If his mother says, go to bed, Bili goes to bed. Bili is a very good boy, a good boy listens to his mother. His mother does not have to ask him again. She asks him to do something one time and she does not ask again. Bili is a good boy. He does what his mother asks the first time. She does not have to ask again.

13.49user 0.31system 0:13.88elapsed 99%CPU (0avgtext+0avgdata 238684maxresident)k
0inputs+0outputs (0major+58937minor)pagefaults 0swaps
```

Did the same with it set to 4, and then it just uses all four CPUs. Not always all at 100%, but it is definitely running four threads at the same time. It then takes around 9 seconds to transcribe.

So yeah, using -1 and letting tflite figure it out does not always scale right; hardcoding it to a number works better. And yes, hardcoding it to two takes ~10 seconds to encode, so using more threads does bring something, but anything above 2 does not bring more than it costs.

j1nx commented 1 year ago

Just for completeness' sake, below are the same results for a Raspberry Pi 3B+.

```
mycroft@OpenVoiceOS-9f5e16:~/.local/state/mycroft/whisper $ /usr/bin/time minimal whisper.tflite jfk.wav
n_vocab:50257
mel.n_len:3000
mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Inference time 12 seconds
[_SOT_][_NOT_] And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.

24.83user 0.55system 0:19.71elapsed 128%CPU (0avgtext+0avgdata 238584maxresident)k
0inputs+0outputs (0major+62031minor)pagefaults 0swaps
```
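For reading the `/usr/bin/time` lines above: the %CPU figure is just (user + sys) / elapsed, so values above 100% indicate more than one core busy on average. Checking that against the reported numbers:

```python
# /usr/bin/time reports %CPU = (user + sys) / elapsed * 100.
def cpu_percent(user_s, sys_s, elapsed_s):
    return 100.0 * (user_s + sys_s) / elapsed_s

print(round(cpu_percent(13.49, 0.31, 13.88)))  # Pi 4 run above: 99 (single core)
print(round(cpu_percent(24.83, 0.55, 19.71)))  # Pi 3B+ run: 129 (>1 core busy)
```

So the Pi 3B+ run really was spread over more than one core, even though wall-clock time was still ~20 seconds.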

Strange thing is, the other two WAV files get cut off at ~12 seconds?

nyadla-sys commented 1 year ago

Thanks @j1nx