usefulsensors / openai-whisper

Robust Speech Recognition via Large-Scale Weak Supervision
MIT License

Whisper Python performance - Benchmarking #15

Open j1nx opened 1 year ago

j1nx commented 1 year ago

Running on OpenVoiceOS on a Raspberry Pi 4 (2 GB model), using Python 3.10 and TensorFlow Lite 2.11.

With the tiny model;

mycroft@OpenVoiceOS-e3830c:~/whisper $ python3 test.py -f samples/ -m models/whisper.tflite -t 4
Importing tensorflow, numpy and torch
Importing whisper
Loading tflite model models/whisper.tflite ...
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.

Loading audio file: samples/test.wav
Samplerate: 16000, length: 30.0s
Calculating mel spectrogram...
Invoking interpreter ...
Preparing output data ...
Converting tokens ...
 Bili always listens to his mother. He always does what she says. If his mother says,

Inference took 4.74s for 30.0s audio file.

Loading audio file: samples/test_1.wav
Samplerate: 16000, length: 30.0s
Calculating mel spectrogram...
Invoking interpreter ...
Preparing output data ...
Converting tokens ...
 David lost his yellow pencil. He could not find it. Where is my yellow pencil? He asked his sister. His sister did not know. I don't know where your pencil is. She said David thought about it. He thought and thought. He used his yellow pencil for before lunch. He used it to write a note to his teacher. The notes said, dear teacher, thank you for helping me, David. He put the note in the envelope where was the envelope?

Inference took 8.57s for 30.0s audio file.

Loading audio file: samples/jfk.wav
Samplerate: 16000, length: 11.0s
Calculating mel spectrogram...
Invoking interpreter ...
Preparing output data ...
Converting tokens ...
 And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.

Inference took 4.28s for 11.0s audio file.
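
For anyone following along, here is a minimal sketch of what a test script like this does, assuming tflite_runtime and openai-whisper are installed, that the model takes a (1, 80, 3000) float32 mel input, and that a single invoke returns the decoded token IDs (tensor indices and paths are assumptions):

import whisper
from whisper.tokenizer import get_tokenizer
from tflite_runtime.interpreter import Interpreter

interpreter = Interpreter(model_path="models/whisper-tiny.en.tflite", num_threads=4)
interpreter.allocate_tensors()

# 30 s of 16 kHz audio -> 80-bin log-mel spectrogram, as Whisper expects
audio = whisper.pad_or_trim(whisper.load_audio("samples/jfk.wav"))
mel = whisper.log_mel_spectrogram(audio).unsqueeze(0).numpy()

# single invoke: this kind of export bundles encoder, decoder and greedy search
input_index = interpreter.get_input_details()[0]["index"]
output_index = interpreter.get_output_details()[0]["index"]
interpreter.set_tensor(input_index, mel)
interpreter.invoke()
tokens = interpreter.get_tensor(output_index)

# map the token IDs back to text, dropping special tokens
tokenizer = get_tokenizer(multilingual=False)
print(tokenizer.decode([t for t in tokens[0] if t < tokenizer.eot]))
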
StuartIanNaylor commented 1 year ago

To be honest I didn't think AMX was a 'coprocessor'; I thought it was just some Apple-specific instructions added to the Arm IP, and that is why they work just like CPU instructions, because they are. Whatever they are, they do give a big boost to ML operations. https://en.wikipedia.org/wiki/Advanced_Matrix_Extensions

j1nx commented 1 year ago

Finished the whole cross-compile infrastructure and everything built fine in the end.

https://github.com/OpenVoiceOS/ovos-buildroot/commit/876ee82daa2c4846afec1d4d505dcf872e2ad93c

Can't test before tonight / 5 hours from now, but will keep you posted if it works and how it performs.

j1nx commented 1 year ago

All fun and all, but pure CpuAcc from ArmNN is not bringing a performance gain.

The ArmNN delegate is also rather picky with models and operators, as whisper tiny wasn't working. I had to grab a typically used demo model to compare it against XNNPACK.

mycroft@OpenVoiceOS-e3830c:~/whisper $ ./benchmark_model --graph=models/mobilenet_v2_1.0_224_quantized_1_default_1.tflite --num_threads=4 --report_peak_memory_footprint=true --warmup_runs=1
STARTING!
Log parameter values verbosely: [0]
Num threads: [4]
Min warmup runs: [1]
Report the peak memory footprint: [1]
Graph: [models/mobilenet_v2_1.0_224_quantized_1_default_1.tflite]
#threads used for CPU inference: [4]
Loaded model models/mobilenet_v2_1.0_224_quantized_1_default_1.tflite
The input model file size (MB): 3.57776
Initialized session in 3.112ms.
Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
count=11 first=92585 curr=62982 min=28710 max=92585 avg=46423.7 std=17777

Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
count=50 first=50817 curr=27858 min=25482 max=63801 avg=39285.8 std=10589

Inference timings in us: Init: 3112, First inference: 92585, Warmup (avg): 46423.7, Inference (avg): 39285.8
Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
Memory footprint delta from the start of the tool (MB): init=2.57812 overall=6.17969
Overall peak memory footprint (MB) via periodic monitoring: 12.1367
Memory status at the end of exeution:
- VmRSS              : 12 MB
+ RssAnnon           : 4 MB
+ RssFile + RssShmem : 8 MB

mycroft@OpenVoiceOS-e3830c:~/whisper $ ./benchmark_model --graph=models/mobilenet_v2_1.0_224_quantized_1_default_1.tflite --warmup_runs=1 --external_delegate_path="/usr/lib/libarmnnDelegate.so" --external_delegate_options="backends:CpuAcc;logging-severity:info"
STARTING!
Log parameter values verbosely: [0]
Min warmup runs: [1]
Graph: [models/mobilenet_v2_1.0_224_quantized_1_default_1.tflite]
External delegate path: [/usr/lib/libarmnnDelegate.so]
External delegate options: [backends:CpuAcc;logging-severity:info]
Loaded model models/mobilenet_v2_1.0_224_quantized_1_default_1.tflite
Info: ArmNN v31.0.0
Can't load libOpenCL.so: libOpenCL.so: cannot open shared object file: No such file or directory
Can't load libGLES_mali.so: libGLES_mali.so: cannot open shared object file: No such file or directory
Can't load libmali.so: libmali.so: cannot open shared object file: No such file or directory
Couldn't find any OpenCL library.
Info: Initialization time: 0.83 ms.
INFO: TfLiteArmnnDelegate: Created TfLite ArmNN delegate.
EXTERNAL delegate created.

Info: Optimize ArmnnSubgraph time: 6.77 ms
Info: Load ArmnnSubgraph time: 146.02 ms
Info: Overall ArmnnSubgraph creation time: 156.61 ms

Explicitly applied EXTERNAL delegate, and the model graph will be completely executed by the delegate.
The input model file size (MB): 3.57776
Initialized session in 286.218ms.
Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
count=9 first=112925 curr=46424 min=35712 max=112925 avg=59661.4 std=27126

Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
count=50 first=50733 curr=47343 min=33201 max=67901 avg=46443.1 std=7687

Inference timings in us: Init: 286218, First inference: 112925, Warmup (avg): 59661.4, Inference (avg): 46443.1
Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
Memory footprint delta from the start of the tool (MB): init=59.8477 overall=68.1016
Info: Shutdown time: 3.38 ms.
nyadla-sys commented 1 year ago

> All fun and all, but pure CpuAcc from ArmNN is not bringing a performance gain. […]

To run the whisper-tiny model using the GPU delegate, we need to generate a full-float model. For testing purposes I can generate a full-float whisper-tiny model.
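
For reference, a rough sketch of what "full float" conversion could look like, assuming a TensorFlow SavedModel export of Whisper tiny at ./whisper-tiny-saved (path hypothetical); leaving the quantization options unset keeps weights and activations in float32, which is what the GPU delegate wants:

import tensorflow as tf

# no converter.optimizations set -> no dynamic-range quantization,
# so the resulting .tflite stays full float32
converter = tf.lite.TFLiteConverter.from_saved_model("whisper-tiny-saved")
tflite_model = converter.convert()
with open("whisper-tiny-float32.tflite", "wb") as f:
    f.write(tflite_model)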

j1nx commented 1 year ago

ArmNN here is not using the GPU, just their special CPU instructions.

Have not yet looked into GPU support; that one is next. But by all means, if you could convert and upload a GPU-enabled tiny model, please do. We kind of created a nice playground over here to test out all the different aspects of whisper inference, and I am sure we will end up with the best way forward for certain boards and hardware.

nyadla-sys commented 1 year ago

@j1nx appreciated your effort

nyadla-sys commented 1 year ago

@j1nx if possible could you please provide the memory usage for the whisper-tiny.en.tflite model on the rpi4?

j1nx commented 1 year ago

> @j1nx if possible could you please provide the memory usage for the whisper-tiny.en.tflite model on the rpi4?

Sure, no problem. Which tool do you want it from?

Your minimal C++ binary, Python, or the benchmark tool from TensorFlow?

nyadla-sys commented 1 year ago

I am looking for the minimal C++ binary. If you can do it for the rest too, it may be useful for others.

nyadla-sys commented 1 year ago

Looks like the benchmark tool would also provide the required information.

StuartIanNaylor commented 1 year ago

I found the binary delegate very picky about the distro it ran on, and it wasn't easy, but I did get it working on a Mali-based board. The GPU doesn't have to be float, as per my test in https://github.com/StuartIanNaylor/rock5b-wav2letter-bench, which is just the https://developer.arm.com/documentation/102603/2211 tutorial (it seemed to have quite a lot of typos).

You can switch between GpuAcc and CpuAcc, and the 8-bit quantised model they provide works, but whether GpuAcc works on non-Mali I don't know, as I tested on a Mali-G610. With the Wav2Letter tutorial they omit the GPU setup on the Pi, which makes me think it's Mali-only even though its base is OpenCL, since for the Odroid with a Mali-G52 it is included. A Python sketch of the same backend switch follows.
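
For anyone who wants to try that backend switch from Python rather than benchmark_model, here is a sketch using tflite_runtime's external-delegate loader; the library path is the one used earlier in this thread, and the option names mirror the external_delegate_options string (whether GpuAcc actually initialises depends on a working OpenCL/Mali stack):

from tflite_runtime.interpreter import Interpreter, load_delegate

# swap "GpuAcc,CpuAcc" for "CpuAcc" to force the CPU backend
armnn = load_delegate(
    "/usr/lib/libarmnnDelegate.so",
    options={"backends": "GpuAcc,CpuAcc", "logging-severity": "info"},
)
interpreter = Interpreter(
    model_path="models/mobilenet_v2_1.0_224_quantized_1_default_1.tflite",
    experimental_delegates=[armnn],
)
interpreter.allocate_tensors()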

nyadla-sys commented 1 year ago

@j1nx uploaded the whisper-base.tflite model along with the Colab notebook

j1nx commented 1 year ago

Ok, here is some memory data for the tiny model. This is running on a Raspberry Pi 4, 2 GB model.

Some info first;

mycroft@OpenVoiceOS-e3830c:~/whisper $ uname -a
Linux OpenVoiceOS-e3830c 5.15.76-v8 #1 SMP Fri Dec 30 14:58:43 CET 2022 aarch64 GNU/Linux

mycroft@OpenVoiceOS-e3830c:~/whisper $ lscpu
Architecture:            aarch64
  CPU op-mode(s):        32-bit, 64-bit
  Byte Order:            Little Endian
CPU(s):                  4
  On-line CPU(s) list:   0-3
Vendor ID:               ARM
  Model name:            Cortex-A72
    Model:               3
    Thread(s) per core:  1
    Core(s) per cluster: 4
    Socket(s):           -
    Cluster(s):          1
    Stepping:            r0p3
    CPU max MHz:         1800.0000
    CPU min MHz:         600.0000
    BogoMIPS:            108.00
    Flags:               fp asimd evtstrm crc32 cpuid
Caches (sum of all):
  L1d:                   128 KiB (4 instances)
  L1i:                   192 KiB (4 instances)
  L2:                    1 MiB (1 instance)
Vulnerabilities:
  Itlb multihit:         Not affected
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Mmio stale data:       Not affected
  Retbleed:              Not affected
  Spec store bypass:     Vulnerable
  Spectre v1:            Mitigation; __user pointer sanitization
  Spectre v2:            Vulnerable
  Srbds:                 Not affected
  Tsx async abort:       Not affected

mycroft@OpenVoiceOS-e3830c:~/whisper $ sync; dd if=/dev/zero of=tempfile bs=1M count=1024; sync
1024+0 records in
1024+0 records out
1073741824 bytes (1.0GB) copied, 20.906206 seconds, 49.0MB/s
mycroft@OpenVoiceOS-e3830c:~/whisper $ sudo sh -c "sync && echo 3 > /proc/sys/vm/drop_caches"
mycroft@OpenVoiceOS-e3830c:~/whisper $ dd if=tempfile of=/dev/null bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.0GB) copied, 4.949115 seconds, 206.9MB/s

mycroft@OpenVoiceOS-e3830c:~/whisper $ free -h
              total        used        free      shared  buff/cache   available
Mem:           1.8G      182.8M      561.6M        7.1M        1.0G        1.5G
Swap:        359.5M           0      359.5M

mycroft@OpenVoiceOS-e3830c:~/whisper $ lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda      8:0    1  29.9G  0 disk
├─sda1   8:1    1    64M  0 part /boot
├─sda2   8:2    1   1.1G  0 part /media/rfs/ro
└─sda3   8:3    1  28.7G  0 part /media/rfs/rw
zram0  254:0    0 359.5M  0 disk [SWAP]

Running the minimal C++ binary on whisper-tiny.en.tflite while watching the memory consumption (I don't know an easy way to record it; see the sketch after the output below), it shows 185 MB of added use at max while running;

mycroft@OpenVoiceOS-e3830c:~/whisper $ minimal  models/whisper-tiny.en.tflite samples/jfk.wav

n_vocab:50257

mel.n_len3000

mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Inference time 4 seconds

[_SOT_][_NOT_] And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.
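
On the "no easy way to record" point, a rough sketch that launches the binary and polls the kernel's own peak-RSS counter (VmHWM in /proc/<pid>/status) while it runs; Linux only, and the last sample can race with process exit, so treat the number as approximate:

import subprocess
import time

def peak_rss_kib(cmd):
    # Sample VmHWM (the kernel's resident-set high-water mark) until exit.
    proc = subprocess.Popen(cmd)
    peak = 0
    while proc.poll() is None:
        try:
            with open(f"/proc/{proc.pid}/status") as f:
                for line in f:
                    if line.startswith("VmHWM:"):
                        peak = max(peak, int(line.split()[1]))  # KiB
        except FileNotFoundError:
            break  # process exited between poll() and open()
        time.sleep(0.1)
    return peak

print(peak_rss_kib(["./minimal", "models/whisper-tiny.en.tflite", "samples/jfk.wav"]), "KiB peak RSS")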

Doing the same but running it with test.py showed 305 MB of added use at max.

mycroft@OpenVoiceOS-e3830c:~/whisper $ python3 test.py -f samples/ -m models/whisper-tiny.en.tflite -t 4 -l en -r 2
Importing tflite_runtime
Importing numpy
Importing whisper
Loading tflite model models/whisper-tiny.en.tflite ...
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.

Loading audio file: samples/jfk.wav
Samplerate: 16000, length: 11.0s
Calculating mel spectrogram...
Invoking interpreter ...
Preparing output data ...
Converting tokens ...
 And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.

Inference took 4.45s for 11.00s audio file.

And finally, using the benchmark tool, which reports the memory consumption itself (watching the mem usage showed 176 MB used);

mycroft@OpenVoiceOS-e3830c:~/whisper $ ./benchmark_model --graph=models/whisper-tiny.en.tflite --num_threads=4 --warmup_runs=1 --num_runs=5 --report_peak_memory_footprint=true
STARTING!
Log parameter values verbosely: [0]
Min num runs: [5]
Num threads: [4]
Min warmup runs: [1]
Report the peak memory footprint: [1]
Graph: [models/whisper-tiny.en.tflite]
#threads used for CPU inference: [4]
Loaded model models/whisper-tiny.en.tflite
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
The input model file size (MB): 40.9627
Initialized session in 15.03ms.
Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
count=1 curr=3044595

Running benchmark for at least 5 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
count=5 first=2903102 curr=2876513 min=2870115 max=2903102 avg=2.8807e+06 std=12080

Inference timings in us: Init: 15030, First inference: 3044595, Warmup (avg): 3.0446e+06, Inference (avg): 2.8807e+06
Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
Memory footprint delta from the start of the tool (MB): init=9 overall=251.41
Overall peak memory footprint (MB) via periodic monitoring: 257.414
Memory status at the end of exeution:
- VmRSS              : 222 MB
+ RssAnnon           : 176 MB
+ RssFile + RssShmem : 46 MB

Will see what we can do with the base model now.

j1nx commented 1 year ago

The base model suffers from the same gather index out of bounds issue;

mycroft@OpenVoiceOS-e3830c:~/whisper $ minimal  models/whisper-base.tflite samples/jfk.wav

n_vocab:50257

mel.n_len3000

mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
ERROR: gather index out of bounds
ERROR: Node number 35 (GATHER) failed to invoke.
ERROR: Node number 1264 (WHILE) failed to invoke.
Error at ../minimal.cc:210
nyadla-sys commented 1 year ago

> Ok, here is some memory data for the tiny model. This is running on a Raspberry Pi 4, 2 GB model. […]

@j1nx Great and thanks for your time on this ..

nyadla-sys commented 1 year ago

> The base model suffers from the same gather index out of bounds issue; […]

Can you try with the latest base model that I uploaded yesterday?

j1nx commented 1 year ago

> Can you try with the latest base model that I uploaded yesterday?

Downloaded the base model this morning, so I am already using the latest one.

j1nx commented 1 year ago

> @j1nx Great and thanks for your time on this ..

No problem. Hopefully we can get the base and small models to work as well. Tiny is fast enough, but perhaps base will be too.

nyadla-sys commented 1 year ago
  • VmRSS : 222 MB
  • RssAnnon : 176 MB
  • RssFile + RssShmem : 46 MB

Just for others who want to understand VmRSS, RssAnon, RssFile and friends: these values describe the memory used by a process, specifically its Resident Set Size (RSS).

• VmRSS is the total amount of resident memory used by the process, measured in megabytes (MB).
• RssAnon is the amount of anonymous memory used by the process, i.e. memory not associated with a file on disk.
• RssFile and RssShmem are the amounts of file-backed and shared memory, respectively. File-backed memory is associated with a file on disk, while shared memory is shared between multiple processes.

In this case, the process is using a total of 222 MB of memory, with 176 MB of that being anonymous memory and 46 MB being file-backed and shared memory.
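
The same fields can be read directly from /proc on Linux; here is a tiny sketch for the current process (note the kernel spells the field RssAnon, while the benchmark tool prints "RssAnnon"):

def rss_breakdown(pid="self"):
    # VmRSS = RssAnon + RssFile + RssShmem, all reported in KiB by the kernel
    wanted = ("VmRSS", "RssAnon", "RssFile", "RssShmem")
    out = {}
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            key, _, value = line.partition(":")
            if key in wanted:
                out[key] = value.strip()
    return out

print(rss_breakdown())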

j1nx commented 1 year ago

Very nice explanation. Thanks for that.

StuartIanNaylor commented 1 year ago

@fquirin create the runtime with bazel.

Grab the latest from https://github.com/bazelbuild/bazelisk (likely wget it to /usr/bin, chmod a+x, create a symlink called bazel) and export USE_BAZEL_VERSION=5.3.0 (find the version in the tensorflow_src .bazelversion file). Then:

/home/orangepi/tensorflow_src/tensorflow/lite/tools/pip_package/build_pip_package_with_bazel.sh native

python test.py --model=../openai-whisper/models/whisper.tflite --folder=../openai-whisper/samples/ --threads=4

cmake tflite_runtime

test.wav
 Bili always listens to his mother. He always does what she says. If his mother says,

Inference took 1.54s
test_1.wav
 David lost his yellow pencil. He could not find it. Where is my yellow pencil? He asked his sister. His sister did not know. I don't know where your pencil is. She said David thought about it. He thought and thought. He used his yellow pencil before lunch. He used it to write a note to his teacher. The notes said, dear teacher, thank you for helping me, David. He put the note in the envelope where was the envelope?

Inference took 2.54s
jfk.wav
 And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.

Inference took 1.51s

bazel tflite_runtime

test.wav
 Bili always listens to his mother. He always does what she says. If his mother says,

Inference took 1.08s
test_1.wav
 David lost his yellow pencil. He could not find it. Where is my yellow pencil? He asked his sister. His sister did not know. I don't know where your pencil is. She said David thought about it. He thought and thought. He used his yellow pencil for before lunch. He used it to write a note to his teacher. The notes said, dear teacher, thank you for helping me, David. He put the note in the envelope where was the envelope?

Inference took 2.07s
jfk.wav
 And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.

Inference took 1.07s

Might be interesting to create tflite as an external lib with bazel and then link the tflite_minimal cmake build against it rather than statically? I dunno what Bazel does that cmake doesn't, but see the results above.

j1nx commented 1 year ago

I noticed you uploaded new tiny models as well. Here is the model benchmark tool output for them.

The normal one;

mycroft@OpenVoiceOS-e3830c:~/whisper $ ./benchmark_model --graph=models/whisper-tiny.tflite --num_threads=4 --warmup_runs=1 --num_runs=5 --report_peak_memory_footprint=true
STARTING!
Log parameter values verbosely: [0]
Min num runs: [5]
Num threads: [4]
Min warmup runs: [1]
Report the peak memory footprint: [1]
Graph: [models/whisper-tiny.tflite]
#threads used for CPU inference: [4]
Loaded model models/whisper-tiny.tflite
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
The input model file size (MB): 69.3703
Initialized session in 162.167ms.
Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
count=1 curr=4303281

Running benchmark for at least 5 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
count=5 first=3882868 curr=3828750 min=3828750 max=3882868 avg=3.85199e+06 std=17539

Inference timings in us: Init: 162167, First inference: 4303281, Warmup (avg): 4.30328e+06, Inference (avg): 3.85199e+06
Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
Memory footprint delta from the start of the tool (MB): init=10.7656 overall=306.816
Overall peak memory footprint (MB) via periodic monitoring: 312.531
Memory status at the end of exeution:
- VmRSS              : 244 MB
+ RssAnnon           : 171 MB
+ RssFile + RssShmem : 73 MB

The EN one;

mycroft@OpenVoiceOS-e3830c:~/whisper $ ./benchmark_model --graph=models/whisper-tiny.en.tflite --num_threads=4 --warmup_runs=1 --num_runs=5 --report_peak_memory_footprint=true
STARTING!
Log parameter values verbosely: [0]
Min num runs: [5]
Num threads: [4]
Min warmup runs: [1]
Report the peak memory footprint: [1]
Graph: [models/whisper-tiny.en.tflite]
#threads used for CPU inference: [4]
Loaded model models/whisper-tiny.en.tflite
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
The input model file size (MB): 41.6556
Initialized session in 119.189ms.
Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
count=1 curr=3247494

Running benchmark for at least 5 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
count=5 first=2934383 curr=2893835 min=2892774 max=2934383 avg=2.90437e+06 std=15345

Inference timings in us: Init: 119189, First inference: 3247494, Warmup (avg): 3.24749e+06, Inference (avg): 2.90437e+06
Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
Memory footprint delta from the start of the tool (MB): init=10.7422 overall=264.406
Overall peak memory footprint (MB) via periodic monitoring: 268.031
Memory status at the end of exeution:
- VmRSS              : 227 MB
+ RssAnnon           : 181 MB
+ RssFile + RssShmem : 46 MB

The base model still errors out.

mycroft@OpenVoiceOS-e3830c:~/whisper $ ./benchmark_model --graph=models/whisper-base.tflite --num_threads=4 --warmup_runs=1 --num_runs=5 --report_peak_memory_footprint=true
STARTING!
Log parameter values verbosely: [0]
Min num runs: [5]
Num threads: [4]
Min warmup runs: [1]
Report the peak memory footprint: [1]
Graph: [models/whisper-base.tflite]
#threads used for CPU inference: [4]
Loaded model models/whisper-base.tflite
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
The input model file size (MB): 77.0875
Initialized session in 182.557ms.
Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
ERROR: gather index out of bounds
ERROR: Node number 35 (GATHER) failed to invoke.
ERROR: Node number 1264 (WHILE) failed to invoke.
count=1 curr=6372530

Benchmarking failed.

But the funny thing, the small model does work?!

mycroft@OpenVoiceOS-e3830c:~/whisper $ ./benchmark_model --graph=models/whisper-small.tflite --num_threads=4 --warmup_runs=1 --num_runs=5 --report_peak_memory_footprint=true
STARTING!
Log parameter values verbosely: [0]
Min num runs: [5]
Num threads: [4]
Min warmup runs: [1]
Report the peak memory footprint: [1]
Graph: [models/whisper-small.tflite]
#threads used for CPU inference: [4]
Loaded model models/whisper-small.tflite
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
The input model file size (MB): 387.698
Initialized session in 152.364ms.
Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
count=1 curr=21885361

Running benchmark for at least 5 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
count=5 first=20743640 curr=20800234 min=20726743 max=20882339 avg=2.07767e+07 std=59027

Inference timings in us: Init: 152364, First inference: 21885361, Warmup (avg): 2.18854e+07, Inference (avg): 2.07767e+07
Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
Memory footprint delta from the start of the tool (MB): init=33.9414 overall=1207.91
Overall peak memory footprint (MB) via periodic monitoring: 1211.55
Memory status at the end of exeution:
- VmRSS              : 752 MB
+ RssAnnon           : 376 MB
+ RssFile + RssShmem : 376 MB

Look at that AMAZING low memory usage for the small model !!!!

@nyadla-sys Well done / Great work.

However...

mycroft@OpenVoiceOS-e3830c:~/whisper $ python3 test.py -f samples/ -m models/whisper-small.tflite -t 4 -l en -r 2
Importing tflite_runtime
Importing numpy
Importing whisper
Loading tflite model models/whisper-small.tflite ...
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.

Loading audio file: samples/jfk.wav
Samplerate: 16000, length: 11.0s
Calculating mel spectrogram...
Invoking interpreter ...
Preparing output data ...
Converting tokens ...
th Wiv array Year, high Lend thisyear or D Mic, highendic or D M thisyear.

Inference took 36.53s for 11.00s audio file.
nyadla-sys commented 1 year ago

@j1nx I just uploaded a new whisper-base.tflite model. Could you please try this model on the rpi4 and let me know if you still see the crash issue?

j1nx commented 1 year ago

> @j1nx I just uploaded a new whisper-base.tflite model. Could you please try this model on the rpi4 and let me know if you still see the crash issue?

All good now;

mycroft@OpenVoiceOS-e3830c:~/whisper $ ./benchmark_model --graph=models/whisper-base.tflite --num_threads=4 --warmup_runs=1 --num_runs=5 --report_peak_memory_footprint=true
STARTING!
Log parameter values verbosely: [0]
Min num runs: [5]
Num threads: [4]
Min warmup runs: [1]
Report the peak memory footprint: [1]
Graph: [models/whisper-base.tflite]
#threads used for CPU inference: [4]
Loaded model models/whisper-base.tflite
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
The input model file size (MB): 125.575
Initialized session in 54.729ms.
Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
count=1 curr=10344233

Running benchmark for at least 5 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
count=5 first=9610296 curr=9466280 min=9466280 max=9654249 avg=9.57554e+06 std=64674

Inference timings in us: Init: 54729, First inference: 10344233, Warmup (avg): 1.03442e+07, Inference (avg): 9.57554e+06
Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
Memory footprint delta from the start of the tool (MB): init=15.875 overall=497.102
Overall peak memory footprint (MB) via periodic monitoring: 502.785
Memory status at the end of exeution:
- VmRSS              : 333 MB
+ RssAnnon           : 207 MB
+ RssFile + RssShmem : 126 MB

However just as with the small model, the WER is terrible.

mycroft@OpenVoiceOS-e3830c:~/whisper $ python3 test.py -f samples/ -m models/whisper-base.tflite -t 4 -l en -r 2
Importing tflite_runtime
Importing numpy
Importing whisper
Loading tflite model models/whisper-base.tflite ...
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.

Loading audio file: samples/jfk.wav
Samplerate: 16000, length: 11.0s
Calculating mel spectrogram...
Invoking interpreter ...
Preparing output data ...
Converting tokens ...
th Wiv array Year, high Lend thisyear or D Mic, highendic or D M thisyear.

Inference took 11.85s for 11.00s audio file.

I suspect this has something to do with the language selection in some way.

Is it possible for you to convert and upload the ".en" versions of the small and base models as well?

j1nx commented 1 year ago

Just confirmed it;

mycroft@OpenVoiceOS-e3830c:~/whisper $ python3 test.py -f samples/ -m models/whisper-tiny.tflite -t 4 -l en -r 2
Importing tflite_runtime
Importing numpy
Importing whisper
Loading tflite model models/whisper-tiny.tflite ...
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.

Loading audio file: samples/jfk.wav
Samplerate: 16000, length: 11.0s
Calculating mel spectrogram...
Invoking interpreter ...
Preparing output data ...
Converting tokens ...
th Wiv array Year high Lend thisyear or D Mic highendic or D M thisyear

Inference took 5.92s for 11.00s audio file.
mycroft@OpenVoiceOS-e3830c:~/whisper $ python3 test.py -f samples/ -m models/whisper-tiny.en.tflite -t 4 -l en -r 2
Importing tflite_runtime
Importing numpy
Importing whisper
Loading tflite model models/whisper-tiny.en.tflite ...
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.

Loading audio file: samples/jfk.wav
Samplerate: 16000, length: 11.0s
Calculating mel spectrogram...
Invoking interpreter ...
Preparing output data ...
Converting tokens ...
 And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.

Inference took 5.13s for 11.00s audio file.

Could also be the python script perhaps. @StuartIanNaylor @fquirin Perhaps you can have a look at it as well.

j1nx commented 1 year ago

This is interesting: plotting peak memory usage against the inference time of the second inference shows both tiny and tiny.en as the smallest, followed by base and small.

StuartIanNaylor commented 1 year ago
 python test.py --model=../openai-whisper/models/whisper-tiny.tflite --folder=../openai-whisper/samples/ --threads=4

jfk.wav
th Wiv array Year high Lend thisyear or D Mic, highendic or D M thisyear.

Inference took 1.25s
python test.py --model=../openai-whisper/models/whisper-tiny.en.tflite --folder=../openai-whisper/samples/ --threads=4

jfk.wav
 And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.

Inference took 1.13s

Slightly slower; it was 1.07s before?

j1nx commented 1 year ago

Indeed, but the WER of the non-EN fixed model is... well, 5%, so I guess something goes wrong there.

Will have a look at the C++ binary in the morning to see how that one performs on all models.

Will also feed some more different WAV files to all models to gather some more info.

StuartIanNaylor commented 1 year ago

I presume the initial decoder tokens are hardcoded for a translation model / language or something, or just missing; the start tokens are control tokens:

decoder_input_ids = torch.tensor([50258, 50266, 50358, 50363]) #<|startoftranscript|><|ja|><|translate|><|notimestamps|>

decoder_input_ids = torch.tensor([50258, 50259, 50359, 50363]) #<|startoftranscript|><|en|><|transcribe|><|notimestamps|>
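
Those IDs can be sanity-checked against the openai-whisper tokenizer; a small sketch (multilingual vocabulary assumed):

from whisper.tokenizer import get_tokenizer

tok = get_tokenizer(multilingual=True, language="en", task="transcribe")
# -> 50258 50259 50359 50363, i.e. <|startoftranscript|><|en|><|transcribe|><|notimestamps|>
print(tok.sot, tok.language_token, tok.transcribe, tok.no_timestamps)
# the <|translate|> task token from the first example above
print(tok.translate)  # -> 50358
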
j1nx commented 1 year ago

Here is the output of the C++ binary with the single vocab.bin;

mycroft@OpenVoiceOS-e3830c:~/whisper $ minimal models/whisper-tiny.tflite samples/jfk.wav

n_vocab:50257

mel.n_len3000

mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Inference time 6 seconds

[_extra_token_50258][_extra_token_50259][_extra_token_50359][_BEG_]th Wiv array Year high Lend thisyear or D Mic, highendic or D M thisyear.[_SOT_]

mycroft@OpenVoiceOS-e3830c:~/whisper $ minimal models/whisper-tiny.en.tflite samples/jfk.wav

n_vocab:50257

mel.n_len3000

mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Inference time 5 seconds

[_SOT_][_NOT_] And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.

And the same multilingual model but this time with the multilingual vocab.bin;

mycroft@OpenVoiceOS-e3830c:~/whisper $ minimal models/whisper-tiny.tflite samples/jfk.wav

n_vocab:50257

mel.n_len3000

mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Inference time 6 seconds

[_extra_token_50258][_extra_token_50259][_extra_token_50359][_BEG_] And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.[_SOT_]

And to complete the whole test package;

mycroft@OpenVoiceOS-e3830c:~/whisper $ minimal models/whisper-tiny.tflite samples/jfk.wav

n_vocab:50257

mel.n_len3000

mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Inference time 6 seconds

[_extra_token_50258][_extra_token_50259][_extra_token_50359][_BEG_] And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.[_SOT_]

mycroft@OpenVoiceOS-e3830c:~/whisper $ minimal models/whisper-base.tflite samples/jfk.wav

n_vocab:50257

mel.n_len3000

mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Inference time 12 seconds

[_extra_token_50258][_extra_token_50259][_extra_token_50359][_BEG_] And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.[_SOT_]

mycroft@OpenVoiceOS-e3830c:~/whisper $ minimal models/whisper-small.tflite samples/jfk.wav

n_vocab:50257

mel.n_len3000

mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Inference time 37 seconds

[_extra_token_50258][_extra_token_50259][_extra_token_50359][_BEG_] And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.[_SOT_]

Next up is feeding some other, different WAV files through the Python inference using the tiny.en model, to get some insight into the WER.

j1nx commented 1 year ago

This also might be of interest to you @nyadla-sys: the base model does translation to English, whereas the tiny and small models just transcribed in the detected language.

mycroft@OpenVoiceOS-e3830c:~/whisper $ minimal models/whisper-tiny.tflite de_speech_thorsten_sample03_8s.wav

n_vocab:50257

mel.n_len3000

mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Inference time 7 seconds

[_extra_token_50258][_extra_token_50261][_extra_token_50359][_BEG_] Für mich sind alle Menschen gleich unabhängig von Geschlecht, sexuelle Orientierung, Religion, Hautfarbe oder Geo-Kordinaten der Geburt.[_SOT_]

mycroft@OpenVoiceOS-e3830c:~/whisper $ minimal models/whisper-base.tflite de_speech_thorsten_sample03_8s.wav

n_vocab:50257

mel.n_len3000

mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Inference time 12 seconds

[_extra_token_50258][_extra_token_50261][_extra_token_50358][_BEG_] For me, all people are equally independent of gender, sex, orientation, religion, hate, or gender coordinates of birth.[_SOT_]

mycroft@OpenVoiceOS-e3830c:~/whisper $ minimal models/whisper-small.tflite de_speech_thorsten_sample03_8s.wav

n_vocab:50257

mel.n_len3000

mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Inference time 43 seconds

[_extra_token_50258][_extra_token_50261][_extra_token_50359][_BEG_] Für mich sind alle Menschen gleich, unabhängig von Geschlecht, sexueller Orientierung, Religion, Hautfarbe oder Geo-Koordinaten der Geburt.[_SOT_]
nyadla-sys commented 1 year ago

> The base model does translation to English, whereas the tiny and small models just transcribed in the detected language.

Thanks for the information, I will look into it.



fquirin commented 1 year ago

@StuartIanNaylor it looks like the tflite_runtime issue was resolved? Did you understand what the problem was? Should we use this build script as default? 🙂

StuartIanNaylor commented 1 year ago

> @StuartIanNaylor it looks like the tflite_runtime issue was resolved? Did you understand what the problem was? Should we use this build script as default? 🙂

Yep, the bazel native build on the rk3588 is almost 50% faster; on the Pi as well, if compiling on metal you should select native. I haven't tried, but it looks like the other targets are for cross-compiling. Tensorflow does this thing where it names new features experimental, but many of them can be years old and the current de facto way. So I don't know if it's just optimisation, as I've been scratching my head with the minimal build, which could be the same; the API is complex and it's not always clear what is the latest and greatest, but I think some of the so-called 'experimental' methods also end up in the build with bazel. I am thinking the cmake script is an older build method and they have only been updating the bazel one? Maybe we should be creating tensorflow as a dynamic lib via bazel for the minimal build and linking to it, rather than a static embedded compile where we don't know what missing sauce is in the bazel compile. So use the bazel-built Python pip package; I can hack C/C++, but nowhere near building it up from scratch. Well, maybe I could, but it would be painful and very hacky. Also I think tensorflow has got better with threads, as I'm seeing an improvement with x4 threads, where I remember specifically with Sanebow's Pi-DTLN x2/x4 threads made no difference at all. I only have x4 big cores.

I did send you a message about using bazel with an @fquirin mention; I forget where, but it should be in your 'mentions'.

fquirin commented 1 year ago

Hi @StuartIanNaylor, do you have a tflite_runtime wheel file built with the bazel script for download maybe? My Pi seems to be too weak to build it, and I don't understand how the cross-compile is supposed to work because the examples are broken 😕 (the referenced Docker files don't exist). I'd need the Python 3.9 aarch64 build 🙃.

[Edit] I did manage to compile it finally! Had to uninstall the regular tensorflow and tflite packages to use it though 🤔 (some conflict, it seems), but it is indeed even faster than "fat" tflite now 😀

StuartIanNaylor commented 1 year ago

Good, as my native compile on Armv8.2 prob would not work on a Pi.

I am not really sure what the bazel build does differently, but yeah, it is noticeably faster than the cmake compile, which maybe hasn't been updated and creates a legacy build. I am thinking it's the same for the minimal build, and maybe linking to a bazel-built lib rather than a static lib might also be faster, due to whatever bazel does in the build as opposed to running cmake.

You shouldn't need Docker though, even though you got it working, as you can just specify armhf or aarch64 from https://www.tensorflow.org/lite/guide/build_cmake_pip#available_target_names and it will go off and cross-compile; otherwise, for the host, pick native.

fquirin commented 1 year ago

> Good, as my native compile on Armv8.2 prob would not work on a Pi.

Actually, could you try my build: download (if you have the same Python version)? I'd like to know if they are compatible and perform the same :-).

StuartIanNaylor commented 1 year ago

Ran it in a miniconda env, as I'm on Ubuntu with Python 3.10, but yeah, it runs OK. conda create -n tlite-test python=3.9. Totally forgot what we were at, but yeah, it runs.

INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
gb1.wav
 My fellow Americans, this day has brought terrible news and great sadness to our country. At 9 o clock this morning, mission control in Houston lost contact with our space shuttle Columbia. A short time later, debris was seen falling from the skies above Texas. The Columbia's lost. There are no survivors.

Inference took 2.1s
mm0.wav
 This is the micro machine man presenting the most miniature motorcade of micro machine. Each one has dramatic details for a picture of precision paint jobs. Plus incredible micro machine pocket place that's physical police station fire station restaurant service station and more. Perfect pocket portable to take any place. And there are many miniature places to play with. Each one comes with its own special edition micro machine vehicle and fantastic features that miraculously move. Raise the bolt. Lift at the airport. Marine a man. The gun turret at the army base. Clean your car at the car. Raise the toll bridge. And these place that's fitted together to form a micro machine world. Micro machine pocket place that's so tremendously tiny so perfectly precise. So doesn't we detailed you on a pocket them all micro machines and micro machine pocket place that sold separately from Glube. The smaller they are, the better they are.

Inference took 3.72s
hp0.wav
 Henry F. Phillips from Wikipedia, the free encyclopedia at en.wicopedia.org.

Inference took 1.5s
gb0.wav
 Good morning. This Tuesday is Election Day. After months of spirited debate in vigorous campaigning, the time has come for Americans to make important decisions about our nation's future and encourage all Americans to go to the polls and vote. Election season brings out the spirit of competition between our political parties. And that competition is an essential part of a healthy democracy. But as the campaigns come to a close, Republicans, Democrats, and independents can find common ground on at least one point. Our system of

Inference took 2.52s
fquirin commented 1 year ago

Nice 😎 , thanks for testing 👍