Open j1nx opened 1 year ago
To be honest, I didn't think AMX was a 'coprocessor'; I thought it was just some Apple-specific instructions they added to the Arm IP, and that is why they work just like CPU instructions, because they are. Whatever they are, they do give a big boost to ML operations. https://en.wikipedia.org/wiki/Advanced_Matrix_Extensions
Finished the whole cross-compile infrastructure and everything built fine in the end.
https://github.com/OpenVoiceOS/ovos-buildroot/commit/876ee82daa2c4846afec1d4d505dcf872e2ad93c
Can't test before tonight / 5 hours from now, but will keep you posted if it works and how it performs.
All fun and all, but pure CpuAcc from ArmNN is not bringing a performance gain.
The ArmNN delegate is also rather picky about models and operators, as whisper tiny wasn't working. Had to grab a typically used demo model to compare it against XNNPACK.
mycroft@OpenVoiceOS-e3830c:~/whisper $ ./benchmark_model --graph=models/mobilenet_v2_1.0_224_quantized_1_default_1.tflite --num_threads=4 --report_peak_memory_footprint=true --warmup_runs=1
STARTING!
Log parameter values verbosely: [0]
Num threads: [4]
Min warmup runs: [1]
Report the peak memory footprint: [1]
Graph: [models/mobilenet_v2_1.0_224_quantized_1_default_1.tflite]
#threads used for CPU inference: [4]
Loaded model models/mobilenet_v2_1.0_224_quantized_1_default_1.tflite
The input model file size (MB): 3.57776
Initialized session in 3.112ms.
Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
count=11 first=92585 curr=62982 min=28710 max=92585 avg=46423.7 std=17777
Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
count=50 first=50817 curr=27858 min=25482 max=63801 avg=39285.8 std=10589
Inference timings in us: Init: 3112, First inference: 92585, Warmup (avg): 46423.7, Inference (avg): 39285.8
Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
Memory footprint delta from the start of the tool (MB): init=2.57812 overall=6.17969
Overall peak memory footprint (MB) via periodic monitoring: 12.1367
Memory status at the end of exeution:
- VmRSS : 12 MB
+ RssAnnon : 4 MB
+ RssFile + RssShmem : 8 MB
mycroft@OpenVoiceOS-e3830c:~/whisper $ ./benchmark_model --graph=models/mobilenet_v2_1.0_224_quantized_1_default_1.tflite --warmup_runs=1 --external_delegate_path="/usr/lib/libarmnnDelegate.so" --external_delegate_options="backends:CpuAcc;logging-severity:info"
STARTING!
Log parameter values verbosely: [0]
Min warmup runs: [1]
Graph: [models/mobilenet_v2_1.0_224_quantized_1_default_1.tflite]
External delegate path: [/usr/lib/libarmnnDelegate.so]
External delegate options: [backends:CpuAcc;logging-severity:info]
Loaded model models/mobilenet_v2_1.0_224_quantized_1_default_1.tflite
Info: ArmNN v31.0.0
Can't load libOpenCL.so: libOpenCL.so: cannot open shared object file: No such file or directory
Can't load libGLES_mali.so: libGLES_mali.so: cannot open shared object file: No such file or directory
Can't load libmali.so: libmali.so: cannot open shared object file: No such file or directory
Couldn't find any OpenCL library.
Info: Initialization time: 0.83 ms.
INFO: TfLiteArmnnDelegate: Created TfLite ArmNN delegate.
EXTERNAL delegate created.
Info: Optimize ArmnnSubgraph time: 6.77 ms
Info: Load ArmnnSubgraph time: 146.02 ms
Info: Overall ArmnnSubgraph creation time: 156.61 ms
Explicitly applied EXTERNAL delegate, and the model graph will be completely executed by the delegate.
The input model file size (MB): 3.57776
Initialized session in 286.218ms.
Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
count=9 first=112925 curr=46424 min=35712 max=112925 avg=59661.4 std=27126
Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
count=50 first=50733 curr=47343 min=33201 max=67901 avg=46443.1 std=7687
Inference timings in us: Init: 286218, First inference: 112925, Warmup (avg): 59661.4, Inference (avg): 46443.1
Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
Memory footprint delta from the start of the tool (MB): init=59.8477 overall=68.1016
Info: Shutdown time: 3.38 ms.
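To put the two runs above side by side, a quick sketch using the average inference timings the tool reports (values copied from the logs; the percentage only holds for this model on this Pi 4):

```python
# Average inference timings (us) reported by benchmark_model above.
xnnpack_avg_us = 39285.8       # default CPU path (XNNPACK), 4 threads
armnn_cpuacc_avg_us = 46443.1  # ArmNN external delegate, CpuAcc backend

# Convert to milliseconds and compute the relative slowdown.
xnnpack_ms = xnnpack_avg_us / 1000.0
armnn_ms = armnn_cpuacc_avg_us / 1000.0
slowdown = armnn_ms / xnnpack_ms - 1.0

print(f"XNNPACK: {xnnpack_ms:.1f} ms, ArmNN CpuAcc: {armnn_ms:.1f} ms")
print(f"ArmNN CpuAcc is {slowdown:.0%} slower on this model")
```

So on mobilenet the ArmNN CpuAcc path is roughly 18% slower than plain XNNPACK here, on top of the much larger init time and memory footprint.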
To run the whisper-tiny model using the GPU delegate, we need to generate a full-float model. For testing purposes I can generate a full-float whisper tiny model.
ArmNN is not using GPU but their special CPU instructions.
Have not yet looked into GPU support; that one is next. But by all means, if you could convert and upload a GPU-enabled tiny model, please do. We kind of created a nice playground over here to test out all the different aspects of whisper inference, and I am sure we will end up with the best way forward for certain boards and hardware.
@j1nx appreciate your effort
@j1nx if possible could you please provide memory usage for whisper-tiny.en.tflite model on rpi4
Sure, no problem. Which tool do you want it from?
The minimal C++ binary, Python, or the benchmark tool from TensorFlow?
I am looking for the minimal C++ binary. If you can do it for the rest, it may be useful for others.
Looks like the benchmark tool would also provide the required information.
I found the binary delegate very picky about the distro it ran on, and it wasn't easy, but I did get it working on a Mali-based board. The GPU model doesn't have to be float, as per my test in https://github.com/StuartIanNaylor/rock5b-wav2letter-bench, which is just the https://developer.arm.com/documentation/102603/2211 tutorial, as it seemed to have quite a lot of typos.
You can switch between GpuAcc & CpuAcc and the 8-bit quantised model they provide works, but whether GpuAcc works on non-Mali I don't know, as I tested on a Mali-G610. With the Wav2Letter tutorial they omit the GPU setup on the Pi, which makes me think it's Mali-only even if its base is OpenCL, as for the Odroid with a Mali-G52 it's included.
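Switching between GpuAcc and CpuAcc for the benchmark tool is just a change to the delegate options string. A small sketch of building it, assuming the `backends:...;logging-severity:...` syntax from the command lines earlier in this thread (`armnn_delegate_options` is a hypothetical helper, not part of ArmNN):

```python
def armnn_delegate_options(backends, log_severity="info"):
    """Build a --external_delegate_options string for the ArmNN delegate.

    `backends` is an ordered list such as ["GpuAcc", "CpuAcc"]; listing
    CpuAcc after GpuAcc is meant as a fallback for unsupported operators
    (assumption based on how ArmNN backend preference lists work).
    """
    return f"backends:{','.join(backends)};logging-severity:{log_severity}"

print(armnn_delegate_options(["GpuAcc", "CpuAcc"]))
# -> backends:GpuAcc,CpuAcc;logging-severity:info
```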
@j1nx uploaded the whisper-base.tflite model along with the colab notebook
Ok, here is some memory data for the tiny model. This is running on a Raspberry Pi 4 - 2GB model.
Some info first;
mycroft@OpenVoiceOS-e3830c:~/whisper $ uname -a
Linux OpenVoiceOS-e3830c 5.15.76-v8 #1 SMP Fri Dec 30 14:58:43 CET 2022 aarch64 GNU/Linux
mycroft@OpenVoiceOS-e3830c:~/whisper $ lscpu
Architecture: aarch64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Vendor ID: ARM
Model name: Cortex-A72
Model: 3
Thread(s) per core: 1
Core(s) per cluster: 4
Socket(s): -
Cluster(s): 1
Stepping: r0p3
CPU max MHz: 1800.0000
CPU min MHz: 600.0000
BogoMIPS: 108.00
Flags: fp asimd evtstrm crc32 cpuid
Caches (sum of all):
L1d: 128 KiB (4 instances)
L1i: 192 KiB (4 instances)
L2: 1 MiB (1 instance)
Vulnerabilities:
Itlb multihit: Not affected
L1tf: Not affected
Mds: Not affected
Meltdown: Not affected
Mmio stale data: Not affected
Retbleed: Not affected
Spec store bypass: Vulnerable
Spectre v1: Mitigation; __user pointer sanitization
Spectre v2: Vulnerable
Srbds: Not affected
Tsx async abort: Not affected
mycroft@OpenVoiceOS-e3830c:~/whisper $ sync; dd if=/dev/zero of=tempfile bs=1M count=1024; sync
1024+0 records in
1024+0 records out
1073741824 bytes (1.0GB) copied, 20.906206 seconds, 49.0MB/s
mycroft@OpenVoiceOS-e3830c:~/whisper $ sudo sh -c "sync && echo 3 > /proc/sys/vm/drop_caches"
mycroft@OpenVoiceOS-e3830c:~/whisper $ dd if=tempfile of=/dev/null bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.0GB) copied, 4.949115 seconds, 206.9MB/s
mycroft@OpenVoiceOS-e3830c:~/whisper $ free -h
total used free shared buff/cache available
Mem: 1.8G 182.8M 561.6M 7.1M 1.0G 1.5G
Swap: 359.5M 0 359.5M
mycroft@OpenVoiceOS-e3830c:~/whisper $ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sda 8:0 1 29.9G 0 disk
├─sda1 8:1 1 64M 0 part /boot
├─sda2 8:2 1 1.1G 0 part /media/rfs/ro
└─sda3 8:3 1 28.7G 0 part /media/rfs/rw
zram0 254:0 0 359.5M 0 disk [SWAP]
Running the minimal C++ binary on whisper-tiny.en.tflite while watching the memory consumption (don't know an easy way to record it), it shows 185 MB added use at max while running;
mycroft@OpenVoiceOS-e3830c:~/whisper $ minimal models/whisper-tiny.en.tflite samples/jfk.wav
n_vocab:50257
mel.n_len3000
mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Inference time 4 seconds
[_SOT_][_NOT_] And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.
Doing the same but running it with test.py showed 305 MB added use at max.
mycroft@OpenVoiceOS-e3830c:~/whisper $ python3 test.py -f samples/ -m models/whisper-tiny.en.tflite -t 4 -l en -r 2
Importing tflite_runtime
Importing numpy
Importing whisper
Loading tflite model models/whisper-tiny.en.tflite ...
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Loading audio file: samples/jfk.wav
Samplerate: 16000, length: 11.0s
Calculating mel spectrogram...
Invoking interpreter ...
Preparing output data ...
Converting tokens ...
And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.
Inference took 4.45s for 11.00s audio file.
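That last line works out to a real-time factor comfortably below 1; a quick sketch of the calculation, using the numbers from the test.py run above:

```python
# Real-time factor (RTF): inference time divided by audio length.
# Values taken from the test.py output above.
inference_s = 4.45
audio_s = 11.00

rtf = inference_s / audio_s
print(f"RTF = {rtf:.2f}")  # below 1.0 means faster than real time
```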
And finally, using the benchmark tool, which reports the memory consumption itself (watching the mem usage showed 176 MB used):
mycroft@OpenVoiceOS-e3830c:~/whisper $ ./benchmark_model --graph=models/whisper-tiny.en.tflite --num_threads=4 --warmup_runs=1 --num_runs=5 --report_peak_memory_footprint=true
STARTING!
Log parameter values verbosely: [0]
Min num runs: [5]
Num threads: [4]
Min warmup runs: [1]
Report the peak memory footprint: [1]
Graph: [models/whisper-tiny.en.tflite]
#threads used for CPU inference: [4]
Loaded model models/whisper-tiny.en.tflite
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
The input model file size (MB): 40.9627
Initialized session in 15.03ms.
Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
count=1 curr=3044595
Running benchmark for at least 5 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
count=5 first=2903102 curr=2876513 min=2870115 max=2903102 avg=2.8807e+06 std=12080
Inference timings in us: Init: 15030, First inference: 3044595, Warmup (avg): 3.0446e+06, Inference (avg): 2.8807e+06
Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
Memory footprint delta from the start of the tool (MB): init=9 overall=251.41
Overall peak memory footprint (MB) via periodic monitoring: 257.414
Memory status at the end of exeution:
- VmRSS : 222 MB
+ RssAnnon : 176 MB
+ RssFile + RssShmem : 46 MB
Will see what we can do with the base model now.
The base model suffers from the same gather index out of bounds issue;
mycroft@OpenVoiceOS-e3830c:~/whisper $ minimal models/whisper-base.tflite samples/jfk.wav
n_vocab:50257
mel.n_len3000
mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
ERROR: gather index out of bounds
ERROR: Node number 35 (GATHER) failed to invoke.
ERROR: Node number 1264 (WHILE) failed to invoke.
Error at ../minimal.cc:210
@j1nx Great and thanks for your time on this ..
Can you try with the latest base model that i uploaded yesterday?
Downloaded the base model this morning, so I am using the latest.
No problem. Hopefully we can get the base and small models to work as well. Tiny is fast enough, but perhaps base will be too.
- VmRSS : 222 MB
- RssAnnon : 176 MB
- RssFile + RssShmem : 46 MB
Just for others who want to understand VmRSS, RssAnnon, RssFile, etc.: these values represent the amount of memory used by a process, specifically its Resident Set Size (RSS).
VmRSS is the total amount of resident memory used by the process, measured in megabytes (MB).
RssAnnon is the amount of anonymous memory used by the process, i.e. memory that is not associated with a file on disk.
RssFile and RssShmem are the amounts of file-backed and shared memory used by the process, respectively. File-backed memory is associated with a file on disk, while shared memory is shared between multiple processes.
In this case, the process is using a total of 222 MB of memory, with 176 MB of that being anonymous memory and 46 MB being file-backed and shared memory.
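These fields can be read directly from /proc/&lt;pid&gt;/status on Linux. A minimal parsing sketch (`rss_fields` is a hypothetical helper; the sample text mirrors the values quoted above, and note that /proc itself spells the field RssAnon):

```python
# Parse RSS-related fields from /proc/<pid>/status style text.
# The sample mirrors the benchmark numbers above; on a live system you
# would use open("/proc/self/status").read() instead.
sample = """\
VmRSS:\t  227328 kB
RssAnon:\t  180224 kB
RssFile:\t   45056 kB
RssShmem:\t    2048 kB
"""

def rss_fields(status_text):
    """Return a dict of RSS-related fields, converted from kB to MB."""
    fields = {}
    for line in status_text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("VmRSS", "RssAnon", "RssFile", "RssShmem"):
            fields[key] = int(rest.split()[0]) // 1024  # kB -> MB
    return fields

print(rss_fields(sample))
```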
Very nice explanation. Thanks for that.
@fquirin create the runtime with Bazel
Grab the latest from https://github.com/bazelbuild/bazelisk (likely wget to /usr/bin, chmod a+x, create a symlink called bazel).
export USE_BAZEL_VERSION=5.3.0
Find the version in the tensorflow_src .bazelversion file.
/home/orangepi/tensorflow_src/tensorflow/lite/tools/pip_package/build_pip_package_with_bazel.sh native
python test.py --model=../openai-whisper/models/whisper.tflite --folder=../openai-whisper/samples/ --threads=4
cmake tflite_runtime
test.wav
Bili always listens to his mother. He always does what she says. If his mother says,
Inference took 1.54s
test_1.wav
David lost his yellow pencil. He could not find it. Where is my yellow pencil? He asked his sister. His sister did not know. I don't know where your pencil is. She said David thought about it. He thought and thought. He used his yellow pencil before lunch. He used it to write a note to his teacher. The notes said, dear teacher, thank you for helping me, David. He put the note in the envelope where was the envelope?
Inference took 2.54s
jfk.wav
And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.
Inference took 1.51s
bazel tflite_runtime
test.wav
Bili always listens to his mother. He always does what she says. If his mother says,
Inference took 1.08s
test_1.wav
David lost his yellow pencil. He could not find it. Where is my yellow pencil? He asked his sister. His sister did not know. I don't know where your pencil is. She said David thought about it. He thought and thought. He used his yellow pencil for before lunch. He used it to write a note to his teacher. The notes said, dear teacher, thank you for helping me, David. He put the note in the envelope where was the envelope?
Inference took 2.07s
jfk.wav
And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.
Inference took 1.07s
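Comparing the two builds on the timings above, a rough sketch (values copied from the runs; this says nothing about why Bazel's build is faster, only by how much):

```python
# Per-file inference times (s) from the cmake and bazel tflite_runtime runs above.
cmake_times = {"test.wav": 1.54, "test_1.wav": 2.54, "jfk.wav": 1.51}
bazel_times = {"test.wav": 1.08, "test_1.wav": 2.07, "jfk.wav": 1.07}

for name in cmake_times:
    speedup = cmake_times[name] / bazel_times[name]
    print(f"{name}: {speedup:.2f}x faster with the bazel build")
```

So the Bazel-built runtime comes out roughly 1.2x to 1.4x faster on these samples.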
Might be interesting to create tflite as an external lib with Bazel and then link it to tflite_minimal with cmake rather than statically? I don't know what Bazel does that cmake doesn't, but see the results above.
I noticed you uploaded new tiny models as well. Hereby the benchmark tool output for them.
The normal one;
mycroft@OpenVoiceOS-e3830c:~/whisper $ ./benchmark_model --graph=models/whisper-tiny.tflite --num_threads=4 --warmup_runs=1 --num_runs=5 --report_peak_memory_footprint=true
STARTING!
Log parameter values verbosely: [0]
Min num runs: [5]
Num threads: [4]
Min warmup runs: [1]
Report the peak memory footprint: [1]
Graph: [models/whisper-tiny.tflite]
#threads used for CPU inference: [4]
Loaded model models/whisper-tiny.tflite
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
The input model file size (MB): 69.3703
Initialized session in 162.167ms.
Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
count=1 curr=4303281
Running benchmark for at least 5 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
count=5 first=3882868 curr=3828750 min=3828750 max=3882868 avg=3.85199e+06 std=17539
Inference timings in us: Init: 162167, First inference: 4303281, Warmup (avg): 4.30328e+06, Inference (avg): 3.85199e+06
Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
Memory footprint delta from the start of the tool (MB): init=10.7656 overall=306.816
Overall peak memory footprint (MB) via periodic monitoring: 312.531
Memory status at the end of exeution:
- VmRSS : 244 MB
+ RssAnnon : 171 MB
+ RssFile + RssShmem : 73 MB
The EN one;
mycroft@OpenVoiceOS-e3830c:~/whisper $ ./benchmark_model --graph=models/whisper-tiny.en.tflite --num_threads=4 --warmup_runs=1 --num_runs=5 --report_peak_memory_footprint=true
STARTING!
Log parameter values verbosely: [0]
Min num runs: [5]
Num threads: [4]
Min warmup runs: [1]
Report the peak memory footprint: [1]
Graph: [models/whisper-tiny.en.tflite]
#threads used for CPU inference: [4]
Loaded model models/whisper-tiny.en.tflite
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
The input model file size (MB): 41.6556
Initialized session in 119.189ms.
Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
count=1 curr=3247494
Running benchmark for at least 5 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
count=5 first=2934383 curr=2893835 min=2892774 max=2934383 avg=2.90437e+06 std=15345
Inference timings in us: Init: 119189, First inference: 3247494, Warmup (avg): 3.24749e+06, Inference (avg): 2.90437e+06
Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
Memory footprint delta from the start of the tool (MB): init=10.7422 overall=264.406
Overall peak memory footprint (MB) via periodic monitoring: 268.031
Memory status at the end of exeution:
- VmRSS : 227 MB
+ RssAnnon : 181 MB
+ RssFile + RssShmem : 46 MB
The base model still errors out.
mycroft@OpenVoiceOS-e3830c:~/whisper $ ./benchmark_model --graph=models/whisper-base.tflite --num_threads=4 --warmup_runs=1 --num_runs=5 --report_peak_memory_footprint=true
STARTING!
Log parameter values verbosely: [0]
Min num runs: [5]
Num threads: [4]
Min warmup runs: [1]
Report the peak memory footprint: [1]
Graph: [models/whisper-base.tflite]
#threads used for CPU inference: [4]
Loaded model models/whisper-base.tflite
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
The input model file size (MB): 77.0875
Initialized session in 182.557ms.
Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
ERROR: gather index out of bounds
ERROR: Node number 35 (GATHER) failed to invoke.
ERROR: Node number 1264 (WHILE) failed to invoke.
count=1 curr=6372530
Benchmarking failed.
But the funny thing is, the small model does work?!
mycroft@OpenVoiceOS-e3830c:~/whisper $ ./benchmark_model --graph=models/whisper-small.tflite --num_threads=4 --warmup_runs=1 --num_runs=5 --report_peak_memory_footprint=true
STARTING!
Log parameter values verbosely: [0]
Min num runs: [5]
Num threads: [4]
Min warmup runs: [1]
Report the peak memory footprint: [1]
Graph: [models/whisper-small.tflite]
#threads used for CPU inference: [4]
Loaded model models/whisper-small.tflite
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
The input model file size (MB): 387.698
Initialized session in 152.364ms.
Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
count=1 curr=21885361
Running benchmark for at least 5 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
count=5 first=20743640 curr=20800234 min=20726743 max=20882339 avg=2.07767e+07 std=59027
Inference timings in us: Init: 152364, First inference: 21885361, Warmup (avg): 2.18854e+07, Inference (avg): 2.07767e+07
Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
Memory footprint delta from the start of the tool (MB): init=33.9414 overall=1207.91
Overall peak memory footprint (MB) via periodic monitoring: 1211.55
Memory status at the end of exeution:
- VmRSS : 752 MB
+ RssAnnon : 376 MB
+ RssFile + RssShmem : 376 MB
Look at that AMAZING low memory usage for the small model !!!!
@nyadla-sys Well done / Great work.
However...
mycroft@OpenVoiceOS-e3830c:~/whisper $ python3 test.py -f samples/ -m models/whisper-small.tflite -t 4 -l en -r 2
Importing tflite_runtime
Importing numpy
Importing whisper
Loading tflite model models/whisper-small.tflite ...
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Loading audio file: samples/jfk.wav
Samplerate: 16000, length: 11.0s
Calculating mel spectrogram...
Invoking interpreter ...
Preparing output data ...
Converting tokens ...
th Wiv array Year, high Lend thisyear or D Mic, highendic or D M thisyear.
Inference took 36.53s for 11.00s audio file.
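As an aside, the "Calculating mel spectrogram..." step in test.py above is plain DSP that needs nothing beyond NumPy. A rough sketch of a Whisper-style log-mel front-end (80 mels, 25 ms window, 10 ms hop at 16 kHz; the filterbank here is a simplified approximation, not Whisper's exact one):

```python
import numpy as np

SR, N_FFT, HOP, N_MELS = 16000, 400, 160, 80  # Whisper's front-end settings

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_filterbank(n_mels=N_MELS, n_fft=N_FFT, sr=SR):
    # Triangular filters spaced evenly on the mel scale (normalization
    # omitted for brevity -- an approximation of the real filterbank).
    pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    hz = 700.0 * (10.0 ** (pts / 2595.0) - 1.0)
    bins = np.floor((n_fft + 1) * hz / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fb[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fb[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    return fb

def log_mel(audio):
    # Frame, window, FFT, mel-project, then log-compress.
    n_frames = 1 + (len(audio) - N_FFT) // HOP
    idx = np.arange(N_FFT)[None, :] + HOP * np.arange(n_frames)[:, None]
    frames = audio[idx] * np.hanning(N_FFT)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mels = mel_filterbank() @ power.T
    log_spec = np.log10(np.maximum(mels, 1e-10))
    return np.maximum(log_spec, log_spec.max() - 8.0)  # dynamic-range clamp

audio = np.sin(2 * np.pi * 440 * np.arange(SR) / SR)  # 1 s test tone
print(log_mel(audio).shape)  # (80, 98)
```

The interesting part for debugging WER issues is that this step is model-independent; the same mel input goes into every model size.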
@j1nx I just uploaded new whisper-base.tflite model. could you please try this model on rpi4 and let me know if you still see crash issue
All good now;
mycroft@OpenVoiceOS-e3830c:~/whisper $ ./benchmark_model --graph=models/whisper-base.tflite --num_threads=4 --warmup_runs=1 --num_runs=5 --report_peak_memory_footprint=true STARTING!
Log parameter values verbosely: [0]
Min num runs: [5]
Num threads: [4]
Min warmup runs: [1]
Report the peak memory footprint: [1]
Graph: [models/whisper-base.tflite]
#threads used for CPU inference: [4]
Loaded model models/whisper-base.tflite
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
The input model file size (MB): 125.575
Initialized session in 54.729ms.
Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
count=1 curr=10344233
Running benchmark for at least 5 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
count=5 first=9610296 curr=9466280 min=9466280 max=9654249 avg=9.57554e+06 std=64674
Inference timings in us: Init: 54729, First inference: 10344233, Warmup (avg): 1.03442e+07, Inference (avg): 9.57554e+06
Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
Memory footprint delta from the start of the tool (MB): init=15.875 overall=497.102
Overall peak memory footprint (MB) via periodic monitoring: 502.785
Memory status at the end of exeution:
- VmRSS : 333 MB
+ RssAnnon : 207 MB
+ RssFile + RssShmem : 126 MB
However just as with the small model, the WER is terrible.
mycroft@OpenVoiceOS-e3830c:~/whisper $ python3 test.py -f samples/ -m models/whisper-base.tflite -t 4 -l en -r 2
Importing tflite_runtime
Importing numpy
Importing whisper
Loading tflite model models/whisper-base.tflite ...
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Loading audio file: samples/jfk.wav
Samplerate: 16000, length: 11.0s
Calculating mel spectrogram...
Invoking interpreter ...
Preparing output data ...
Converting tokens ...
th Wiv array Year, high Lend thisyear or D Mic, highendic or D M thisyear.
Inference took 11.85s for 11.00s audio file.
I suspect this has something to do with the language selection somehow.
Is it possible for you to convert and upload the ".en" versions of the small and base models as well?
Just confirmed it;
mycroft@OpenVoiceOS-e3830c:~/whisper $ python3 test.py -f samples/ -m models/whisper-tiny.tflite -t 4 -l en -r 2
Importing tflite_runtime
Importing numpy
Importing whisper
Loading tflite model models/whisper-tiny.tflite ...
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Loading audio file: samples/jfk.wav
Samplerate: 16000, length: 11.0s
Calculating mel spectrogram...
Invoking interpreter ...
Preparing output data ...
Converting tokens ...
th Wiv array Year high Lend thisyear or D Mic highendic or D M thisyear
Inference took 5.92s for 11.00s audio file.
mycroft@OpenVoiceOS-e3830c:~/whisper $ python3 test.py -f samples/ -m models/whisper-tiny.en.tflite -t 4 -l en -r 2
Importing tflite_runtime
Importing numpy
Importing whisper
Loading tflite model models/whisper-tiny.en.tflite ...
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Loading audio file: samples/jfk.wav
Samplerate: 16000, length: 11.0s
Calculating mel spectrogram...
Invoking interpreter ...
Preparing output data ...
Converting tokens ...
And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.
Inference took 5.13s for 11.00s audio file.
Could also be the python script perhaps. @StuartIanNaylor @fquirin Perhaps you can have a look at it as well.
This is interesting: I plotted peak memory usage against inference time in seconds; tiny and tiny.en are the smallest, followed by base and small.
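A quick way to eyeball that relationship from the numbers earlier in the thread (values rounded from the benchmark_model logs; only the models with clean runs):

```python
# Peak memory (MB) and avg inference time (s) from the benchmark_model
# runs above (Pi 4, 4 threads, XNNPACK delegate).
runs = {
    "tiny.en": (268.0, 2.90),
    "base":    (502.8, 9.58),
    "small":   (1211.6, 20.78),
}
for name, (mem_mb, t_s) in sorted(runs.items(), key=lambda kv: kv[1][1]):
    print(f"{name:8s} {mem_mb:7.1f} MB  {t_s:6.2f} s")
```

Both memory and latency grow together, roughly with parameter count.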
python test.py --model=../openai-whisper/models/whisper-tiny.tflite --folder=../openai-whisper/samples/ --threads=4
jfk.wav
th Wiv array Year high Lend thisyear or D Mic, highendic or D M thisyear.
Inference took 1.25s
python test.py --model=../openai-whisper/models/whisper-tiny.en.tflite --folder=../openai-whisper/samples/ --threads=4
jfk.wav
And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.
Inference took 1.13s
Slightly slower was 1.07s ?
Indeed, but the WER of the non-EN model is... well, 5%, so I guess something went wrong there.
Will have a look at the C++ binary in the morning to see how that one is performing on all models.
Will also feed some more different WAV files to all models to gather some more info.
I presume the initial decoder tokens are hardcoded for a translation model / language or something, or just missing; the start tokens are control tokens:
decoder_input_ids = torch.tensor([50258, 50266, 50358, 50363]) #<|startoftranscript|><|ja|><|translate|><|notimestamps|>
decoder_input_ids = torch.tensor([50258, 50259, 50359, 50363]) #<|startoftranscript|><|en|><|transcribe|><|notimestamps|>
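Those two prompts differ only in the language and task tokens; a small sketch building them (ids taken from the two lines above and the logs below; the LANG/TASK tables are illustrative subsets, not the full vocabulary):

```python
# Control-token ids from Whisper's multilingual vocabulary, as used in the
# two decoder_input_ids examples above.
SOT, NOTIMESTAMPS = 50258, 50363        # <|startoftranscript|>, <|notimestamps|>
LANG = {"en": 50259, "de": 50261, "ja": 50266}   # illustrative subset
TASK = {"transcribe": 50359, "translate": 50358}

def forced_decoder_ids(lang, task):
    """Build the 4-token prompt that pins the decoder to a language/task."""
    return [SOT, LANG[lang], TASK[task], NOTIMESTAMPS]

print(forced_decoder_ids("en", "transcribe"))  # [50258, 50259, 50359, 50363]
print(forced_decoder_ids("ja", "translate"))   # [50258, 50266, 50358, 50363]
```

If a converted model bakes in the wrong prompt (or none), the decoder can drift into the wrong language or task, which would match the garbage output seen from the multilingual models here.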
Here is the output of the C++ binary with the single vocab.bin;
mycroft@OpenVoiceOS-e3830c:~/whisper $ minimal models/whisper-tiny.tflite samples/jfk.wav
n_vocab:50257
mel.n_len3000
mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Inference time 6 seconds
[_extra_token_50258][_extra_token_50259][_extra_token_50359][_BEG_]th Wiv array Year high Lend thisyear or D Mic, highendic or D M thisyear.[_SOT_]
mycroft@OpenVoiceOS-e3830c:~/whisper $ minimal models/whisper-tiny.en.tflite samples/jfk.wav
n_vocab:50257
mel.n_len3000
mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Inference time 5 seconds
[_SOT_][_NOT_] And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.
And the same multilingual model but this time with the multilingual vocab.bin;
mycroft@OpenVoiceOS-e3830c:~/whisper $ minimal models/whisper-tiny.tflite samples/jfk.wav
n_vocab:50257
mel.n_len3000
mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Inference time 6 seconds
[_extra_token_50258][_extra_token_50259][_extra_token_50359][_BEG_] And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.[_SOT_]
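Side note on those [_extra_token_NNNNN] markers: with n_vocab:50257 (the plain GPT-2 text vocab), Whisper's control tokens fall outside the table, so a detokenizer can only print a placeholder for them. A hypothetical illustration of that behaviour (not the actual code of the minimal binary):

```python
def detokenize(ids, vocab, n_text_vocab=50257):
    # Ids past the end of the text vocabulary are control tokens; with a
    # vocab.bin that lacks them, they are rendered as opaque markers.
    out = []
    for t in ids:
        if t >= n_text_vocab:
            out.append(f"[_extra_token_{t}]")
        else:
            out.append(vocab[t])
    return "".join(out)

print(detokenize([50258, 50259, 50359, 0], {0: "!"}))
# [_extra_token_50258][_extra_token_50259][_extra_token_50359]!
```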
And to complete the whole test package;
mycroft@OpenVoiceOS-e3830c:~/whisper $ minimal models/whisper-tiny.tflite samples/jfk.wav
n_vocab:50257
mel.n_len3000
mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Inference time 6 seconds
[_extra_token_50258][_extra_token_50259][_extra_token_50359][_BEG_] And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.[_SOT_]
mycroft@OpenVoiceOS-e3830c:~/whisper $ minimal models/whisper-base.tflite samples/jfk.wav
n_vocab:50257
mel.n_len3000
mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Inference time 12 seconds
[_extra_token_50258][_extra_token_50259][_extra_token_50359][_BEG_] And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.[_SOT_]
mycroft@OpenVoiceOS-e3830c:~/whisper $ minimal models/whisper-small.tflite samples/jfk.wav
n_vocab:50257
mel.n_len3000
mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Inference time 37 seconds
[_extra_token_50258][_extra_token_50259][_extra_token_50359][_BEG_] And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.[_SOT_]
Next up is some other different WAV files running the Python inference utilizing the tiny.en model to get some insight on the WER.
This also might be of interest to you @nyadla-sys: the base model does translation to English, whereas the tiny and small models just returned the detected language.
mycroft@OpenVoiceOS-e3830c:~/whisper $ minimal models/whisper-tiny.tflite de_speech_thorsten_sample03_8s.wav
n_vocab:50257
mel.n_len3000
mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Inference time 7 seconds
[_extra_token_50258][_extra_token_50261][_extra_token_50359][_BEG_] Für mich sind alle Menschen gleich unabhängig von Geschlecht, sexuelle Orientierung, Religion, Hautfarbe oder Geo-Kordinaten der Geburt.[_SOT_]
mycroft@OpenVoiceOS-e3830c:~/whisper $ minimal models/whisper-base.tflite de_speech_thorsten_sample03_8s.wav
n_vocab:50257
mel.n_len3000
mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Inference time 12 seconds
[_extra_token_50258][_extra_token_50261][_extra_token_50358][_BEG_] For me, all people are equally independent of gender, sex, orientation, religion, hate, or gender coordinates of birth.[_SOT_]
mycroft@OpenVoiceOS-e3830c:~/whisper $ minimal models/whisper-small.tflite de_speech_thorsten_sample03_8s.wav
n_vocab:50257
mel.n_len3000
mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Inference time 43 seconds
[_extra_token_50258][_extra_token_50261][_extra_token_50359][_BEG_] Für mich sind alle Menschen gleich, unabhängig von Geschlecht, sexueller Orientierung, Religion, Hautfarbe oder Geo-Koordinaten der Geburt.[_SOT_]
Thanks for your information, I will look into it.
@StuartIanNaylor it looks like the tflite_runtime issue was resolved? Did you understand what the problem was? Should we use this build script as default? 🙂
Yep, the bazel native build on the rk3588 is almost 50% faster. On the Pi too, if compiling on metal you should select native; I haven't tried, but it looks like the other targets are for cross-compiling.
Tensorflow does this thing where it names new features "experimental", but many of them can be years old and are the current de facto way. So I don't know if it's just optimisation; I've been scratching my head over the minimal build as that could be the same. Boy, the API is complex and it's not always clear what is the latest and greatest, but I think some of the so-called "experimental" methods also end up in the bazel build. I am thinking the cmake script is an older build method and they have only been updating the bazel one? Maybe we should be building tensorflow as a dynamic lib via bazel and linking minimal against it, rather than a static embedded compile that is missing whatever secret sauce is in the bazel compile. So use the python bazel-built pip package; I can hack C/C++ but am nowhere near building that up myself. Well, maybe I could, but it would be painful and very hacky.
Also I think tensorflow has got better with threads, as I'm seeing an improvement with x4 threads, where I remember specifically with Sanebows Pi-DTLN x2/x4 threads made no difference at all. I only have x4 big cores.
I did send you a message about using bazel, with a mention of @fquirin, but I have forgotten where; it should be in your 'mentions'.
Hi @StuartIanNaylor , do you have a tflite_runtime wheel file built with the bazel script for download maybe? My Pi seems to be too weak to build it and I don't understand how the cross-compile is supposed to work because the examples are broken 😕 (the referenced Docker files don't exist). I'd require the Python 3.9 aarch64 build 🙃.
[Edit] I did manage to compile it finally! Had to uninstall the regular tensorflow and tflite packages though to use it 🤔 (some conflict it seems), but it is indeed even faster than "fat" tflite now 😀
Good, as my native compile on Armv8.2 prob would not work on a Pi.
I am not really sure what the bazel build does differently, but yeah, it is noticeably faster than the cmake compile, which maybe hasn't been updated and creates a legacy build. I am thinking it's the same for the minimal build, and linking against a bazel-built lib rather than a static lib might also be faster, due to whatever bazel does in the build as opposed to cmake.
Shouldn't need docker though, even though you got it working, as you just specify one of the targets from https://www.tensorflow.org/lite/guide/build_cmake_pip#available_target_names (armhf or aarch64) and it will go off and cross-compile; otherwise, for the host, select native.
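For the record, a sketch of the bazel-based pip build as discussed above. The script path exists in the TF tree around 2.11; the positional target argument and the output path are assumptions, so check the script's usage text for your checkout:

```shell
# Sketch: build tflite_runtime with bazel instead of cmake (TF 2.11 era).
git clone --depth 1 --branch v2.11.0 https://github.com/tensorflow/tensorflow.git
cd tensorflow

# On-device build tuned for the local CPU (what gave the ~50% speedup here):
tensorflow/lite/tools/pip_package/build_pip_package_with_bazel.sh native

# Cross-compiling from an x86 host instead (assumed target name):
# tensorflow/lite/tools/pip_package/build_pip_package_with_bazel.sh aarch64

# The wheel should land somewhere under:
# tensorflow/lite/tools/pip_package/gen/tflite_pip/
```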
Actually could you try my build: download (if you have the same Python version). I'd like to know if they are compatible and perform the same :-).
Ran it in a miniconda env, as I'm on Ubuntu with Python 3.10, but yeah, it runs OK.
conda create -n tlite-test python=3.9
Totally forgot what we were at, but yeah, it runs.
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
gb1.wav
My fellow Americans, this day has brought terrible news and great sadness to our country. At 9 o clock this morning, mission control in Houston lost contact with our space shuttle Columbia. A short time later, debris was seen falling from the skies above Texas. The Columbia's lost. There are no survivors.
Inference took 2.1s
mm0.wav
This is the micro machine man presenting the most miniature motorcade of micro machine. Each one has dramatic details for a picture of precision paint jobs. Plus incredible micro machine pocket place that's physical police station fire station restaurant service station and more. Perfect pocket portable to take any place. And there are many miniature places to play with. Each one comes with its own special edition micro machine vehicle and fantastic features that miraculously move. Raise the bolt. Lift at the airport. Marine a man. The gun turret at the army base. Clean your car at the car. Raise the toll bridge. And these place that's fitted together to form a micro machine world. Micro machine pocket place that's so tremendously tiny so perfectly precise. So doesn't we detailed you on a pocket them all micro machines and micro machine pocket place that sold separately from Glube. The smaller they are, the better they are.
Inference took 3.72s
hp0.wav
Henry F. Phillips from Wikipedia, the free encyclopedia at en.wicopedia.org.
Inference took 1.5s
gb0.wav
Good morning. This Tuesday is Election Day. After months of spirited debate in vigorous campaigning, the time has come for Americans to make important decisions about our nation's future and encourage all Americans to go to the polls and vote. Election season brings out the spirit of competition between our political parties. And that competition is an essential part of a healthy democracy. But as the campaigns come to a close, Republicans, Democrats, and independents can find common ground on at least one point. Our system of
Inference took 2.52s
Nice 😎 , thanks for testing 👍
Running on OpenVoiceOS, RaspberryPi 4 - 2GB model. Using Python 3.10 and Tensorflow-lite 2.11
With the tiny model;