j1nx opened this issue 1 year ago
And with the two files from: https://github.com/fquirin/speech-recognition-experiments/tree/main/test-files
Loading audio file: samples/en_sh_lights_70pct_4s.wav
Samplerate: 16000, length: 3.575875s
Calculating mel spectrogram...
Invoking interpreter ...
Preparing output data ...
Converting tokens ...
Set the lights in the living room to 70%.
Inference took 3.5s for 3.58s audio file.
Loading audio file: samples/en_speech_jfk_11s.wav
Samplerate: 16000, length: 11.0s
Calculating mel spectrogram...
Invoking interpreter ...
Preparing output data ...
Converting tokens ...
And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.
Inference took 4.26s for 11.0s audio file.
https://github.com/ggerganov/whisper.cpp/issues/7#issuecomment-1397467197
Thinking an arg might be better, as with big.LITTLE it's often better to use just the big cores; I have found that's sometimes faster than using all cores.
Also been trying to work out how to get floating-point times so we get fractions of a second. We have end.tv_sec - start.tv_sec; what's the best way of adding tv_usec, and is there a float-time alternative?
orangepi@orangepi5:~/openai-whisper/minimal_build$ ./minimal ../models/whisper.tflite ../samples/jfk.wav
n_vocab:50257
mel.n_len3000
mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Inference time 1 seconds
[_SOT_][_NOT_] And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.
ps
orangepi@orangepi5:~/openai-whisper/minimal_build$ ./minimal ../models/whisper-small.tflite ../samples/jfk.wav
n_vocab:50257
mel.n_len3000
mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Inference time 12 seconds
terminate called after throwing an instance of 'std::out_of_range'
what(): map::at
Aborted
Guess that has something to do with the tokeniser, which is outside the timings.
Am I correct that it only decodes one pass of 30 seconds? I can't seem to get the full transcription of >30-second wav files; test.py just continues to the next wav file.
orangepi@orangepi5:~/openai-whisper/minimal_build$ ./minimal ../models/whisper.tflite ../samples/test.wav
n_vocab:50257
mel.n_len3000
mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Inference time 2 seconds
[_SOT_][_NOT_] Bili always listens to his mother. He always does what she says. If his mother says, brush your teeth, Bili brushes his teeth. If his mother says, go to bed, Bili goes to bed. Bili is a very good boy, a good boy listens to his mother. His mother does not have to ask him again. She asks him to do something one time and she does not ask again. Bili is a good boy. He does what his mother asks the first time. She does not have to ask again.
Seems to be something to do with the small model, as it's OK with tiny.
Loading audio file: samples/A_J_Cook_Speech_from_Lansbury's_Labour_Weekly.wav
Samplerate: 16000, length: 188.231125s
Calculating mel spectrogram...
Invoking interpreter ...
Preparing output data ...
Converting tokens ...
The field of the workers have put a million miners with their wives and children something like one tends to the whole population of this country have long called a loud progester. If you were to believe all the things that capitalist press pay about us, you would think that we were the most terrible people on earth. They tell you that we are never satisfied. That we are always psychic, that we are never content for our wages, with our hours, or with the hoses we live in. And yet,
Inference took 9.12s for 1.88e+02s audio file.
Looks like indeed only one pass of 30 seconds is transcribed, after which it loads the next wav file.
Yeah I am talking about the above
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Inference time 12 seconds
terminate called after throwing an instance of 'std::out_of_range'
what(): map::at
Aborted
Happens with the small model but is fine with tiny, and yeah, we only get the first 30 s of beam search.
orangepi@orangepi5:~/openai-whisper/minimal_build$ ./minimal ../models/whisper.tflite ../samples/gb0.wav
n_vocab:50257
mel.n_len3000
mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Inference time 2 seconds
[_SOT_][_NOT_] Good morning. This Tuesday is Election Day. After months of spirited debate in vigorous campaigning, the time has come for Americans to make important decisions about our nation's future and encourage all Americans to go to the polls and vote. Election season brings out the spirit of competition between our political parties. And that competition is an essential part of a healthy democracy. But as the campaigns come to a close, Republicans, Democrats, and independents can find common ground on at least one point. Our system of
wget --quiet --show-progress -O samples/gb0.ogg https://upload.wikimedia.org/wikipedia/commons/2/22/George_W._Bush%27s_weekly_radio_address_%28November_1%2C_2008%29.oga
ffmpeg -loglevel -0 -y -i samples/gb0.ogg -ar 16000 -ac 1 -c:a pcm_s16le samples/gb0.wav
Yeah, posted that error over at the now closed issue at whisper.cpp
The small model is not yet correct.
mycroft@OpenVoiceOS-e3830c:~/whisper $ minimal models/whisper.tflite samples/gb0.wav
n_vocab:50257
mel.n_len3000
mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Inference time 9 seconds
[_SOT_][_NOT_] Good morning. This Tuesday is Election Day. After months of spirited debate in vigorous campaigning, the time has come for Americans to make important decisions about our nation's future and encourage all Americans to go to the polls and vote. Election season brings out the spirit of competition between our political parties. And that competition is an essential part of a healthy democracy. But as the campaigns come to a close, Republicans, Democrats, and independents can find common ground on at least one point. Our system of
And with Python
Loading audio file: samples/gb0.wav
Samplerate: 16000, length: 127.36s
Calculating mel spectrogram...
Invoking interpreter ...
Preparing output data ...
Converting tokens ...
Good morning. This Tuesday is Election Day. After months of spirited debate in vigorous campaigning, the time has come for Americans to make important decisions about our nation's future and encourage all Americans to go to the polls and vote. Election season brings out the spirit of competition between our political parties. And that competition is an essential part of a healthy democracy. But as the campaigns come to a close, Republicans, Democrats, and independents can find common ground on at least one point. Our system of
Inference took 8.75s for 1.27e+02s audio file.
I ran test.py and it worked fine with whisper-small.tflite; maybe the tflite version on the Raspberry Pi is a bit older.
$python test.py
Importing tensorflow and numpy
2023-01-19 12:00:17.805994: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0
.
Importing whisper
Loading tflite model models/whisper.tflite ...
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Loading audio file: ../test-files/en_sh_lights_70pct_4s.wav
Samplerate: 16000, length: 3.575875s
Calculating mel spectrogram...
Invoking interpreter ...
Preparing output data ...
Converting tokens ...
!!!
Inference took 9.42s for 3.58s audio file.
Loading audio file: ../test-files/en_speech_jfk_11s.wav
Samplerate: 16000, length: 11.0s
Calculating mel spectrogram...
Invoking interpreter ...
Preparing output data ...
Converting tokens ...
!! And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.
@j1nx please download the latest whisper-small.tflite and I will also run it using the minimal C++ example
@nyadla-sys Will do in a bit.
The test.py you linked to uses the full tensorflow; however, we use tensorflow-lite. https://github.com/fquirin/speech-recognition-experiments/blob/main/whisper-tflite/test.py#L9
Could you flip the # at line 8 and 9 and try again?
(PS, I run TFlite 2.11, however without any custom ops. Perhaps that is what we need)
PS the guys at tensorflow took pity on me :)
https://github.com/tensorflow/tensorflow/issues/59273#issuecomment-1384441333
@nyadla-sys Ran with the latest small model
mycroft@OpenVoiceOS-e3830c:~/whisper $ minimal models/whisper-small.tflite samples/jfk.wav
n_vocab:50257
mel.n_len3000
mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
ERROR: gather index out of bounds
ERROR: Node number 35 (GATHER) failed to invoke.
ERROR: Node number 3435 (WHILE) failed to invoke.
Error at ../minimal.cc:211
Same error
I think it is because Custom OPS (oneDNN custom operations) are used, as I saw in your output.
Could you flip the # at line 8 and 9 and try again?
If you do that, don't forget to change line 24 to interpreter = tf.lite.Interpreter(model_path, num_threads=int(args.threads)) as well
PS the guys at tensorflow took pity on me :)
Did you notice any difference? I need to check what they've actually changed since they rewrote a lot without comments
Yeah, indeed. Grabbed the snippet here, which also has that flipped: https://github.com/ggerganov/whisper.cpp/issues/7#issuecomment-1384419135
Anyhow, could you or @nyadla-sys check out both? As we do not train, all we need is the tflite runtime for inference.
I just ran whisper-small.tflite on my Ubuntu Linux box; please refer to the latest README.md
$ ./minimal ../models/whisper-small.tflite ../samples/jfk.wav
n_vocab:50257
mel.n_len3000
mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Inference time 26 seconds
[_extra_token_50258][_extra_token_50259]!! And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.
[_extra_token_50258][_extra_token_50259] are tokens; that part of the output stays the same.
The token output changes depending on whether the model is English-only or multilingual.
To use a multilingual model in Python, you can simply change the line "wtokenizer = whisper.tokenizer.get_tokenizer(False, language="en")" to "wtokenizer = whisper.tokenizer.get_tokenizer(True, language="en")"
Continuation from here
Does this require the whisper-small.tflite model? Because I've tried that with whisper.tflite (en only?) but the output is completely scrambled and still English when I set, for example, "de".
Please use whisper-small.tflite or whisper-medium.tflite
These two models, whisper-small.tflite and whisper-medium.tflite, are multilingual. "whisper.tflite" is identical to "whisper-tiny.en.tflite"; I intended to change the name, but I refrained from doing so because many people are using it in their examples.
Strange, as I ran those tests with the right multilingual vocab bin.
Will double-check in the morning.
I've updated test.py with new parameters for "--lang", "--runtime" and the tweaks mentioned in the tensorflow issue:
$ python3 test.py -h
usage: test.py [-h] [-f FOLDER] [-m MODEL] [-t THREADS] [-l LANG] [-r RUNTIME]
Running Whisper TFlite test inference.
optional arguments:
-h, --help show this help message and exit
-f FOLDER, --folder FOLDER
Folder with WAV input files
-m MODEL, --model MODEL
Path to model
-t THREADS, --threads THREADS
Threads used
-l LANG, --lang LANG Language used
-r RUNTIME, --runtime RUNTIME
Tensorflow runtime, use '1' for tf.lite or '2' for tflite_runtime
On my Rpi400, tflite_runtime is still about 1.5s slower than tf.lite. I could not test whisper-small.tflite because it keeps crashing (will post the error in a minute).
Change the name @nyadla-sys; we can change things, but I think we are all used to the original naming convention
@fquirin dunno why, but the tensorflow folks rewrote the test script for me; maybe it's because the full tf has some special-sauce cmake optimisation for Arm that we are missing with the TensorFlow Lite Python wheel package build?
https://github.com/tensorflow/tensorflow/issues/59273#issuecomment-1384441333
Change the name @nyadla-sys; we can change things, but I think we are all used to the original naming convention
I have updated the model to "whisper-tiny.en.tflite" and I need to update the README. For backward compatibility, I will keep "whisper.tflite" for a while.
Error with whisper-small.tflite (I think we had this somewhere a few hours ago already?):
Traceback (most recent call last):
File "/home/pi/whisper-tflite/openai-whisper/test.py", line 93, in <module>
transcribe(args.folder + file)
File "/home/pi/whisper-tflite/openai-whisper/test.py", line 68, in transcribe
interpreter.invoke()
File "/home/pi/whisper-tflite/venv/lib/python3.9/site-packages/tensorflow/lite/python/interpreter.py", line 917, in invoke
self._interpreter.Invoke()
RuntimeError: gather index out of boundsNode number 35 (GATHER) failed to invoke.Node number 3435 (WHILE) failed to invoke.
dunno why but the tensorflow rewrote the test script for me, maybe its because the full tf has some special sauce optimisation with cmake for Arm that maybe we are missing with the TensorFlow Lite Python Wheel Package build?
It must be something like that, yes 🤔
Can you provide me with additional information such as the operating system, machine, and whether you are using a minimal C++ build or a Python script?
Sure: Aarch64, Raspberry Pi 400, 4GB RAM, Debian Bullseye (11), Python script.
Maybe my Pi is actually out of memory when using the small model but according to OpenAI 2GB should be fine 🤔
Run 'htop' in another shell and you'll see what is going on.
Some recent benchmark results with my Rpi400:
Whisper TFlite - tiny-en - tensorflow.lite - 4 threads:
-------------------------------------------------------
Loading audio file: samples/jfk.wav
Samplerate: 16000, length: 11.0s
Calculating mel spectrogram...
Invoking interpreter ...
Preparing output data ...
Converting tokens ...
And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.
Inference took 4.083s for 11.000s audio file.
Whisper TFlite - tiny-en - tflite_runtime - 4 threads:
------------------------------------------------------
Loading audio file: samples/jfk.wav
Samplerate: 16000, length: 11.0s
Calculating mel spectrogram...
Invoking interpreter ...
Preparing output data ...
Converting tokens ...
And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.
Inference took 5.668s for 11.000s audio file.
Run 'htop' in another shell and you'll see what is going on.
Runs at 780MB (used) for a while, then quickly maxes out at 1.8GB (~50%) and crashes.
What I did notice during the test: tensorflow.lite seems to use the 4 cores more efficiently than tflite_runtime 🤔
[EDIT] Screenshot (tflite_runtime: TOP, tensorflow.lite: BOTTOM):
Can you provide me with additional information such as the operating system, machine, and whether you are using a minimal C++ build or a Python script?
I ran the C++ minimal and Python builds on the box below and both work fine: Linux pop-os 5.19.0-76051900-generic #202207312230~1663791054~22.04~28340d4 SMP PREEMPT_DYNAMIC Wed S x86_64 x86_64 x86_64 GNU/Linux
I run C++ minimal and python build on below and both works fine
I've tested it on my x86 Debian 11 laptop and it worked as well (Python test.py). So it seems to be an ARM or Raspberry Pi issue 🤔.
Language selection still doesn't work though. With the "de" setting, the small model adds "!!!" to the beginning of a line and removes some words or entire texts, but never gives any results in German.
What does "-funsafe-math-optimizations" do exactly?
Because all the TensorFlow Lite documentation shows it should be used, and so did I: https://www.tensorflow.org/lite/guide/build_cmake_arm
However, looking at the build script used within the repo, I can't find it (anymore): https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/tools/pip_package/build_pip_package_with_cmake.sh
It is also not present when using Bazel: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/tools/pip_package/build_pip_package_with_bazel.sh
However, the Bazel-based build script does explicitly set "-O3", which is not done with cmake. (I also do not set it explicitly, and compile OpenVoiceOS-buildroot with "-O2".)
Perhaps the ~25% comes from there?
Just double-checked that I had used the right multilingual vocab filter bin, and indeed I did. It has something to do with the model and the gather function being different between the tflite_runtime lib and the tensorflow lite lib.
mycroft@OpenVoiceOS-e3830c:~/whisper $ python3 test.py -f samples/ -m models/whisper-small.tflite -t 4 -l en -r 2
Importing tflite_runtime
Importing numpy
Importing whisper
Loading tflite model models/whisper-small.tflite ...
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Loading audio file: samples/jfk.wav
Samplerate: 16000, length: 11.0s
Calculating mel spectrogram...
Invoking interpreter ...
Traceback (most recent call last):
File "/home/mycroft/whisper/test.py", line 93, in <module>
transcribe(args.folder + file)
File "/home/mycroft/whisper/test.py", line 68, in transcribe
interpreter.invoke()
File "/usr/lib/python3.10/site-packages/tflite_runtime/interpreter.py", line 917, in invoke
self._interpreter.Invoke()
RuntimeError: gather index out of boundsNode number 35 (GATHER) failed to invoke.Node number 3435 (WHILE) failed to invoke.
And with the C++ version
mycroft@OpenVoiceOS-e3830c:~/whisper $ minimal models/whisper-small.tflite samples/jfk.wav
n_vocab:50257
mel.n_len3000
mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
ERROR: gather index out of bounds
ERROR: Node number 35 (GATHER) failed to invoke.
ERROR: Node number 3435 (WHILE) failed to invoke.
Error at ../minimal.cc:211
funsafe-math-optimizations
A quirk of Neon in Armv7 devices is that it flushes all subnormal numbers to zero, and as a result the GCC compiler will not use it unless -funsafe-math-optimizations, which allows losing denormals, is turned on. "Enhanced" Neon defined since Armv8 does not have this quirk, but as of GCC 8.2 the same flag is still required to enable Neon instructions.[133] On the other hand, GCC does consider Neon safe on AArch64 for Armv8.
I managed to generate encoder and decoder tflite models; just pending completion of the decoder post-processing to generate tokens with text: https://colab.research.google.com/github/usefulsensors/openai-whisper/blob/main/notebooks/whisper_encoder_decoder_tflite.ipynb
I rebuilt the tflite_runtime with GPU support.
Running the Python-based inference now takes longer, while it still says it is using XNNPACK.
mycroft@OpenVoiceOS-e3830c:~/whisper $ python3 test.py -f samples/ -m models/whisper-tiny.en.tflite -t 4 -l en -r 2
Importing tflite_runtime
Importing numpy
Importing whisper
Loading tflite model models/whisper-tiny.en.tflite ...
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Loading audio file: samples/jfk.wav
Samplerate: 16000, length: 11.0s
Calculating mel spectrogram...
Invoking interpreter ...
Preparing output data ...
Converting tokens ...
And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.
Inference took 6.56s for 11.00s audio file.
There is no GPU GL support on Linux, only for Android. It is just in preparation for that Vulkan clvk thingy I would like to test.
BTW @nyadla-sys, you are converting the small model with SELECT_OPS, which is not available for the tflite interpreter. Perhaps that is the reason why we can't run it while you can.
@StuartIanNaylor I am in the process of cross-compiling ComputeLibrary and ArmNN for the rpi4 (armv8a). Interested to see if ArmNN outperforms XNNPACK; they claim it does, so interested to see...
I successfully executed the TFLite encoder and decoder models; this opens the door to running on two different processors, and it supports multilingual along with the translate feature.
This will open up multilingual support for whisper tflite models.
@StuartIanNaylor Am in process of crosscompiling ComputeLibrary and ArmNN for rpi4 (armv8a). Interested to see if ArmNN outperforms XNNPACK. They claim it does, so interested to see...
I have had some problems with ArmNN in that it seems very platform-dependent; you might find that if you use Ubuntu 22.04 for Pi it works while you have problems otherwise, but see how you go. It's whatever version they use in the Wav2Letter example.
@nyadla-sys This is interesting; for those who can run on GPU/CPU/NPU, what benchmarks can be provided? I haven't looked or tried yet, but will.
That is most likely because of this: https://github.com/ARM-software/ComputeLibrary/blob/main/SConstruct#L93
For the OpenVoiceOS project, everything gets compiled from source, optimized for the specific board (for now rpi only, but others might follow).
It defaults to armv7a, while for your board you'd be better off using one of the arm64-v8* architectures.
@j1nx Have you managed to build it? https://review.mlplatform.org/plugins/gitiles/ml/armnn/+/747b9c6748802f862a86c85e43ba028b64ac809a/delegate/BuildGuideNative.md
I am still playing with a delegate build and have this in minimal.cc:
/* Copyright 2018 The TensorFlow Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/
#include "tensorflow/lite/core/interpreter.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model.h"
#include "tensorflow/lite/optional_debug_tools.h"
#include "whisper.h"
#include "input_features.h"
// Standard library headers used below
#include <cstdio>
#include <cstring>
#include <fstream>
#include <string>
#include <vector>
#include <sys/time.h>
// This is an example that is minimal to read a model
// from disk and perform inference. There is no data being loaded
// that is up to you to add as a user.
//
// NOTE: Do not add any dependencies to this that cannot be built with
// the minimal makefile. This example must remain trivial to build with
// the minimal build tool.
//
// Usage: minimal <tflite model>
#define TFLITE_MINIMAL_CHECK(x) \
if (!(x)) { \
fprintf(stderr, "Error at %s:%d\n", __FILE__, __LINE__); \
exit(1); \
}
int main(int argc, char* argv[]) {
if ((argc != 2) && (argc != 3)) {
fprintf(stderr, "'minimal <tflite model>' or 'minimal <tflite model> <pcm_file name>'\n");
return 1;
}
const char* filename = argv[1];
whisper_filters filters;
whisper_mel mel;
struct timeval start_time,end_time;
std::string word;
int32_t n_vocab = 0;
std::string fname = "./filters_vocab_gen.bin";
auto fin = std::ifstream(fname, std::ios::binary);
{
uint32_t magic=0;
fin.read((char *) &magic, sizeof(magic));
//@magic:USEN
if (magic != 0x5553454e) {
printf("%s: invalid vocab file '%s' (bad magic)\n", __func__, fname.c_str());
return 0;
}
}
// load mel filters
{
fin.read((char *) &filters.n_mel, sizeof(filters.n_mel));
fin.read((char *) &filters.n_fft, sizeof(filters.n_fft));
filters.data.resize(filters.n_mel * filters.n_fft);
fin.read((char *) filters.data.data(), filters.data.size() * sizeof(float));
}
// load vocab
{
fin.read((char *) &n_vocab, sizeof(n_vocab));
g_vocab.n_vocab = n_vocab;
printf("\nn_vocab:%d\n",(int)n_vocab);
for (int i = 0; i < n_vocab; i++) {
uint32_t len;
fin.read((char *) &len, sizeof(len));
word.resize(len);
fin.read((char *) word.data(), len);
g_vocab.id_to_token[i] = word;
//printf("len:%d",(int)len);
//printf("'%s'\n", g_vocab.id_to_token[i].c_str());
}
g_vocab.n_vocab = 51864;//add additional vocab ids
if (g_vocab.is_multilingual()) {
g_vocab.token_eot++;
g_vocab.token_sot++;
g_vocab.token_prev++;
g_vocab.token_solm++;
g_vocab.token_not++;
g_vocab.token_beg++;
}
for (int i = n_vocab; i < g_vocab.n_vocab; i++) {
if (i > g_vocab.token_beg) {
word = "[_TT_" + std::to_string(i - g_vocab.token_beg) + "]";
} else if (i == g_vocab.token_eot) {
word = "[_EOT_]";
} else if (i == g_vocab.token_sot) {
word = "[_SOT_]";
} else if (i == g_vocab.token_prev) {
word = "[_PREV_]";
} else if (i == g_vocab.token_not) {
word = "[_NOT_]";
} else if (i == g_vocab.token_beg) {
word = "[_BEG_]";
} else {
word = "[_extra_token_" + std::to_string(i) + "]";
}
g_vocab.id_to_token[i] = word;
// printf("%s: g_vocab[%d] = '%s'\n", __func__, i, word.c_str());
}
}
//Generate input_features for Audio file
if (argc == 3) {
const char* pcmfilename = argv[2];
// WAV input
std::vector<float> pcmf32;
{
drwav wav;
if (!drwav_init_file(&wav, pcmfilename, NULL)) {
fprintf(stderr, "%s: failed to open WAV file '%s' - check your input\n", argv[0],pcmfilename);
// whisper_print_usage(argc, argv, {});
return 3;
}
if (wav.channels != 1 && wav.channels != 2) {
fprintf(stderr, "%s: WAV file '%s' must be mono or stereo\n", argv[0], pcmfilename);
return 4;
}
if (wav.sampleRate != WHISPER_SAMPLE_RATE) {
fprintf(stderr, "%s: WAV file '%s' must be 16 kHz\n", argv[0], pcmfilename);
return 5;
}
if (wav.bitsPerSample != 16) {
fprintf(stderr, "%s: WAV file '%s' must be 16-bit\n", argv[0], pcmfilename);
return 6;
}
int n = wav.totalPCMFrameCount;
std::vector<int16_t> pcm16;
pcm16.resize(n*wav.channels);
drwav_read_pcm_frames_s16(&wav, n, pcm16.data());
drwav_uninit(&wav);
// convert to mono, float
pcmf32.resize(n);
if (wav.channels == 1) {
for (int i = 0; i < n; i++) {
pcmf32[i] = float(pcm16[i])/32768.0f;
}
} else {
for (int i = 0; i < n; i++) {
pcmf32[i] = float(pcm16[2*i] + pcm16[2*i + 1])/65536.0f;
}
}
}
//Hack: if the audio is shorter than 30s (WHISPER_CHUNK_SIZE), pad with 0's
pcmf32.resize((WHISPER_SAMPLE_RATE*WHISPER_CHUNK_SIZE),0);
if (!log_mel_spectrogram(pcmf32.data(), pcmf32.size(), WHISPER_SAMPLE_RATE, WHISPER_N_FFT, WHISPER_HOP_LENGTH, WHISPER_N_MEL, 1,filters, mel)) {
fprintf(stderr, "%s: failed to compute mel spectrogram\n", __func__);
return -1;
}
printf("\nmel.n_len%d\n",mel.n_len);
printf("\nmel.n_mel:%d\n",mel.n_mel);
}//end of audio file processing
// Load tflite model
std::unique_ptr<tflite::FlatBufferModel> model =
tflite::FlatBufferModel::BuildFromFile(filename);
TFLITE_MINIMAL_CHECK(model != nullptr);
// Build the interpreter with the InterpreterBuilder.
// Note: all Interpreters should be built with the InterpreterBuilder,
// which allocates memory for the Interpreter and does various set up
// tasks so that the Interpreter can read the provided model.
tflite::ops::builtin::BuiltinOpResolver resolver;
tflite::InterpreterBuilder builder(*model, resolver);
std::unique_ptr<tflite::Interpreter> interpreter;
builder(&interpreter);
TFLITE_MINIMAL_CHECK(interpreter != nullptr);
// Allocate tensor buffers.
TFLITE_MINIMAL_CHECK(interpreter->SetNumThreads(4) == kTfLiteOk);
TFLITE_MINIMAL_CHECK(interpreter->AllocateTensors() == kTfLiteOk);
//printf("=== Pre-invoke Interpreter State ===\n");
// tflite::PrintInterpreterState(interpreter.get());
// Get information about the memory area to use for the model's input.
float* input = interpreter->typed_input_tensor<float>(0);
if (argc == 2) {
memcpy(input, _content_input_features_bin, WHISPER_N_MEL*WHISPER_MEL_LEN*sizeof(float)); //to load pre generated input_features
}
else if (argc == 3) {
memcpy(input, mel.data.data(), mel.n_mel*mel.n_len*sizeof(float));
}
// Fill input buffers
// TODO(user): Insert code to fill input tensors.
// Note: The buffer of the input tensor with index `i` of type T can
// be accessed with `T* input = interpreter->typed_input_tensor<T>(i);`
gettimeofday(&start_time, NULL);
// Run inference
TFLITE_MINIMAL_CHECK(interpreter->Invoke() == kTfLiteOk);
gettimeofday(&end_time, NULL);
// Report elapsed time as a floating-point number of seconds; this avoids
// the borrow handling (and missing zero-padding on tv_usec) of printing
// the two timeval fields separately.
double elapsed = (double)(end_time.tv_sec - start_time.tv_sec)
               + (double)(end_time.tv_usec - start_time.tv_usec) / 1e6;
printf("Inference time %.3f seconds \n", elapsed);
int output = interpreter->outputs()[0];
TfLiteTensor *output_tensor = interpreter->tensor(output);
TfLiteIntArray *output_dims = output_tensor->dims;
// assume output dims to be something like (1, 1, ... ,size)
auto output_size = output_dims->data[output_dims->size - 1];
//printf("output size:%d\n",output_size );
int *output_int = interpreter->typed_output_tensor<int>(0);
std::string text = "";
std::string word_add;
for (int i = 0; i < output_size; i++) {
//printf("%d\t",output_int[i]);
if(output_int[i] == g_vocab.token_eot){
break;
}
text += whisper_token_to_str(output_int[i]);
}
printf("\n%s\n", text.c_str());
printf("\n");
//printf("\n\n=== Post-invoke Interpreter State ===\n");
//// tflite::PrintInterpreterState(interpreter.get());
// Read output buffers
// TODO(user): Insert getting data out code.
// Note: The buffer of the output tensor with index `i` of type T can
// be accessed with `T* output = interpreter->typed_output_tensor<T>(i);`
return 0;
}
If I build with cmake ../tensorflow_src/tensorflow/lite/examples/minimal -DTFLITE_ENABLE_XNNPACK=OFF
orangepi@orangepi5:~/openai-whisper/minimal_build$ ./minimal ../models/whisper.tflite ../samples/test_1.wav
n_vocab:50257
mel.n_len3000
mel.n_mel:80
Inference time 2.469990 seconds
[_SOT_][_NOT_] David lost his yellow pencil. He could not find it. Where is my yellow pencil? Yes his sister. His sister did not know. I don't know where your pencil is. She said David thought about it. He thought and thought. He used his yellow pencil before lunch. He used it to write a note to his teacher. The notes said, dear teacher, thank you for helping me, David. He put the note in the envelope where was the envelope?
Then build with cmake ../tensorflow_src/tensorflow/lite/examples/minimal
orangepi@orangepi5:~/openai-whisper/minimal_build$ ./minimal ../models/whisper.tflite ../samples/test_1.wav
n_vocab:50257
mel.n_len3000
mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Inference time 2.490417 seconds
[_SOT_][_NOT_] David lost his yellow pencil. He could not find it. Where is my yellow pencil? Yes his sister. His sister did not know. I don't know where your pencil is. She said David thought about it. He thought and thought. He used his yellow pencil before lunch. He used it to write a note to his teacher. The notes said, dear teacher, thank you for helping me, David. He put the note in the envelope where was the envelope?
Is that the speedup XNNPACK gives !?
Still working on it, as I am cross-compiling it into buildroot as a package.
Solving error after error as usual, so I can't really comment on your performance numbers yet.
Have you tried the Python way with both the normal and the external delegates? Interested to see those numbers.
An Apple iPhone 11 takes only 0.7 seconds to run inference with whisper-tiny.en.tflite
Probably the AMX blocks are the same sort of secret sauce the M1 has
The Apple A13 Bionic features an Apple-designed 64-bit six-core CPU implementing ARMv8.4-A[1] ISA, with two high-performance cores running at 2.65 GHz[6] called Lightning and four energy-efficient cores called Thunder. The Lightning cores feature machine learning accelerators called AMX blocks
https://medium.com/swlh/apples-m1-secret-coprocessor-6599492fc1e1 Apple sauce.
Jup, now I know for sure; I am a software guy.🤣
Running on OpenVoiceOS, RaspberryPi 4 - 2GB model. Using Python 3.10 and Tensorflow-lite 2.11
With the tiny model;