Closed lissyx closed 4 years ago
Results so far, on Google Pixel 2.
For NNAPI delegate:
lite_benchmark_model shows a slow increase in average execution time.
For GPU OpenGL ES delegate:
ApplyGeneralTransformations
07-23 20:32:49.714 28449 28449 I tflite : Checking op 2/240: CUSTOM Mfcc !status.ok() isAllFloatInputs=1 isAllFloatOutputs=1
07-23 20:32:49.714 28449 28449 I tflite : Checking op 5/240: MINIMUM !status.ok() isAllFloatInputs=1 isAllFloatOutputs=1
07-23 20:32:49.714 28449 28449 I tflite : Checking op 7/240: MINIMUM !status.ok() isAllFloatInputs=1 isAllFloatOutputs=1
07-23 20:32:49.714 28449 28449 I tflite : Checking op 9/240: MINIMUM !status.ok() isAllFloatInputs=1 isAllFloatOutputs=1
07-23 20:32:49.714 28449 28449 I tflite : Checking op 11/240: STRIDED_SLICE !status.ok() isAllFloatInputs=1 isAllFloatOutputs=1
07-23 20:32:49.714 28449 28449 I tflite : Checking op 12/240: STRIDED_SLICE !status.ok() isAllFloatInputs=1 isAllFloatOutputs=1
07-23 20:32:49.714 28449 28449 I tflite : Checking op 13/240: STRIDED_SLICE !status.ok() isAllFloatInputs=1 isAllFloatOutputs=1
07-23 20:32:49.714 28449 28449 I tflite : Checking op 14/240: STRIDED_SLICE !status.ok() isAllFloatInputs=1 isAllFloatOutputs=1
07-23 20:32:49.714 28449 28449 I tflite : Checking op 15/240: STRIDED_SLICE !status.ok() isAllFloatInputs=1 isAllFloatOutputs=1
07-23 20:32:49.714 28449 28449 I tflite : Checking op 16/240: STRIDED_SLICE !status.ok() isAllFloatInputs=1 isAllFloatOutputs=1
07-23 20:32:49.714 28449 28449 I tflite : Checking op 17/240: STRIDED_SLICE !status.ok() isAllFloatInputs=1 isAllFloatOutputs=1
07-23 20:32:49.714 28449 28449 I tflite : Checking op 18/240: STRIDED_SLICE !status.ok() isAllFloatInputs=1 isAllFloatOutputs=1
07-23 20:32:49.714 28449 28449 I tflite : Checking op 19/240: STRIDED_SLICE !status.ok() isAllFloatInputs=1 isAllFloatOutputs=1
07-23 20:32:49.714 28449 28449 I tflite : Checking op 20/240: STRIDED_SLICE !status.ok() isAllFloatInputs=1 isAllFloatOutputs=1
07-23 20:32:49.714 28449 28449 I tflite : Checking op 21/240: STRIDED_SLICE !status.ok() isAllFloatInputs=1 isAllFloatOutputs=1
07-23 20:32:49.714 28449 28449 I tflite : Checking op 22/240: STRIDED_SLICE !status.ok() isAllFloatInputs=1 isAllFloatOutputs=1
07-23 20:32:49.714 28449 28449 I tflite : Checking op 23/240: STRIDED_SLICE !status.ok() isAllFloatInputs=1 isAllFloatOutputs=1
07-23 20:32:49.714 28449 28449 I tflite : Checking op 24/240: STRIDED_SLICE !status.ok() isAllFloatInputs=1 isAllFloatOutputs=1
07-23 20:32:49.714 28449 28449 I tflite : Checking op 25/240: STRIDED_SLICE !status.ok() isAllFloatInputs=1 isAllFloatOutputs=1
07-23 20:32:49.714 28449 28449 I tflite : Checking op 26/240: STRIDED_SLICE !status.ok() isAllFloatInputs=1 isAllFloatOutputs=1
07-23 20:32:49.714 28449 28449 I tflite : Checking op 29/240: SPLIT !status.ok() isAllFloatInputs=1 isAllFloatOutputs=1
07-23 20:32:49.714 28449 28449 I tflite : Checking op 42/240: SPLIT !status.ok() isAllFloatInputs=1 isAllFloatOutputs=1
07-23 20:32:49.714 28449 28449 I tflite : Checking op 55/240: SPLIT !status.ok() isAllFloatInputs=1 isAllFloatOutputs=1
07-23 20:32:49.715 28449 28449 I tflite : Checking op 68/240: SPLIT !status.ok() isAllFloatInputs=1 isAllFloatOutputs=1
07-23 20:32:49.715 28449 28449 I tflite : Checking op 81/240: SPLIT !status.ok() isAllFloatInputs=1 isAllFloatOutputs=1
07-23 20:32:49.715 28449 28449 I tflite : Checking op 94/240: SPLIT !status.ok() isAllFloatInputs=1 isAllFloatOutputs=1
07-23 20:32:49.715 28449 28449 I tflite : Checking op 107/240: SPLIT !status.ok() isAllFloatInputs=1 isAllFloatOutputs=1
07-23 20:32:49.715 28449 28449 I tflite : Checking op 120/240: SPLIT !status.ok() isAllFloatInputs=1 isAllFloatOutputs=1
07-23 20:32:49.715 28449 28449 I tflite : Checking op 133/240: SPLIT !status.ok() isAllFloatInputs=1 isAllFloatOutputs=1
07-23 20:32:49.715 28449 28449 I tflite : Checking op 146/240: SPLIT !status.ok() isAllFloatInputs=1 isAllFloatOutputs=1
07-23 20:32:49.715 28449 28449 I tflite : Checking op 159/240: SPLIT !status.ok() isAllFloatInputs=1 isAllFloatOutputs=1
07-23 20:32:49.715 28449 28449 I tflite : Checking op 172/240: SPLIT !status.ok() isAllFloatInputs=1 isAllFloatOutputs=1
07-23 20:32:49.716 28449 28449 I tflite : Checking op 185/240: SPLIT !status.ok() isAllFloatInputs=1 isAllFloatOutputs=1
07-23 20:32:49.716 28449 28449 I tflite : Checking op 198/240: SPLIT !status.ok() isAllFloatInputs=1 isAllFloatOutputs=1
07-23 20:32:49.716 28449 28449 I tflite : Checking op 211/240: SPLIT !status.ok() isAllFloatInputs=1 isAllFloatOutputs=1
07-23 20:32:49.716 28449 28449 I tflite : Checking op 224/240: SPLIT !status.ok() isAllFloatInputs=1 isAllFloatOutputs=1
07-23 20:32:49.716 28449 28449 I tflite : Checking op 237/240: MINIMUM !status.ok() isAllFloatInputs=1 isAllFloatOutputs=1
So, as expected, custom ops have no GPU delegate implementation, but neither do MINIMUM and SPLIT. It also seems our use of STRIDED_SLICE is not compatible.
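To see at a glance which op types get rejected, the logcat lines above can be tallied per op name. This is only a minimal sketch: the regular expression is inferred from the log lines quoted in this comment, and `tally_rejected_ops` is a hypothetical helper, not part of TensorFlow Lite.

```python
import re
from collections import Counter

# Pattern inferred from the log lines quoted above, e.g.
#   "... I tflite : Checking op 5/240: MINIMUM !status.ok() isAllFloatInputs=1 ..."
REJECTED_OP = re.compile(r"Checking op (\d+)/(\d+): (.+?) !status\.ok\(\)")


def tally_rejected_ops(log_lines):
    """Count rejected ops per op name (hypothetical helper)."""
    counts = Counter()
    for line in log_lines:
        match = REJECTED_OP.search(line)
        if match:
            counts[match.group(3)] += 1
    return counts


# A few lines copied from the log above; feed the full logcat output in practice.
sample = [
    "07-23 20:32:49.714 28449 28449 I tflite : Checking op 2/240: CUSTOM Mfcc !status.ok() isAllFloatInputs=1 isAllFloatOutputs=1",
    "07-23 20:32:49.714 28449 28449 I tflite : Checking op 5/240: MINIMUM !status.ok() isAllFloatInputs=1 isAllFloatOutputs=1",
    "07-23 20:32:49.714 28449 28449 I tflite : Checking op 11/240: STRIDED_SLICE !status.ok() isAllFloatInputs=1 isAllFloatOutputs=1",
    "07-23 20:32:49.714 28449 28449 I tflite : Checking op 12/240: STRIDED_SLICE !status.ok() isAllFloatInputs=1 isAllFloatOutputs=1",
    "07-23 20:32:49.714 28449 28449 I tflite : Checking op 29/240: SPLIT !status.ok() isAllFloatInputs=1 isAllFloatOutputs=1",
]

rejected = tally_rejected_ops(sample)
for name, count in rejected.most_common():
    print(f"{name}: {count}")
```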
Google Pixel 2 GPU delegation benchmark:
walleye:/data/local/tmp $ ./benchmark_model --graph=/sdcard/Android/data/com.mozilla.speechmodule/files/models/eng/output_graph.tflite --show_flops --input_layer=input_node,previous_state_c,previous_state_h --input_layer_type=float,float,float --input_layer_shape=1,16,19,26:1:1,2048:1,2048 --output_layer=logits,new_state_c,new_state_h --use_gpu=true
STARTING!
The number of items in --input_layer_shape (1,16,19,26:1:1,2048:1,2048, with 4 items) must match the number of items in --input_layer (input_node,previous_state_c,previous_state_h, with 3 items). For example --input_layer=input1,input2 --input_layer_shape=1,224,224,4:1,20
Min num runs: [50]
Min runs duration (seconds): [1]
Inter-run delay (seconds): [-1]
Num threads: [1]
Benchmark name: []
Output prefix: []
Min warmup runs: [1]
Min warmup runs duration (seconds): [0.5]
Graph: [/sdcard/Android/data/com.mozilla.speechmodule/files/models/eng/output_graph.tflite]
Input layers: [input_node,previous_state_c,previous_state_h]
Input shapes: [1,16,19,26:1:1,2048:1,2048]
Use nnapi : [0]
Use legacy nnapi : [0]
Use gpu : [1]
Allow fp16 : [0]
Enable op profiling: [0]
Loaded model /sdcard/Android/data/com.mozilla.speechmodule/files/models/eng/output_graph.tflite
resolved reporter
INFO: Initialized TensorFlow Lite runtime.
INFO: Created TensorFlow Lite delegate for GPU.
ERROR: Next operations are not supported by GPU delegate:
CUSTOM AudioSpectrogram: Operation is not supported.
CUSTOM Mfcc: Operation is not supported.
SPLIT: Operation is not supported.
First 5 operations will run on the GPU, and the remaining 18 on the CPU.
INFO: Replacing 5 node(s) with delegate (TfLiteGpuDelegate) node.
Applied GPU delegate.
Initialized session in 21888ms
Running benchmark for at least 1 iterations and at least 0.5 seconds
count=13 first=81932 curr=30012 min=28637 max=81932 avg=39952.8 std=15000
Running benchmark for at least 50 iterations and at least 1 seconds
count=50 first=30008 curr=30004 min=29733 max=30206 avg=29980 std=104
Average inference timings in us: Warmup: 39952.8, Init: 21888050, no stats: 29980
walleye:/data/local/tmp $ ./benchmark_model --graph=/sdcard/Android/data/com.mozilla.speechmodule/files/models/eng/output_graph.tflite --show_flops --input_layer=input_node,previous_state_c,previous_state_h --input_layer_type=float,float,float --input_layer_shape=1,16,19,26:1:1,2048:1,2048 --output_layer=logits,new_state_c,new_state_h --use_gpu=false
STARTING!
The number of items in --input_layer_shape (1,16,19,26:1:1,2048:1,2048, with 4 items) must match the number of items in --input_layer (input_node,previous_state_c,previous_state_h, with 3 items). For example --input_layer=input1,input2 --input_layer_shape=1,224,224,4:1,20
Min num runs: [50]
Min runs duration (seconds): [1]
Inter-run delay (seconds): [-1]
Num threads: [1]
Benchmark name: []
Output prefix: []
Min warmup runs: [1]
Min warmup runs duration (seconds): [0.5]
Graph: [/sdcard/Android/data/com.mozilla.speechmodule/files/models/eng/output_graph.tflite]
Input layers: [input_node,previous_state_c,previous_state_h]
Input shapes: [1,16,19,26:1:1,2048:1,2048]
Use nnapi : [0]
Use legacy nnapi : [0]
Use gpu : [0]
Allow fp16 : [0]
Enable op profiling: [0]
Loaded model /sdcard/Android/data/com.mozilla.speechmodule/files/models/eng/output_graph.tflite
resolved reporter
INFO: Initialized TensorFlow Lite runtime.
Initialized session in 5.098ms
Running benchmark for at least 1 iterations and at least 0.5 seconds
count=51 first=63770 curr=8275 min=8260 max=63770 avg=9872.57 std=7754
Running benchmark for at least 50 iterations and at least 1 seconds
count=119 first=8354 curr=8297 min=8251 max=11718 avg=8448.37 std=641
Average inference timings in us: Warmup: 9872.57, Init: 5098, no stats: 8448.37
walleye:/data/local/tmp $
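Both runs above print a complaint that `--input_layer_shape` has 4 items while `--input_layer` has 3. The check can be reproduced offline before pushing a command line to the device. This is a rough sketch: `check_benchmark_flags` is a hypothetical helper, the splitting rules (comma between layer names, colon between shapes) are taken from the message in the log, and the "fixed" shape string is only a guess at what was intended.

```python
def check_benchmark_flags(input_layer, input_layer_shape):
    """Return (n_layers, n_shapes, ok) for the two flag values.

    Layer names are comma-separated; shapes are colon-separated, with the
    dimensions inside each shape comma-separated (as in the log message).
    """
    names = input_layer.split(",")
    shapes = input_layer_shape.split(":")
    return len(names), len(shapes), len(names) == len(shapes)


layers = "input_node,previous_state_c,previous_state_h"

# Value used in the runs above: four shapes for three layers.
as_run = check_benchmark_flags(layers, "1,16,19,26:1:1,2048:1,2048")

# Hypothetical correction (an assumption): drop the extra "1" entry.
guess = check_benchmark_flags(layers, "1,16,19,26:1,2048:1,2048")

print("as run:", as_run)
print("guess :", guess)
```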
There are some model changes to get rid of the unsupported StridedSlice:
@@ -127,7 +127,8 @@ def rnn_impl_static_rnn(x, seq_length, previous_state, reuse):
name='cudnn_compatible_lstm_cell')
# Split rank N tensor into list of rank N-1 tensors
- x = [x[l] for l in range(x.shape[0])]
+ # x = [x[l] for l in range(x.shape[0])]
+ x = [tf.squeeze(x, axis=0)]
output, output_state = tfv1.nn.static_rnn(cell=fw_cell,
inputs=x,
@@ -136,7 +137,8 @@ def rnn_impl_static_rnn(x, seq_length, previous_state, reuse):
dtype=tf.float32,
scope='cell_0')
- output = tf.concat(output, 0)
+ # output = tf.concat(output, 0)
+ output = output[0]
return output, output_state
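Note that the replacement in this diff only matches the original list comprehension when the first dimension is 1. The plain-Python sketch below mimics both forms on nested lists to make that explicit. `unstack_axis0` and `squeeze_axis0` are hypothetical stand-ins for the TensorFlow expressions, not the real ops.

```python
def unstack_axis0(x):
    """Mimic `[x[l] for l in range(x.shape[0])]` on a nested list."""
    return [x[l] for l in range(len(x))]


def squeeze_axis0(x):
    """Mimic `tf.squeeze(x, axis=0)`: only valid when the first dimension is 1."""
    if len(x) != 1:
        raise ValueError(f"cannot squeeze axis 0 of size {len(x)}")
    return x[0]


# Shape [1, 1, 3]: one time step, batch of one, three features.
single_step = [[[0.1, 0.2, 0.3]]]
print(unstack_axis0(single_step) == [squeeze_axis0(single_step)])

# Shape [2, 1, 2]: two time steps, so the squeeze form no longer applies.
two_steps = [[[0.1, 0.2]], [[0.3, 0.4]]]
try:
    squeeze_axis0(two_steps)
except ValueError as error:
    print("squeeze rejected:", error)
```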
Some changes to remove Minimum (obviously not a valid solution, just to get the model to run):
@@ -70,7 +70,7 @@ def dense(name, x, units, dropout_rate=None, relu=True):
output = tf.nn.bias_add(tf.matmul(x, weights), bias)
if relu:
- output = tf.minimum(tf.nn.relu(output), FLAGS.relu_clip)
+ output = tf.nn.relu(output)
if dropout_rate is not None:
output = tf.nn.dropout(output, rate=dropout_rate)
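To illustrate why dropping Minimum is not a valid solution: the original line clips the ReLU output at `FLAGS.relu_clip`, so any activation above the clip value changes once the `tf.minimum` is removed. A small plain-Python sketch follows; the clip value of 20.0 is an assumption for illustration, so check the actual flag value.

```python
RELU_CLIP = 20.0  # assumed value for FLAGS.relu_clip, for illustration only


def relu(value):
    """Plain ReLU, as in the patched line."""
    return max(0.0, value)


def clipped_relu(value, clip=RELU_CLIP):
    """ReLU followed by a minimum, as in the original line."""
    return min(relu(value), clip)


values = [-3.0, 0.5, 19.9, 42.0]
plain = [relu(v) for v in values]
clipped = [clipped_relu(v) for v in values]

# Inputs whose output differs between the two variants.
changed = [v for v, p, c in zip(values, plain, clipped) if p != c]
print("plain  :", plain)
print("clipped:", clipped)
print("changed:", changed)
```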
And some shape changes to avoid shader compilation, as well as moving the AudioSpectrogram and Mfcc nodes to the end; otherwise the GPU delegation code chokes early on those and places no ops on the GPU:
@@ -156,7 +158,7 @@ def create_model(batch_x, batch_size, seq_length, dropout, reuse=False, previous
# This is done to prepare the batch for input into the first layer which expects a tensor of rank `2`.
# Permute n_steps and batch_size
- batch_x = tf.transpose(batch_x, [1, 0, 2, 3])
+ #batch_x = tf.transpose(batch_x, [1, 0, 2, 3])
# Reshape to prepare input for first layer
batch_x = tf.reshape(batch_x, [-1, Config.n_input + 2*Config.n_input*Config.n_context]) # (n_steps*batch_size, n_input + 2*n_input*n_context)
layers['input_reshaped'] = batch_x
@@ -596,17 +598,12 @@ def test():
def create_inference_graph(batch_size=1, n_steps=16, tflite=False):
batch_size = batch_size if batch_size > 0 else None
- # Create feature computation graph
- input_samples = tfv1.placeholder(tf.float32, [Config.audio_window_samples], 'input_samples')
- samples = tf.expand_dims(input_samples, -1)
- mfccs, _ = samples_to_mfccs(samples, FLAGS.audio_sample_rate)
- mfccs = tf.identity(mfccs, name='mfccs')
-
# Input tensor will be of shape [batch_size, n_steps, 2*n_context+1, n_input]
# This shape is read by the native_client in DS_CreateModel to know the
# value of n_steps, n_context and n_input. Make sure you update the code
# there if this shape is changed.
input_tensor = tfv1.placeholder(tf.float32, [batch_size, n_steps if n_steps > 0 else None, 2 * Config.n_context + 1, Config.n_input], name='input_node')
+ input_tensor = tf.reshape(input_tensor, [-1, Config.n_input + 2*Config.n_input*Config.n_context])
seq_length = tfv1.placeholder(tf.int32, [batch_size], name='input_lengths')
if batch_size <= 0:
@@ -663,6 +660,12 @@ def create_inference_graph(batch_size=1, n_steps=16, tflite=False):
new_state_c = tf.identity(new_state_c, name='new_state_c')
new_state_h = tf.identity(new_state_h, name='new_state_h')
+ # Create feature computation graph
+ input_samples = tfv1.placeholder(tf.float32, [Config.audio_window_samples], 'input_samples')
+ samples = tf.expand_dims(input_samples, -1)
+ mfccs, _ = samples_to_mfccs(samples, FLAGS.audio_sample_rate)
+ mfccs = tf.identity(mfccs, name='mfccs')
+
inputs = {
'input': input_tensor,
'previous_state_c': previous_state_c,
SPLIT: Operation is not supported.
We still have that op in the middle of the computation graph. As @reuben analyzed, it comes from the LSTMCell. If we figure out a way around it, we could likely have all ops on the GPU.
Average inference timings in us: Warmup: 27917.4, Init: 21528106, no stats: 23313.5
============================== Run Order ==============================
[node type] [start] [first] [avg ms] [%] [cdf%] [mem KB] [times called] [Name]
DELEGATE 0.000 21.227 21.287 91.329% 91.329% 0.000 1 [cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/BiasAdd]
SPLIT 21.287 0.015 0.013 0.056% 91.385% 0.000 1 [cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/split, cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/split:1, cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/split:2, cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/split:3]
ADD 21.300 0.021 0.022 0.093% 91.479% 0.000 1 [cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/add]
LOGISTIC 21.322 0.012 0.014 0.059% 91.538% 0.000 1 [cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/Sigmoid]
MUL 21.336 0.005 0.007 0.029% 91.567% 0.000 1 [cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/mul]
LOGISTIC 21.343 0.010 0.010 0.045% 91.611% 0.000 1 [cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/Sigmoid_1]
TANH 21.353 0.015 0.014 0.061% 91.672% 0.000 1 [cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/Tanh]
MUL 21.367 0.004 0.002 0.008% 91.680% 0.000 1 [cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/mul_1]
ADD 21.369 0.008 0.003 0.014% 91.694% 0.000 1 [new_state_c]
LOGISTIC 21.373 0.012 0.010 0.045% 91.739% 0.000 1 [cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/Sigmoid_2]
TANH 21.383 0.013 0.012 0.053% 91.792% 0.000 1 [cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/Tanh_1]
MUL 21.396 0.002 0.002 0.009% 91.802% 0.000 1 [new_state_h]
FULLY_CONNECTED 21.398 1.114 1.178 5.053% 96.854% 0.000 1 [Relu_3]
FULLY_CONNECTED 22.576 0.036 0.034 0.145% 96.999% 0.000 1 [BiasAdd_4]
SOFTMAX 22.610 0.005 0.005 0.020% 97.019% 0.000 1 [logits]
RESHAPE 22.615 0.003 0.002 0.009% 97.028% 0.000 1 [ExpandDims]
AudioSpectrogram 22.617 0.257 0.245 1.050% 98.078% 0.000 1 [AudioSpectrogram]
Mfcc 22.862 0.389 0.447 1.917% 99.995% 0.000 1 [Mfcc]
RESHAPE 23.309 0.002 0.001 0.005% 100.000% 0.000 1 [mfccs]
============================== Top by Computation Time ==============================
[node type] [start] [first] [avg ms] [%] [cdf%] [mem KB] [times called] [Name]
DELEGATE 0.000 21.227 21.287 91.329% 91.329% 0.000 1 [cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/BiasAdd]
FULLY_CONNECTED 21.398 1.114 1.178 5.053% 96.382% 0.000 1 [Relu_3]
Mfcc 22.862 0.389 0.447 1.917% 98.299% 0.000 1 [Mfcc]
AudioSpectrogram 22.617 0.257 0.245 1.050% 99.349% 0.000 1 [AudioSpectrogram]
FULLY_CONNECTED 22.576 0.036 0.034 0.145% 99.494% 0.000 1 [BiasAdd_4]
ADD 21.300 0.021 0.022 0.093% 99.588% 0.000 1 [cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/add]
TANH 21.353 0.015 0.014 0.061% 99.649% 0.000 1 [cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/Tanh]
LOGISTIC 21.322 0.012 0.014 0.059% 99.708% 0.000 1 [cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/Sigmoid]
SPLIT 21.287 0.015 0.013 0.056% 99.764% 0.000 1 [cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/split, cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/split:1, cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/split:2, cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/split:3]
TANH 21.383 0.013 0.012 0.053% 99.817% 0.000 1 [cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/Tanh_1]
Number of nodes executed: 19
============================== Summary by node type ==============================
[Node type] [count] [avg ms] [avg %] [cdf %] [mem KB] [times called]
DELEGATE 1 21.286 91.364% 91.364% 0.000 1
FULLY_CONNECTED 2 1.210 5.194% 96.558% 0.000 2
Mfcc 1 0.446 1.914% 98.472% 0.000 1
AudioSpectrogram 1 0.244 1.047% 99.519% 0.000 1
LOGISTIC 3 0.033 0.142% 99.661% 0.000 3
TANH 2 0.026 0.112% 99.773% 0.000 2
ADD 2 0.024 0.103% 99.876% 0.000 2
SPLIT 1 0.013 0.056% 99.931% 0.000 1
MUL 3 0.009 0.039% 99.970% 0.000 3
SOFTMAX 1 0.004 0.017% 99.987% 0.000 1
RESHAPE 2 0.003 0.013% 100.000% 0.000 2
Timings (microseconds): count=43 first=23150 curr=23055 min=22906 max=26011 avg=23307.6 std=486
Memory (bytes): count=0
19 nodes observed
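To put the three runs side by side, the "Average inference timings" lines can be parsed and compared against the CPU-only run. This is just a sketch: the labels are mine, and the modified-model run uses a different graph than the first two, so the ratio is indicative only.

```python
import re

# Extracts the steady-state average (the "no stats" field) in microseconds.
AVG_FIELD = re.compile(r"no stats: ([0-9.]+)")


def average_us(summary_line):
    """Return the 'no stats' average from a benchmark_model summary line."""
    match = AVG_FIELD.search(summary_line)
    if match is None:
        raise ValueError("no 'no stats' field in line")
    return float(match.group(1))


# Summary lines copied from the three runs in this thread.
runs = {
    "GPU delegate, original model": "Average inference timings in us: Warmup: 39952.8, Init: 21888050, no stats: 29980",
    "CPU only": "Average inference timings in us: Warmup: 9872.57, Init: 5098, no stats: 8448.37",
    "GPU delegate, modified model": "Average inference timings in us: Warmup: 27917.4, Init: 21528106, no stats: 23313.5",
}

averages = {label: average_us(line) for label, line in runs.items()}
baseline = averages["CPU only"]
ratios = {label: avg / baseline for label, avg in averages.items()}

for label, avg in averages.items():
    print(f"{label}: {avg / 1000:.2f} ms ({ratios[label]:.2f}x the CPU-only time)")
```

Even with the modified model, the delegated run stays slower than the CPU-only one in these numbers.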
@reuben Once we can get r2.2 this will get even more interesting, since we could enable CoreML and Hexagon delegation. No idea of the potential speedup, obviously, but I'm wondering how much of that should be exposed in the API?
Was the CoreML delegate ever enabled? If so, are there benchmarks I can compare against?
Unfortunately, no, we have no benchmarks. As documented in the releases, we have set up the infrastructure in the code to enable the use of delegates, but:
You should try hacking on https://github.com/mozilla/DeepSpeech/blob/cc038c1263352b6364ec0ba2e0e313a8cf21d279/native_client/tflitemodelstate.cc#L102-L156 to get things running.
Also, as you can see on https://github.com/mozilla/tensorflow/tree/r2.3/tensorflow/lite/delegates there's a Hexagon delegate, but I can't find a CoreML one.
@zaptrem According to https://www.tensorflow.org/lite/performance/coreml_delegate it is now available as experimental from r2.4, but upgrading to that version still requires some work: https://github.com/mozilla/DeepSpeech/pull/3482
So this whole time I’ve been running inference only on iPhone 11’s performance CPU cores? It’s already at like 4X real-time (impressive). I’d love to start looking into this after we fix the iOS crashing.
In a perfect world (for my specific use case) I’d target 18X, which should be possible based on Apple’s claims of “15X faster ML performance.” Idk which instructions you guys use and whether they’re compatible with the Neural Engine, though.
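For reference, here is a tiny sketch of the arithmetic behind "4X" and "18X", assuming "NX real-time" means audio duration divided by processing time. The 10-second clip and the 2.5-second processing time are made-up numbers for illustration, not measurements.

```python
def real_time_factor(audio_seconds, processing_seconds):
    """How many times faster than real time the audio was processed."""
    if processing_seconds <= 0:
        raise ValueError("processing time must be positive")
    return audio_seconds / processing_seconds


def max_processing_seconds(audio_seconds, target_factor):
    """Processing-time budget needed to reach a target real-time factor."""
    if target_factor <= 0:
        raise ValueError("target factor must be positive")
    return audio_seconds / target_factor


clip_seconds = 10.0  # hypothetical clip length
current = real_time_factor(clip_seconds, 2.5)  # hypothetical processing time
budget = max_processing_seconds(clip_seconds, 18.0)

print(f"current factor: {current:.1f}x")
print(f"budget for 18x: {budget:.3f} s per {clip_seconds:.0f} s of audio")
```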
> So this whole time I’ve been running inference only on iPhone 11’s performance CPU cores? It’s already at like 4X real-time (impressive). I’d love to start looking into this after we fix the iOS crashing.
It's possible, I already get faster than realtime on Android on a QM215 chip :)
> Idk which instructions you guys use and whether they’re compatible with the Neural Engine, though.
We really mostly depend on TensorFlow Lite, at that level.
Whoops, the word I was looking for was ops, not instructions. Were the custom ops and SPLIT removed as implied in the earlier comments on this issue? Or is that one of the items that wasn't completed in time?
Nah, I was hacking YOLO, like "ok, let's remove the offending ops without caring about the output: is it enough for it to run? what about perf?". I have not looked at the current status; maybe the delegates support more ops now?
When we first tested TFLite it was the same, and over time it got good, so we can only hope.
More ops might have been added, but according to the docs you linked custom ops are still a no go.
Tentative patch: