joaosbastos opened this issue 3 years ago
We don't have a coral edge tpu at our disposal to try, but after a quick look at the compatibility overview (https://coral.ai/docs/edgetpu/models-intro/#compatibility-overview) it seems that this should be possible in principle.
You can find a TF-Lite model of MiDaS here: https://github.com/intel-isl/MiDaS/releases/download/v2_1/model_opt.tflite. Perhaps you could try deploying this model and report back so that others could benefit from your insights?
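In case it helps, a minimal deployment sketch along the lines of the Coral Python examples could look as follows. This is untested on our side since we don't have the hardware; the model file name, the libedgetpu.so.1 delegate path, and the use of the tflite_runtime package are assumptions, and the model would first need to go through the edgetpu_compiler step discussed below:

```python
import numpy as np
from tflite_runtime.interpreter import Interpreter, load_delegate

# Load an Edge-TPU-compiled model and attach the Edge TPU delegate.
# "model_edgetpu.tflite" is the assumed output name of edgetpu_compiler.
interpreter = Interpreter(
    model_path="model_edgetpu.tflite",
    experimental_delegates=[load_delegate("libedgetpu.so.1")],
)
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Dummy input with the model's expected shape and dtype, just to exercise inference.
dummy = np.zeros(inp["shape"], dtype=inp["dtype"])
interpreter.set_tensor(inp["index"], dummy)
interpreter.invoke()
depth = interpreter.get_tensor(out["index"])
print(depth.shape, depth.dtype)
```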
I tried compiling with
edgetpu_compiler model_opt.tflite
and got
Edge TPU Compiler version 15.0.340273435
Invalid model: model_opt.tflite
Model not quantized
Coral suggests full-integer quantization in https://www.tensorflow.org/lite/performance/post_training_quantization?hl=nl. I used the following Python script with the .pb model:
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model('.')
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8  # or tf.uint8
converter.inference_output_type = tf.int8  # or tf.uint8
tflite_quant_model = converter.convert()
with open('model.tflite', 'wb') as f:
    f.write(tflite_quant_model)
I got the following error:
ValueError: NodeDef mentions attr 'output_shapes' not in Op<name=StatelessIf; signature=cond:Tcond, input: -> output:; attr=Tcond:type; attr=Tin:list(type),min=0; attr=Tout:list(type),min=0; attr=then_branch:func; attr=else_branch:func>; NodeDef: {{node cond}}. (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.)
I'm still a beginner with TensorFlow.
Googling suggests that this error is likely due to using a different TensorFlow version than the one that was used to build the model. Our model was tested with 2.3.0 (cf. https://github.com/intel-isl/MiDaS/tree/master/tf).
Tried with 2.5.0 and that error is gone. Now it asks for a representative dataset, as suggested at https://www.tensorflow.org/lite/performance/post_training_quantization#full_integer_quantization_of_weights_and_activations:
Full integer quantization
You can get further latency improvements, reductions in peak memory usage, and compatibility with integer-only hardware devices or accelerators by making sure all model math is integer quantized.
For full integer quantization, you need to calibrate or estimate the range, i.e., (min, max) of all floating-point tensors in the model. Unlike constant tensors such as weights and biases, variable tensors such as model input, activations (outputs of intermediate layers) and model output cannot be calibrated unless we run a few inference cycles. As a result, the converter requires a representative dataset to calibrate them. This dataset can be a small subset (around 100-500 samples) of the training or validation data. Refer to the representative_dataset() function below.
def representative_dataset():
    for data in tf.data.Dataset.from_tensor_slices((images)).batch(1).take(100):
        yield [tf.dtypes.cast(data, tf.float32)]

For testing purposes, you can use a dummy dataset as follows:

def representative_dataset():
    for _ in range(100):
        data = np.random.rand(1, 244, 244, 3)
        yield [data.astype(np.float32)]
Where can I get such a dataset?
@joaosbastos You can use any 100-500 RGB images as a representative dataset to calibrate the range of values for quantization. You can try using RGB images from the ReDWeb dataset: https://drive.google.com/file/d/12IjUC6eAiLBX67jW57YQMNRVqUGvTZkX/view Or better, use your own real images on which this model will be used.
With your help I managed to convert the model, using the following Python script:
import tensorflow as tf
import cv2
import os
import numpy as np

def representative_data():
    a = []
    directory_rgb = r'ReDWeb_V1/Imgs'
    directory_depth = r'ReDWeb_V1/RDs'
    for filename in os.listdir(directory_rgb):
        if filename.endswith(".jpg"):
            print(os.path.join(directory_rgb, filename))
            print(os.path.join(directory_depth, filename).split('.')[0] + '.png')
            # load the image, scale it to [0, 1], resize to the model's 256x256 input
            # and convert to channels-first (CHW) layout
            img = cv2.imread(os.path.join(directory_rgb, filename))
            img = img / 255.0
            img = img.astype('float32')
            img = tf.image.resize(img, [256, 256], method='bicubic', preserve_aspect_ratio=False)
            img = tf.transpose(img, [2, 0, 1])
            a.append(img)
        else:
            continue
    a = np.array(a)
    print(a.shape)
    # yield the calibration samples one at a time, batched to size 1
    img = tf.data.Dataset.from_tensor_slices(a).batch(1)
    for i in img.take(a.shape[0]):
        print(i)
        yield [i]

converter = tf.lite.TFLiteConverter.from_saved_model('.')
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8  # or tf.uint8
converter.inference_output_type = tf.int8  # or tf.uint8
tflite_quant_model = converter.convert()
with open('model.tflite', 'wb') as f:
    f.write(tflite_quant_model)
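As a sanity check (a small sketch using only the standard tf.lite.Interpreter API, not part of the conversion itself), the input and output types of the converted file can be inspected before running edgetpu_compiler:

```python
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

# Both should report int8 (or uint8) if full-integer quantization worked.
print("input :", interpreter.get_input_details()[0]["dtype"],
      interpreter.get_input_details()[0]["shape"])
print("output:", interpreter.get_output_details()[0]["dtype"],
      interpreter.get_output_details()[0]["shape"])
```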
Then I ran the edgetpu_compiler on the generated model and got this output:
edgetpu_compiler model.tflite
Edge TPU Compiler version 16.0.384591198
Started a compilation timeout timer of 180 seconds.
Model compiled successfully in 9208 ms.
Input model: model.tflite
Input size: 25.33MiB
Output model: model_edgetpu.tflite
Output size: 25.38MiB
On-chip memory used for caching model parameters: 80.00KiB
On-chip memory remaining for caching model parameters: 6.04MiB
Off-chip memory used for streaming uncached model parameters: 384.00KiB
Number of Edge TPU subgraphs: 1
Total number of operations: 17629
Operation log: model_edgetpu.log
Model successfully compiled but not all operations are supported by the Edge TPU. A percentage of the model will instead run on the CPU, which is slower. If possible, consider updating your model to use only operations supported by the Edge TPU. For details, visit g.co/coral/model-reqs.
Number of operations that will run on Edge TPU: 3
Number of operations that will run on CPU: 17626
See the operation log file for individual operation details.
Compilation child process completed within timeout period.
Compilation succeeded!
And the compiler log:
Edge TPU Compiler version 16.0.384591198
Input: model.tflite
Output: model_edgetpu.tflite
| Operator | Count | Status |
|---|---|---|
| RESIZE_BILINEAR | 5 | More than one subgraph is not supported |
| SUB | 1 | Mapped to Edge TPU |
| TRANSPOSE | 54 | More than one subgraph is not supported |
| TRANSPOSE | 126 | Operation is otherwise supported, but not mapped due to some unspecified limitation |
| MINIMUM | 48 | More than one subgraph is not supported |
| CONCATENATION | 24 | More than one subgraph is not supported |
| CONV_2D | 17001 | More than one subgraph is not supported |
| MUL | 1 | Mapped to Edge TPU |
| MUL | 72 | More than one subgraph is not supported |
| SPLIT | 97 | More than one subgraph is not supported |
| ADD | 99 | More than one subgraph is not supported |
| RESHAPE | 1 | More than one subgraph is not supported |
| RELU | 55 | More than one subgraph is not supported |
| PAD | 1 | Mapped to Edge TPU |
| PAD | 44 | More than one subgraph is not supported |
With so many operations running on the CPU, I don't think it is worth running this on the Edge TPU. Is there a way to improve it?
Thanks for sharing your results.
I'm a bit surprised that both CONV_2D and RELU seem not to be mapped to the TPU, as these are certainly supported in principle. Paging @AlexeyAB and @thias15, who might have some experience here.
@joaosbastos
What command did you use to compile?
Try compiling with the -a flag, like this: edgetpu_compiler -sa model.tflite
There are 3 approaches/versions of EfficientNet:
1) EfficientNet (GPU/TPU): depth-wise Conv2D, SE, Swish: https://github.com/tensorflow/tpu/blob/master/models/official/efficientnet/efficientnet_builder.py#L173
2) EfficientNet-Lite (Mobile/Edge TPU): depth-wise Conv2D, no SE, ReLU6: https://github.com/tensorflow/tpu/blob/master/models/official/efficientnet/lite/efficientnet_lite_builder.py#L47
3) EfficientNet-EdgeTPU: regular Conv2D, no SE, ReLU: https://github.com/tensorflow/tpu/blob/master/models/official/efficientnet/edgetpu/efficientnet_edgetpu_builder.py#L53
We used the 2nd approach, and it should fit the Edge TPU based on this information: https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet/lite
"these EfficientNet-lite models run well on all mobile CPU/GPU/EdgeTPU"
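For illustration only (this is not the actual MiDaS or EfficientNet code, just a minimal Keras sketch under those assumptions): the original EfficientNet block uses Swish and a squeeze-and-excite (SE) branch, while EfficientNet-Lite drops SE and uses ReLU6, which is friendlier to int8 quantization:

```python
import tensorflow as tf
from tensorflow.keras import layers

def mbconv_block(x, filters, expand=4, use_se=True, act=tf.nn.swish):
    """One inverted-residual block; use_se and act distinguish the two variants."""
    # expand -> depthwise -> (optional squeeze-and-excite) -> project
    y = layers.Conv2D(filters * expand, 1, padding="same", use_bias=False)(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation(act)(y)
    y = layers.DepthwiseConv2D(3, padding="same", use_bias=False)(y)
    y = layers.BatchNormalization()(y)
    y = layers.Activation(act)(y)
    if use_se:
        # squeeze-and-excite branch: present in EfficientNet, dropped in EfficientNet-Lite
        s = layers.GlobalAveragePooling2D()(y)
        s = layers.Reshape((1, 1, filters * expand))(s)
        s = layers.Conv2D(filters, 1, activation=act)(s)
        s = layers.Conv2D(filters * expand, 1, activation="sigmoid")(s)
        y = layers.Multiply()([y, s])
    y = layers.Conv2D(filters, 1, padding="same", use_bias=False)(y)
    y = layers.BatchNormalization()(y)
    return y

inputs = layers.Input((256, 256, 3))
effnet_block = mbconv_block(inputs, 32, use_se=True, act=tf.nn.swish)    # EfficientNet (GPU/TPU)
lite_block = mbconv_block(inputs, 32, use_se=False, act=tf.nn.relu6)     # EfficientNet-Lite
```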
Reviewing my process.
1) I downloaded the model from https://tfhub.dev/intel/midas/v2_1_small/1
2) Converted using the script above
3) Compiled using edgetpu_compiler -sa model.tflite
I got this from the compiler:
Edge TPU Compiler version 16.0.384591198
Started a compilation timeout timer of 180 seconds.
loc(fused["transpose_46", "transpose_1/perm"]): error: non-broadcastable operands
loc(fused["transpose_46", "transpose_1/perm"]): error: non-broadcastable operands
Compilation child process completed within timeout period.
Compilation failed!
Checking your comments: is the model from TensorFlow Hub valid for this?
From the Coral site (https://coral.ai/docs/edgetpu/models-intro/#model-requirements):
Model requirements
If you want to build a TensorFlow model that takes full advantage of the Edge TPU for accelerated inferencing, the model must meet these basic requirements:
Tensor parameters are quantized (8-bit fixed-point numbers; int8 or uint8).
Tensor sizes are constant at compile-time (no dynamic sizes).
Model parameters (such as bias tensors) are constant at compile-time.
Tensors are either 1-, 2-, or 3-dimensional. If a tensor has more than 3 dimensions, then only the 3 innermost dimensions may have a size greater than 1.
The model uses only the operations supported by the Edge TPU (see table 1 below).
The requirement that tensors be limited to 3 dimensions may be the reason CONV_2D is not being converted.
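As a rough check (a small sketch using tf.lite.Interpreter.get_tensor_details(); the file name is the converted model from above), one could list the tensors that are still float or have more than 3 non-singleton dimensions:

```python
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model.tflite")

# List tensors that either keep float data or have more than 3 dimensions of size > 1,
# i.e. the ones most likely to violate the Edge TPU requirements quoted above.
for t in interpreter.get_tensor_details():
    non_singleton = [d for d in t["shape"] if d > 1]
    if t["dtype"] in (np.float32, np.float16) or len(non_singleton) > 3:
        print(t["name"], t["shape"], t["dtype"])
```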
I tried with edgetpu_compiler -m 13 -sa model_opt_midas.tflite and got a result, but not what I expected:
Edge TPU Compiler version 16.0.384591198
Started a compilation timeout timer of 180 seconds.
Model compiled successfully in 246 ms.
Input model: model_opt_midas.tflite
Input size: 63.27MiB
Output model: model_opt_midas_edgetpu.tflite
Output size: 63.26MiB
On-chip memory used for caching model parameters: 0.00B
On-chip memory remaining for caching model parameters: 0.00B
Off-chip memory used for streaming uncached model parameters: 0.00B
Number of Edge TPU subgraphs: 0
Total number of operations: 136
Operation log: model_opt_midas_edgetpu.log
Model successfully compiled but not all operations are supported by the Edge TPU. A percentage of the model will instead run on the CPU, which is slower. If possible, consider updating your model to use only operations supported by the Edge TPU. For details, visit g.co/coral/model-reqs.
Number of operations that will run on Edge TPU: 0
Number of operations that will run on CPU: 136
| Operator | Count | Status |
|---|---|---|
| RESIZE_BILINEAR | 5 | Operation is working on an unsupported data type |
| RELU | 7 | Operation is working on an unsupported data type |
| CONV_2D | 73 | Operation is working on an unsupported data type |
| DEPTHWISE_CONV_2D | 24 | Operation is working on an unsupported data type |
| ADD | 27 | Operation is working on an unsupported data type |
Compilation child process completed within timeout period.
Compilation succeeded!
Hello,
is it possible to use the midas_small_v2_1 model on a Coral Edge TPU? Is there a way to convert it?
Best regards.