tensorflow / models

Models and examples built with TensorFlow

Object detection TF 2.0 export to tf lite #8872

Closed sambhusuryamohan closed 4 years ago

sambhusuryamohan commented 4 years ago

Prerequisites

Please answer the following questions for yourself before submitting an issue.

1. The entire URL of the file you are using

https://github.com/tensorflow/models/tree/master/research/object_detection

2. Describe the feature you request

In the previous Object Detection API there was a script to export the model for TFLite inference, which is not present for the Keras models in the current version. It would be good to have that feature again, with options to include or exclude the post-processing operation.

3. Additional context

The previous export tool gave the flexibility to include or exclude the post-processing part of SSD-Mobilenetv2. We are using that script for our algorithms. Sadly, I cannot find it in the current version. I tried the available export_tflite_ssd_graph.py, but it did not work. exporter_main_v2.py exports to a saved_model, which I can load using tensorflow.saved_model.load(), but I am not sure how to slice off layers (previously I was slicing from the concat layer).

4. Are you willing to contribute it? (Yes or No)

No

GPhilo commented 4 years ago

Adding to this: converting the saved_model via TFLiteConverter fails for TF2-only models (e.g., EfficientDet D0). For example, trying to convert the saved_model of the pretrained model:

import tensorflow as tf

saved_model_obj = tf.saved_model.load(export_dir='pre-tranied-models/efficientdet_d0_coco17_tpu-32/saved_model')
print(saved_model_obj.signatures.keys())

concrete_func = saved_model_obj.signatures['serving_default']
concrete_func.inputs[0].set_shape([1, 512, 512, 3])

converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_func])
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.experimental_new_converter = True
#converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS]
tflite_model = converter.convert()

open("efficientdet.tflite", "wb").write(tflite_model)

raises the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/slr/anaconda3/envs/tensorflow/lib/python3.7/site-packages/tensorflow/lite/python/lite.py", line 518, in convert
    **converter_kwargs)
  File "/home/slr/anaconda3/envs/tensorflow/lib/python3.7/site-packages/tensorflow/lite/python/convert.py", line 496, in toco_convert_impl
    enable_mlir_converter=enable_mlir_converter)
  File "/home/slr/anaconda3/envs/tensorflow/lib/python3.7/site-packages/tensorflow/lite/python/convert.py", line 227, in toco_convert_protos
    raise ConverterError("See console for info.\n%s\n%s\n" % (stdout, stderr))
tensorflow.lite.python.convert.ConverterError: See console for info.
2020-07-20 12:25:11.542020: W tensorflow/compiler/mlir/lite/python/graphdef_to_tfl_flatbuffer.cc:144] Ignored output_format.
2020-07-20 12:25:11.542052: W tensorflow/compiler/mlir/lite/python/graphdef_to_tfl_flatbuffer.cc:147] Ignored drop_control_dependency.
loc("Func/StatefulPartitionedCall/input/_1"): error: requires all operands to be either same as or ref type of results
Traceback (most recent call last):
  File "/home/slr/anaconda3/envs/tensorflow/bin/toco_from_protos", line 10, in <module>
    sys.exit(main())
  File "/home/slr/anaconda3/envs/tensorflow/lib/python3.7/site-packages/tensorflow/lite/toco/python/toco_from_protos.py", line 93, in main
    app.run(main=execute, argv=[sys.argv[0]] + unparsed)
  File "/home/slr/anaconda3/envs/tensorflow/lib/python3.7/site-packages/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/home/slr/anaconda3/envs/tensorflow/lib/python3.7/site-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/home/slr/anaconda3/envs/tensorflow/lib/python3.7/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "/home/slr/anaconda3/envs/tensorflow/lib/python3.7/site-packages/tensorflow/lite/toco/python/toco_from_protos.py", line 56, in execute
    enable_mlir_converter)
Exception: <unknown>:0: error: loc("Func/StatefulPartitionedCall/input/_1"): requires all operands to be either same as or ref type of results

(Tested with TF 2.2 on the EfficientDet D0 model available in the TF2 Object Detection API model zoo.) Having a V1-like tool or a notebook example of how to do this would be great.

tzekid commented 4 years ago

@GPhilo I tried the conversion on tf-nightly (which I currently use for my project), not from a concrete function but from the saved_model. Here's my code snippet:

import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model('efficientdet_d0_coco17_tpu-32/saved_model')
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.experimental_new_converter = True
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS, tf.lite.OpsSet.SELECT_TF_OPS]
tflite_model = converter.convert()

open("efficientdet.tflite", "wb").write(tflite_model)

I don't know enough about your method, so a tutorial from the devs/community would be nice. Until then, you could technically load the model (in TF), change/set the input shape, and build and save the model again :man_shrugging:
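
Something along these lines, maybe (completely untested sketch; the 'serving_default' key, the input_tensor argument name, the uint8 dtype and the 512x512 shape are my guesses for the zoo model, so check structured_input_signature first):

import tensorflow as tf

loaded = tf.saved_model.load('efficientdet_d0_coco17_tpu-32/saved_model')
infer = loaded.signatures['serving_default']
print(infer.structured_input_signature)  # check the real argument name / dtype / shape

# Re-trace the serving function with a fixed input shape and save it again.
@tf.function(input_signature=[tf.TensorSpec([1, 512, 512, 3], tf.uint8)])
def fixed_shape_serving(input_tensor):
    return infer(input_tensor=input_tensor)

tf.saved_model.save(loaded, 'saved_model_fixed_shape',
                    signatures={'serving_default': fixed_shape_serving})

converter = tf.lite.TFLiteConverter.from_saved_model('saved_model_fixed_shape')
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.experimental_new_converter = True
tflite_model = converter.convert()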

GPhilo commented 4 years ago

@tzekid The snippet I used was one I found in a related issue (I'll link to it if I can find it again); I believe they chose the concrete function in order to fix the input shape. Trying your code in TF 2.2 raises this error:

ValueError: None is only supported in the 1st dimension. Tensor 'input_tensor' has invalid shape '[1, None, None, 3]'.

I believe this works in your code because TF 2.3 adds support for variable input image sizes in TFLite (at least, that's according to the RC2 release notes):

Added support for converting and resizing models with dynamic (placeholder) dimensions. Previously, there was only limited support for dynamic batch size, and even that did not guarantee that the model could be properly resized at runtime.

Is there a way to set the shape of the input tensor of a model loaded via saved_model?


Quick update: I tested the code with TF 2.3-rc2 and it still doesn't work (it raises the same error as in my first post). With tf-nightly, though, the conversion works.

jtrammell-dla commented 4 years ago

I'm also facing this hardship. I've managed a workaround for now. The following code seems to work, at least for the ssd_mobilenet_v1_fpn_640x640_coco17_tpu-8 model (which I unpacked to a folder called "pretrained"):

import os

from object_detection.utils import config_util
from object_detection.builders import model_builder
import numpy as np
import tensorflow as tf

os.environ['CUDA_VISIBLE_DEVICES'] = '0'

configs = config_util.get_configs_from_pipeline_file('pretrained/pipeline.config')
detection_model = model_builder.build(configs['model'], is_training=False)

class MyModel(tf.keras.Model):
    def __init__(self, model):
        super(MyModel, self).__init__()
        self.model = model

    def call(self, x):
        return self.model.predict(x, None)

km = MyModel(detection_model)

y = km.predict(np.random.random((1,256,256,3)).astype(np.float32))

converter = tf.lite.TFLiteConverter.from_keras_model(km)
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS]
converter.experimental_new_converter = True
converter.allow_custom_ops = False
tflite_model = converter.convert()

open('model.tflite', 'wb').write(tflite_model)

jtrammell-dla commented 4 years ago

The above is a "minimal" case of course, and doesn't include pre/post-processing.

GPhilo commented 4 years ago

@jtrammell-dla Does it work for you if you add back the pre- and/or post-processing steps? I can run your code, but adding back pre- and post-processing breaks TFLite with very obscure errors.

jtrammell-dla commented 4 years ago

Alas, shortly after posting those comments I also ran into issues adding the pre/post-processing back in. I haven't gotten around them yet, but I believe they are related to the internal use of map_fn. Since my ultimate use case is TFLite, and I've seen hints that SSD is the only meta-architecture this library can export to TFLite (I'm actually focused on EfficientDet, for obvious reasons), I'll probably have to put this code on hold and come back when it has matured a bit more. Very excited to see that happen!

Edit: I just attempted an export of an EfficientDet model and, with the exception of the pre/post code we've been discussing, it works!

jtrammell-dla commented 4 years ago

For posterity, this is the failure for the post-processing code:

Exception:
/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/map_fn.py:242:9: error: requires element_shape to be 1D tensor during TF Lite transformation pass
    for dt in dtype_flat]
    ^
/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/map_fn.py:242:9: note: called from
    for dt in dtype_flat]
    ^
/usr/local/lib/python3.6/dist-packages/object_detection/utils/shape_utils.py:237:9: note: called from
    return tf.map_fn(fn, elems, dtype, parallel_iterations, back_prop)
    ^
/usr/local/lib/python3.6/dist-packages/object_detection/core/post_processing.py:1199:9: note: called from
    parallel_iterations=parallel_iterations)
    ^
/usr/local/lib/python3.6/dist-packages/object_detection/meta_architectures/ssd_meta_arch.py:770:12: note: called from
    masks=prediction_dict.get('mask_predictions'))
    ^
/usr/local/lib/python3.6/dist-packages/tensorflow/python/autograph/impl/api.py:309:7: note: called from
    return func(*args, **kwargs)
    ^
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py:927:19: note: called from
    outputs = call_fn(cast_inputs, *args, **kwargs)
    ^
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/saving/saving_utils.py:132:7: note: called from
    outputs = model(inputs, training=False)
    ^
/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py:441:9: note: called from
    return weak_wrapped_fn().__wrapped__(*args, **kwds)
    ^
/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/map_fn.py:242:9: error: failed to legalize operation 'tf.TensorListReserve'
    for dt in dtype_flat]
    ^
(the same "called from" chain as above repeats for this second error)

jtrammell-dla commented 4 years ago

I suspect it's something trivial, along the lines of needing to ensure that the batch size is defined.

jtrammell-dla commented 4 years ago

A little more digging and I was able to generate a TFLite model that actually executes on my mobile device, with some caveats. I was unable to apply the GpuDelegate and got the following error:

Internal error: Failed to apply delegate: Following operations are not supported by GPU delegate: CAST: Operation is not supported. EXP: Operation is not supported. GATHER: Operation is not supported. LESS: Operation is not supported. NON_MAX_SUPPRESSION_V4: Operation is not supported. PACK: Operation is not supported. SELECT: Operation is not supported. SPLIT: Operation is not supported. TOPK_V2: Operation is not supported. UNPACK: Operation is not supported. 186 operations will run on the GPU, and the re

Ultimately, I think this is a showstopper for me, as the entire purpose of my current task is to benchmark a few different models on the mobile GPU.

Running it on the CPU worked, though. I had to disable parameter quantization, otherwise the class predictions were corrupted (although the bounding boxes were still relatively accurate). In the pipeline.config file I had to change model.ssd.post_processing.batch_non_max_suppression.use_static_shapes from false to true to fix the earlier issue of undefined tensor dimensions in the map_fn() calls. I also had to use a "hack" to force the model to use a fixed input shape (it's possible the use_static_shapes change isn't actually necessary and that this hack alone is what resolved the problem). The resulting code now looks like the following:

import os

from object_detection.utils import config_util
from object_detection.builders import model_builder
import numpy as np
import tensorflow as tf

os.environ['CUDA_VISIBLE_DEVICES'] = '0'

configs = config_util.get_configs_from_pipeline_file('pretrained/pipeline.config')
detection_model = model_builder.build(configs['model'], is_training=False)

ckpt = tf.compat.v2.train.Checkpoint(
      model=detection_model)
ckpt.restore('pretrained/checkpoint/ckpt-0').expect_partial()

class MyModel(tf.keras.Model):
    def __init__(self, model):
        super(MyModel, self).__init__()
        self.model = model
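        # The Sequential below is the shape-fixing "hack" mentioned above: its only
        # layer is a fixed-shape Input, pinning the input to (1, 640, 640, 3) for the converter.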
        self.seq = tf.keras.Sequential([
            tf.keras.Input([640,640,3], 1),
        ])

    def call(self, x):
        x = self.seq(x)
        images, shapes = self.model.preprocess(x)
        prediction_dict = self.model.predict(images, shapes)
        detections = self.model.postprocess(prediction_dict, shapes)
        boxes = detections['detection_boxes']
        scores = detections['detection_scores'][:,:,None]
        classes = detections['detection_classes'][:,:,None]
        combined = tf.concat([boxes, classes, scores], axis=2)
        return combined

km = MyModel(detection_model)

y = km.predict(np.random.random((1,640,640,3)).astype(np.float32))

converter = tf.lite.TFLiteConverter.from_keras_model(km)
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS]
converter.experimental_new_converter = True
converter.allow_custom_ops = False
tflite_model = converter.convert()

open('model.tflite', 'wb').write(tflite_model)
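
For what it's worth, the use_static_shapes change could presumably also be made on the parsed config in Python instead of editing pipeline.config by hand (untested sketch, using the same config_util objects as above):

# Untested: flip the NMS static-shapes flag on the parsed pipeline proto
# before building the model, instead of editing pipeline.config manually.
configs = config_util.get_configs_from_pipeline_file('pretrained/pipeline.config')
configs['model'].ssd.post_processing.batch_non_max_suppression.use_static_shapes = True
detection_model = model_builder.build(configs['model'], is_training=False)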

jtrammell-dla commented 4 years ago

One last follow-up from me: it appears that the above code does not work reliably across architectures. While it worked for the ssd_mobilenet_v1_fpn_640x640_coco17_tpu-8 model, it happily creates a tflite file for efficientdet_d4_coco17_tpu-32, but the resulting model produces the error Failed to load model file: ByteBuffer is not a valid flatbuffer model when loaded on a mobile device, despite no error being produced during its creation.

GPhilo commented 4 years ago

I also had to use a "hack" to force the model to use a fixed input shape (it's possible the use_static_shapes change actually isn't necessary and that this hack alone is what resolved the problem).

I tried disabling use_static_shapes and it causes a cryptic TFLite converter error, so I think they're both necessary. I made some experiments using a single InputLayer instead of the odd-looking Sequential model, but that doesn't work either (no clue what the difference is...).

thanlon58 commented 4 years ago

I'm having the same issue -- I tried @jtrammell-dla's code on efficientdet_d1 and got a gargantuan error readout (I couldn't copy/paste it if I tried; a lot of it looks like hex code or something...).

Anyone make any progress on this? Really hoping for a V1-esque tool for tflite conversion soon.

TannerGilbert commented 4 years ago

They added a guide (https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_mobile_tf2.md) but I still couldn't get it to work.

GPhilo commented 4 years ago

I believe the tool they added is broken. I'm trying to export a pretrained SSD Resnet50 FPN model (which they claim should be compatible). The export doesn't raise any errors, but the generated tflite file doesn't work (again no error is raised, but the output info is nonsense and the output data is a single scalar 0 for all 4 outputs).

These are the steps I took to run my test:

1) Installed the latest object_detection API (master branch at commit a85181171ed5f797ec4d21068812f447080b5249) according to the instructions in the README for TF2.

2) Downloaded the pretrained TF2 SSD Resnet50 FPN 640x640 model from the zoo.

3) Followed the guide's steps and ran:

python object_detection/export_tflite_graph_tf2.py \                                       
    --pipeline_config_path "pre-trained models/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8/pipeline.config" \
    --trained_checkpoint_dir "pre-trained models/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8/checkpoint" \
    --output_directory "pre-trained models/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8/tflite"

then converted via

tflite_convert --output_file ssd_resnet_v1_fpn_640x640.tflite --saved_model_dir "pre-trained models/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8/tflite/saved_model"

The generated .tflite file is 516 bytes only, which feels a little too small for the chosen model.

Then, I load the model and run it in this script:

import numpy as np
import tensorflow as tf
import cv2

# Load the TFLite model and allocate tensors.
interpreter = tf.lite.Interpreter(model_path="pre-trained models/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8/tflite/ssd_resnet_v1_fpn_640x640.tflite")
interpreter.allocate_tensors()

# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Test the model on random input data.
input_shape = input_details[0]['shape']
image = cv2.resize(
  cv2.cvtColor(
    cv2.imread('an_image_with_a_car.jpg'), cv2.COLOR_BGR2RGB),
    tuple(input_shape[1:3][::-1])) / 255 - 0.5
interpreter.set_tensor(input_details[0]['index'], image[None,...].astype(np.float32))

interpreter.invoke()

# The function `get_tensor()` returns a copy of the tensor data.
# Use `tensor()` in order to get a pointer to the tensor.
output_data = interpreter.get_tensor(output_details[0]['index'])
print(output_data) # <<<< this prints 0.0

Inspecting the output_details, I have a feeling there is something very wrong there:

pprint(output_details)

# output:
[{'dtype': <class 'numpy.float32'>,
  'index': 1,
  'name': 'StatefulPartitionedCall:3',
  'quantization': (0.0, 0),
  'quantization_parameters': {'quantized_dimension': 0,
                              'scales': array([], dtype=float32),
                              'zero_points': array([], dtype=int32)},
  'shape': array([], dtype=int32),
  'shape_signature': array([], dtype=int32),
  'sparsity_parameters': {}},
 {'dtype': <class 'numpy.float32'>,
  'index': 1,
  'name': 'StatefulPartitionedCall:3',
  'quantization': (0.0, 0),
  'quantization_parameters': {'quantized_dimension': 0,
                              'scales': array([], dtype=float32),
                              'zero_points': array([], dtype=int32)},
  'shape': array([], dtype=int32),
  'shape_signature': array([], dtype=int32),
  'sparsity_parameters': {}},
 {'dtype': <class 'numpy.float32'>,
  'index': 1,
  'name': 'StatefulPartitionedCall:3',
  'quantization': (0.0, 0),
  'quantization_parameters': {'quantized_dimension': 0,
                              'scales': array([], dtype=float32),
                              'zero_points': array([], dtype=int32)},
  'shape': array([], dtype=int32),
  'shape_signature': array([], dtype=int32),
  'sparsity_parameters': {}},
 {'dtype': <class 'numpy.float32'>,
  'index': 1,
  'name': 'StatefulPartitionedCall:3',
  'quantization': (0.0, 0),
  'quantization_parameters': {'quantized_dimension': 0,
                              'scales': array([], dtype=float32),
                              'zero_points': array([], dtype=int32)},
  'shape': array([], dtype=int32),
  'shape_signature': array([], dtype=int32),
  'sparsity_parameters': {}}]

The list's items are the same object repeated 4 times, so I suspect something's wrong there.


Edit: tagging @srjoglekar246, as he seems to be the author of the commit that introduced the conversion script.

srjoglekar246 commented 4 years ago

@GPhilo Can you try the latest nightly for tflite_convert?

srjoglekar246 commented 4 years ago

EfficientDet support is being explored....

GPhilo commented 4 years ago

@srjoglekar246 Using tf-nightly, the conversion seems to work. I'd maybe put a line in the guide (and a check in the script?) mentioning the requirement for the latest TF version: currently the Object Detection API only requires TF 2.2, and it's a bit confusing when neither the conversion script nor the converter raises any warning (and they generate an empty model).
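
Just to illustrate what I mean, a minimal sketch of such a check (the 2.4 threshold is only my guess at the minimum version, since right now only tf-nightly works for me):

import tensorflow as tf

# Hypothetical guard; the real minimum is whichever nightly fixed the converter.
major, minor = (int(v) for v in tf.__version__.split('.')[:2])
if (major, minor) < (2, 4):
    raise RuntimeError(
        'Converting TF2 Object Detection models to TFLite seems to need a newer TF '
        '(tf-nightly at the time of writing); found %s' % tf.__version__)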

srjoglekar246 commented 4 years ago

That's fair. I will add a note to the g3doc so it's clearer moving forward. Thanks!

GPhilo commented 4 years ago

@srjoglekar246 Thanks for the update to the documentation. I saw that you also added a section about int8 quantization. Is this supported as well? I tried enabling quantization for the SSD Resnet50 model I used in my test above, but I'm getting the following error:

Traceback (most recent call last):
  File "scripts/test_tflite_model.py", line 36, in <module>
    tflite_model = converter.convert()
  File "/home/slr/anaconda3/envs/tf_nightly/lib/python3.8/site-packages/tensorflow/lite/python/lite.py", line 724, in convert
    return super(TFLiteSavedModelConverterV2,
  File "/home/slr/anaconda3/envs/tf_nightly/lib/python3.8/site-packages/tensorflow/lite/python/lite.py", line 648, in convert
    result = self._calibrate_quantize_model(result, **flags)
  File "/home/slr/anaconda3/envs/tf_nightly/lib/python3.8/site-packages/tensorflow/lite/python/lite.py", line 474, in _calibrate_quantize_model
    return calibrate_quantize.calibrate_and_quantize(
  File "/home/slr/anaconda3/envs/tf_nightly/lib/python3.8/site-packages/tensorflow/lite/python/optimize/calibrator.py", line 95, in calibrate_and_quantize
    return self._calibrator.QuantizeModel(
RuntimeError: Unsupported output type INT8 for output tensor 'StatefulPartitionedCall:3' of type FLOAT32.

The code I'm using for the quantization is:

import numpy as np
import tensorflow as tf
import cv2

def representative_dataset_gen(input_image_shape, num_samples_to_generate=100):
  import glob
  from random import shuffle, seed
  from PIL import Image
  seed(42)
  h, w, *_ = input_image_shape
  images = glob.glob('datasets/raw/testing/image_02/*/*.[pj][np]g')
  shuffle(images) # ensure we have data from different sequences for better dataset stats
  for fn in images[:num_samples_to_generate]:
    im = np.asarray(Image.open(fn).resize((w, h)))[None,...]/128-1
    yield [im.astype(np.float32)]

# Convert the model
converter = tf.lite.TFLiteConverter.from_saved_model('pre-trained models/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8/tflite/saved_model')

converter.target_spec.supported_ops = [
  tf.lite.OpsSet.TFLITE_BUILTINS_INT8,
  tf.lite.OpsSet.SELECT_TF_OPS,
  ]

#converter.inference_input_type = tf.int8 # tested with these enabled as well, but the error is the same
#converter.inference_output_type = tf.int8
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = lambda : representative_dataset_gen([640, 640, 3], 100)

tflite_model = converter.convert()

Edit: It works now

After realizing I had the wrong normalization values in representative_dataset_gen, I reran the conversion script with the correct normalization and it worked without raising an error. I'm quite confused as to what exactly caused the error in the first place and how I accidentally fixed it, but for now it seems to work. I'll update this again if I get more insight into what happened (re-running it with /255-0.5 also doesn't raise any error anymore...).

TannerGilbert commented 4 years ago

@srjoglekar246 Thanks for the update to the documentation, I saw that you added as well a part about int8 quantization. Is this also supported? I tried enabling quantization for the SSD Resnet50 model I used in my test above, but I'm getting the following error:

Traceback (most recent call last):
  File "scripts/test_tflite_model.py", line 36, in <module>
    tflite_model = converter.convert()
  File "/home/slr/anaconda3/envs/tf_nightly/lib/python3.8/site-packages/tensorflow/lite/python/lite.py", line 724, in convert
    return super(TFLiteSavedModelConverterV2,
  File "/home/slr/anaconda3/envs/tf_nightly/lib/python3.8/site-packages/tensorflow/lite/python/lite.py", line 648, in convert
    result = self._calibrate_quantize_model(result, **flags)
  File "/home/slr/anaconda3/envs/tf_nightly/lib/python3.8/site-packages/tensorflow/lite/python/lite.py", line 474, in _calibrate_quantize_model
    return calibrate_quantize.calibrate_and_quantize(
  File "/home/slr/anaconda3/envs/tf_nightly/lib/python3.8/site-packages/tensorflow/lite/python/optimize/calibrator.py", line 95, in calibrate_and_quantize
    return self._calibrator.QuantizeModel(
RuntimeError: Unsupported output type INT8 for output tensor 'StatefulPartitionedCall:3' of type FLOAT32.

The code I'm using for the quantization is:

import numpy as np
import tensorflow as tf
import cv2

def representative_dataset_gen(input_image_shape, num_samples_to_generate=100):
  import glob
  from random import shuffle, seed
  from PIL import Image
  seed(42)
  h, w, *_ = input_image_shape
  images = glob.glob('datasets/raw/testing/image_02/*/*.[pj][np]g')
  shuffle(images) # ensure we have data from different sequences for better dataset stats
  for fn in images[:num_samples_to_generate]:
    im = np.asarray(Image.open(fn).resize((w, h)))[None,...]/255-0.5
    yield [im.astype(np.float32)]

# Convert the model
converter = tf.lite.TFLiteConverter.from_saved_model('pre-trained models/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8/tflite/saved_model')
converter.target_spec.supported_ops = [
  tf.lite.OpsSet.TFLITE_BUILTINS_INT8,
  tf.lite.OpsSet.SELECT_TF_OPS,
  ]

#converter.inference_input_type = tf.int8 # tested with these enabled as well, but the error is the same
#converter.inference_output_type = tf.int8
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = lambda : representative_dataset_gen([640, 640, 3], 100)

tflite_model = converter.convert() # error here

I also tried that today and it didn't work out for me. I'm not quite sure how to create the representative_dataset.

GPhilo commented 4 years ago

@TannerGilbert You can use the generator in my code as an example (btw, I updated the comment as I realized I still had the wrong normalization values in there). Essentially, all you need to provide is a zero-argument callable that produces a generator for data batches (as a list of numpy arrays matching the input shape of the model). I'm just not sure if the batches have to be of size one (though all the samples I saw did that).
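
In its most stripped-down form it's something like this (a minimal sketch: random data just to illustrate the interface, and the saved_model path is a placeholder; for real calibration you would yield actual preprocessed images):

import numpy as np
import tensorflow as tf

def representative_dataset():
    # Zero-argument callable yielding lists of numpy arrays that match the
    # model's input shape; one batch of size 1 per yield, ~100 samples in total.
    for _ in range(100):
        yield [np.random.uniform(-1, 1, size=(1, 640, 640, 3)).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model('path/to/tflite/saved_model')
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
tflite_model = converter.convert()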

TannerGilbert commented 4 years ago

@GPhilo your example worked for me. Thanks a lot.

rsun-bdti commented 3 years ago

None of the discussion seems to address the original post, which was a feature request for the flexibility to include or exclude the post-processing part of SSD-Mobilenetv2 when generating a TFLite model. Am I missing something? Is that flexibility available now?

srjoglekar246 commented 3 years ago

@rsun-bdti Unfortunately, no. We never added that flexibility, because there weren't many use-cases for that. If you are willing to modify the exporting script locally, returning just the predicted tensors you want here should work. The rest of the exporting process should work fine, AFAIK.

rsun-bdti commented 3 years ago

@srjoglekar246 Thanks for the confirmation. I will modify the code you mentioned. I am trying to run inference with an SSD-Mobilenetv2 model on an MCU, and that flexibility would be handy in my application.
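
In case it's useful to anyone else, this is roughly what I plan to try, based on the wrapper pattern from earlier in this thread rather than on the exporter script itself (completely untested; the prediction_dict keys are the ones I see for SSD models, and the 300x300 input shape is specific to my model):

import tensorflow as tf
from object_detection.utils import config_util
from object_detection.builders import model_builder

configs = config_util.get_configs_from_pipeline_file('pretrained/pipeline.config')
detection_model = model_builder.build(configs['model'], is_training=False)

ckpt = tf.train.Checkpoint(model=detection_model)
ckpt.restore('pretrained/checkpoint/ckpt-0').expect_partial()

class NoPostprocessModule(tf.Module):
    """Wraps the detection model but stops at the raw predictor outputs (no NMS)."""

    def __init__(self, model):
        super().__init__()
        self.model = model

    @tf.function(input_signature=[tf.TensorSpec([1, 300, 300, 3], tf.float32)])
    def __call__(self, x):
        images, shapes = self.model.preprocess(x)
        prediction_dict = self.model.predict(images, shapes)
        # Post-processing (box decoding + NMS) is deliberately skipped here.
        return {'box_encodings': prediction_dict['box_encodings'],
                'class_predictions_with_background':
                    prediction_dict['class_predictions_with_background']}

module = NoPostprocessModule(detection_model)
tf.saved_model.save(module, 'no_postprocess_saved_model',
                    signatures=module.__call__.get_concrete_function())

converter = tf.lite.TFLiteConverter.from_saved_model('no_postprocess_saved_model')
converter.experimental_new_converter = True
tflite_model = converter.convert()
open('model_no_postprocess.tflite', 'wb').write(tflite_model)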