tensorflow / models

Models and examples built with TensorFlow

Mask R-CNN cannot be compiled with edgetpu compiler due to dynamic graph #10371

Closed pfan94 closed 2 years ago

pfan94 commented 2 years ago

Prerequisites

Please answer the following questions for yourself before submitting an issue.

1. The entire URL of the file you are using

https://github.com/tensorflow/models/tree/master/official/vision/beta/modeling

2. Describe the bug

I did post-training quantization for Mask R-CNN with TensorFlow Lite and then tried to compile the resulting model with the Edge TPU compiler, but it failed.

When exporting the model, the input signature was already passed with a static shape, so there should be no dynamic-sized tensors in the graph. According to https://stackoverflow.com/questions/66682315/finding-dynamic-tensors-in-a-tflite-model, control flow ops such as if and for can also cause this problem, so I think it comes from the implementation side.
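
For reference, here is a minimal sketch (not part of the original report) of how the converted model could be inspected for dynamic-sized tensors with the TFLite interpreter; the model path is a placeholder.

import tensorflow as tf

# Sketch: list tensors whose shape signature contains a dynamic (-1) dimension.
# "model.tflite" is a placeholder path.
interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

for tensor in interpreter.get_tensor_details():
    # shape_signature keeps -1 for dimensions that are dynamic at runtime.
    signature = tensor.get("shape_signature", tensor["shape"])
    if any(dim == -1 for dim in signature):
        print("dynamic tensor:", tensor["name"], signature)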

3. Steps to reproduce

import tensorflow as tf

def rep_data_gen():
    # Placeholder: yield representative input samples here.
    yield somedata

export_dir = model_dir  # path to the SavedModel directory
converter = tf.lite.TFLiteConverter.from_saved_model(export_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8,
                                       tf.lite.OpsSet.SELECT_TF_OPS,
                                       tf.lite.OpsSet.TFLITE_BUILTINS]
converter.inference_input_type = tf.int8  # or tf.uint8
converter.inference_output_type = tf.int8  # or tf.uint8
converter.representative_dataset = rep_data_gen
tflite_model = converter.convert()

with open('model.tflite', 'wb') as f:
    f.write(tflite_model)

Then compile the model with edgetpu_compiler model.tflite.

4. Expected behavior

Compilation succeeds.

5. Additional context

Error message:

$ edgetpu_compiler model.tflite
Edge TPU Compiler version 16.0.384591198
Started a compilation timeout timer of 180 seconds.
ERROR: Attempting to use a delegate that only supports static-sized tensors with a graph that has dynamic-sized tensors.
Compilation failed: Model failed in Tflite interpreter. Please ensure model can be loaded/run in Tflite interpreter.
Compilation child process completed within timeout period.
Compilation failed!

6. System information

arashwan commented 2 years ago

Hello! To export to TFLite, you will need to strip NMS and input preprocessing from the model. This preprocessing and post-processing will have to be done on the CPU.

pfan94 commented 2 years ago

Hello! To export to TFLite, you will need to strip NMS and input preprocessing from the model. This preprocessing and post-processing will have to be done on the CPU.

Hello! Thank you for the hint. After disabling NMS in roi_generator and detection_generator, it compiles successfully. The model now has decoded_boxes and decoded_box_scores as outputs.

Another question: do I have to compile mask_head as a separate model that also holds the pretrained weights, or is there a proper way to convert the model as a whole? Because right now the whole pipeline looks like: first TFLite model -> NMS -> second TFLite model.
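
For what it's worth, a rough sketch of that two-stage pipeline is below; the file names, output tensor names, and mask-head input layout are assumptions for illustration only, not taken from the exported models.

# Rough sketch of the "first tflite model -> nms -> second tflite model" pipeline.
# The file names (detector.tflite, mask_head.tflite), the output tensor names, and
# the mask-head input layout are assumptions for illustration only.
import tensorflow as tf

detector = tf.lite.Interpreter(model_path="detector.tflite")
detector.allocate_tensors()
mask_head = tf.lite.Interpreter(model_path="mask_head.tflite")
mask_head.allocate_tensors()

det_in = detector.get_input_details()[0]
det_out = {d["name"]: d for d in detector.get_output_details()}
mask_in = mask_head.get_input_details()[0]
mask_out = mask_head.get_output_details()[0]

def run_pipeline(image):
    # Stage 1: detector without NMS, yielding decoded boxes and per-class scores.
    detector.set_tensor(det_in["index"], image)
    detector.invoke()
    boxes = detector.get_tensor(det_out["decoded_boxes"]["index"])[0]
    scores = detector.get_tensor(det_out["decoded_box_scores"]["index"])[0]

    # NMS on the CPU, between the two TFLite models.
    keep = tf.image.non_max_suppression(
        boxes, tf.reduce_max(scores, axis=-1),
        max_output_size=100, iou_threshold=0.5, score_threshold=0.05)
    selected_boxes = tf.gather(boxes, keep).numpy()

    # Stage 2: mask head on the surviving boxes (input layout is assumed).
    mask_head.set_tensor(mask_in["index"], selected_boxes.astype(mask_in["dtype"]))
    mask_head.invoke()
    return mask_head.get_tensor(mask_out["index"])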

arashwan commented 2 years ago

If you're not planning to use full integer quantization, you can export the model with a TFLite-friendly NMS version (available through a config option, the batched_nms version). Otherwise, NMS will need to be done on the CPU, using separate tflite/saved_model modules.
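
For context, setting those flags as config overrides before export might look roughly like the sketch below; this is an assumption, not taken from the repository. Only the apply_nms / use_batched_nms flag names appear in this thread, and the experiment name and config path are guesses.

# Assumed sketch of enabling the TFLite-friendly batched NMS via config overrides.
# The experiment name and the detection_generator config path are assumptions;
# only the apply_nms / use_batched_nms flag names appear in this thread.
from official.core import exp_factory

exp_config = exp_factory.get_exp_config('maskrcnn_resnetfpn_coco')  # assumed name
exp_config.task.model.detection_generator.apply_nms = True
exp_config.task.model.detection_generator.use_batched_nms = True    # assumed field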

pfan94 commented 2 years ago

If you're not planning to use full integer quantization, you can export the model with a TFLite-friendly NMS version (available through a config option, the batched_nms version). Otherwise, NMS will need to be done on the CPU, using separate tflite/saved_model modules.

I tried use_batched_nms=True and apply_nms=True. The conversion process failed silently with the final output:

fully_quantize: 0, inference_type: 6, input_inference_type: 0, output_inference_type: 0 error: illegal scale: INF

There is no information about this error, and I have no idea how to debug it. The conversion succeeded only when I disabled NMS, so I think converting the model into submodules is the way to go.
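
One option that might help localize the "illegal scale: INF" layer is the TFLite quantization debugger, assuming a TF release that ships it (roughly 2.7+). The sketch below reuses the converter and rep_data_gen from the earlier snippet and has not been verified on this model.

import tensorflow as tf

# Sketch: dump per-layer quantization statistics; layers with inf/NaN scales
# should stand out in the resulting CSV.
# Assumes tf.lite.experimental.QuantizationDebugger is available (TF >= 2.7).
debugger = tf.lite.experimental.QuantizationDebugger(
    converter=converter, debug_dataset=rep_data_gen)
debugger.run()

with open('layer_stats.csv', 'w') as f:
    debugger.layer_statistics_dump(f)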

Thanks again for your help. Have a nice day!
