openvinotoolkit / openvino

OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
https://docs.openvino.ai
Apache License 2.0

Question about converting Tensorflow Object Detection 2.3 to Model Optimizer #2682

Closed campos537 closed 3 years ago

campos537 commented 3 years ago

I saw that version 2021.1 added support for TF 2.x conversions, and there is also an option for it in the DL Workbench, but when I try to convert an object detection model trained with TensorFlow 2.3 it fails with this error: Shape is not defined for output 1 of "StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/non_max_suppression_with_scores_3/NonMaxSuppressionV5". Is this layer not supported yet, or is it my mistake?

Model architecture: SSD ResNet50 V1 FPN 640x640 (RetinaNet50)

Stopped shape/value propagation at "StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/non_max_suppression_with_scores_3/NonMaxSuppressionV5" node

Model Optimizer arguments: Common parameters:
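For anyone hitting the same error, here is a minimal sketch (assuming TensorFlow 2.x and a standard Object Detection API SavedModel export; the path and the serving_default signature are the usual defaults, not taken from this issue) to confirm which NonMaxSuppression variant the exported graph actually contains:

```python
# Sketch: list the NonMaxSuppression ops inside a TF2 OD API SavedModel.
# "exported_model/saved_model" is a placeholder path.
import tensorflow as tf

model = tf.saved_model.load("exported_model/saved_model")
infer = model.signatures["serving_default"]  # default OD API serving signature

# MO stops at NonMaxSuppressionV5, so check which NMS op types are present.
nms_ops = [op for op in infer.graph.get_operations()
           if "NonMaxSuppression" in op.type]
for op in nms_ops:
    print(op.type, op.name)
```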

jgespino commented 3 years ago

Hi @campos537

Thanks for reaching out. In the previous release (2020.4), TF2 Object Detection API models were not yet supported. I don't believe that has changed, but let me confirm with my peers and get back to you.

https://github.com/openvinotoolkit/openvino/issues/1801

Regards, Jesus

campos537 commented 3 years ago

Hey @jgespino

It would be really great to be able to convert the TF2 detection models with MO. I saw yesterday that you added NonMaxSuppression to the Inference Engine. Thanks for the help!

agungor2 commented 3 years ago

Hi @campos537,

Could you try running it with the Model Optimizer directly and let us know if the issue still persists? We also have this model available in the Open Model Zoo, which I believe was trained on an earlier version of TF: ssd_resnet50_v1_fpn_coco. You can refer to the MO parameters in that link.

Can you also point me to the frozen graph location you are using so that I can reproduce the issue on my platform?

campos537 commented 3 years ago

Hey @agungor2,

I also tried running it with the Model Optimizer directly but hit the same issue. I will try again following the same parameters you showed in the Open Model Zoo. The problem is that the model was trained on a custom dataset and only works as a SavedModel; it was trained with TensorFlow 2.3.

campos537 commented 3 years ago

I was able to test with the same parameters as in ssd_resnet50_v1_fpn_coco, but it didn't work (the output layer names were different). I was only able to convert the model available there, not my model, which was trained with TF 2.3. This is the config I used to train the model -> ssd_resnet50_v1_fpn_640x640_coco17_tpu-8

I also compared the speed of the optimized 1.x model and the TensorFlow 2.3 SavedModel on CPU, and they were almost the same, which I found really strange: both took about 600 ms.

jgespino commented 3 years ago

Hi @campos537

I apologize for the delay. TF 2.x Object Detection API models are currently not supported by the OpenVINO toolkit; only TF 1.x Object Detection API models are supported.

Regards, Jesus

campos537 commented 3 years ago

Hey Jesus, thanks for your answer! It would be really great to have this support soon.

jgespino commented 3 years ago

@campos537 We have added your feedback to the feature request for the development team. Thank you for your input!

Regards, Jesus

lazarevevgeny commented 3 years ago

The PR adding support for these models is https://github.com/openvinotoolkit/openvino/pull/3556. Please refer to the instructions on how to convert these types of models in https://github.com/openvinotoolkit/openvino/blob/e1a955d528b3006b3e1598a617ffd97ba5663f5b/docs/MO_DG/prepare_model/convert_model/tf_specific/Convert_Object_Detection_API_Models.md

campos537 commented 3 years ago

Wow, that's really great news! This will be very useful, thanks for your work on it!

abhishekthanki commented 3 years ago

@lazarevevgeny Are TF 2.0 Object Detection API models supported in OpenVINO 2021.2, or will they only be supported in the next official release?

lazarevevgeny commented 3 years ago

They will be supported in the 2021.3 release, but you can take the MO from the master branch to verify support right now.

abhishekthanki commented 3 years ago

@lazarevevgeny Got it. I tried the MO from the master branch and used OpenVINO 2021.2 to perform inference on Pi + NCS 2. Works like a charm. Thank you!

abhishekthanki commented 3 years ago

@campos537 Were you able to convert your model after @lazarevevgeny's PR was merged? I'm asking because I tried converting a custom-trained MobileNetSSD v2 model using the OpenVINO files from the master branch, but I'm running into an error with BatchMultiClassNonMaxSuppression, which is similar to what you reported a few months ago.

campos537 commented 3 years ago

I haven't tried yet :cry: . I chose to use TensorFlow Object Detection 1.x, which had better compatibility with our product, and used it without converting with MO.

abhishekthanki commented 3 years ago

@campos537 Ah no worries.

@lazarevevgeny Did you test converting custom-trained models using TFOD 2.x? I tried converting the pre-trained version of MobileNetSSD v2 and had no issues, but when I tried converting a custom-trained MobileNetSSD v2 I ran into an error regarding BatchMultiClassNonMaxSuppression.

lazarevevgeny commented 3 years ago

@abhishekthanki, no, I have not tried. If you can share your model, I can take a look and find the root cause of the BatchMultiClassNonMaxSuppression issue.

rchuzh99 commented 3 years ago

Hi, I have successfully converted the SSD MobileNet v2 320x320 pre-trained model from the TF2 detection model zoo, but when I tried to run it with the Python object detection sample from OpenVINO, I got this error:

[ INFO ] Loading model to the device
Traceback (most recent call last):
  File "object_detection_sample_ssd.py", line 206, in <module>
    sys.exit(main() or 0)
  File "object_detection_sample_ssd.py", line 159, in main
    exec_net = ie.load_network(network=net, device_name=args.device)
  File "ie_api.pyx", line 306, in openvino.inference_engine.ie_api.IECore.load_network
  File "ie_api.pyx", line 315, in openvino.inference_engine.ie_api.IECore.load_network
RuntimeError: Number of priors must match number of location predictions (4 vs 7668)

I am relatively new to this topic. May I ask what is meant by the number of priors, and which code you used to run the converted TF2 OD API .xml models? I have also compared the graph with the TF1 .xml model and there seem to be differences between them:

For the TF2 .xml model: (screenshot attached)

For the TF1 .xml model: (screenshot attached)
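For context, here is a minimal sketch (2021.x Inference Engine Python API, placeholder file names, not the exact sample code) of how such an IR is loaded and run; the "Number of priors must match number of location predictions" error above is raised by load_network:

```python
# Sketch: load and run an SSD IR with the 2021.x Inference Engine Python API.
import cv2
import numpy as np
from openvino.inference_engine import IECore

ie = IECore()
net = ie.read_network(model="ssd_mobilenet_v2.xml", weights="ssd_mobilenet_v2.bin")
input_blob = next(iter(net.input_info))
_, _, h, w = net.input_info[input_blob].input_data.shape  # NCHW input

exec_net = ie.load_network(network=net, device_name="CPU")  # error above occurs here

image = cv2.imread("test.jpg")
blob = cv2.resize(image, (w, h)).transpose(2, 0, 1)[np.newaxis, ...].astype(np.float32)

results = exec_net.infer(inputs={input_blob: blob})
# SSD DetectionOutput returns a [1, 1, N, 7] tensor with entries:
# [image_id, label, confidence, x_min, y_min, x_max, y_max]
detections = next(iter(results.values()))
for det in detections[0][0]:
    if det[2] > 0.5:
        print(det)
```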

lazarevevgeny commented 3 years ago

@rchuzh99, did you follow the instructions from the documentation? Your command line should look like this:

<INSTALL_DIR>/deployment_tools/model_optimizer/mo_tf.py --saved_model_dir <dir_with_the_model>/ --transformations_config <INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/tf/ssd_support_api_v2.0.json --tensorflow_object_detection_api_pipeline_config <dir_with_the_model>/pipeline.config --reverse_input_channels

rchuzh99 commented 3 years ago

Yeah, with some additional arguments such as --input_shape and --output_dir, as shown below:

python <INSTALL_DIR>\openvino\model-optimizer\mo_tf.py --saved_model_dir <model_dir>\ssd_mobilenet_v2_320x320_coco17_tpu-8\saved_model --transformations_config <INSTALL_DIR>\Intel\openvino_2021\deployment_tools\model_optimizer\extensions\front\tf\ssd_support_api_v2.0.json --tensorflow_object_detection_api_pipeline_config <model_dir>\ssd_mobilenet_v2_320x320_coco17_tpu-8\pipeline.config --reverse_input_channels --output_dir <output_dir> --input_shape [1,300,300,3]

but I still get the same issue.

I have cloned both the mo_tf.py and ssd_support_api_v2.0.json from the master branch.

Weirdly, I got an error for the DetectionOutput layer when I ran the MO with mo_tf.py from the Windows installation: (screenshot attached)

whereas I was 'able' to get the IR files with the mo_tf.py from the master branch.

rchuzh99 commented 3 years ago

@lazarevevgeny, here are the graphs visualised using Netron: left and bottom (TF2 OD model), and right (IR model of the TF1 frozen OD model from OpenVINO). (screenshots attached)

lazarevevgeny commented 3 years ago

We are working on identifying the root cause of this issue.

lazarevevgeny commented 3 years ago

The fix is in https://github.com/openvinotoolkit/openvino/pull/4529

lazarevevgeny commented 3 years ago

@rchuzh99, the fix has been merged to master. Please try it.

rchuzh99 commented 3 years ago

@lazarevevgeny, I've just tried the updated MO and it works great, thanks for helping out, cheers! If you don't mind, is there any documentation explaining the differences between the .xml of the TF1 OD API (left) and the TF2 OD API (right) models? (screenshots attached)

lazarevevgeny commented 3 years ago

There is no such documentation.

The difference is that MO for TF 1.x inserts PriorBox/PriorBoxClustered operations to generate the proposals (the third input to the DetectionOutput operation), while for the TF 2.x version of the model MO constant-folds the TF sub-graph that calculates these values into a single Const. Because of this you cannot change the shape of the model input (image) or the batch size (number of images) for the TF 2.x model IR. However, I am not sure that changing the input image shape (or the batch size) for TF 2.x models is actually possible and would not result in a shape collision in the original TF model...
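A small sketch (assuming the 2021.x Inference Engine Python API and hypothetical file names) of what this means in practice when trying to reshape the two IRs:

```python
# Sketch: attempt to reshape an SSD IR. For a TF 1.x IR the PriorBox ops can
# recompute priors for the new spatial size, so the reshape usually succeeds;
# for a TF 2.x IR the priors are a const-folded tensor sized for the original
# input, so the reshape (or the subsequent load_network) is expected to fail.
from openvino.inference_engine import IECore

ie = IECore()
net = ie.read_network(model="ssd_model.xml", weights="ssd_model.bin")
input_blob = next(iter(net.input_info))

try:
    net.reshape({input_blob: [1, 3, 512, 512]})  # NCHW
    print("Reshape succeeded")
except Exception as e:
    print("Reshape failed:", e)
```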

rchuzh99 commented 3 years ago

@lazarevevgeny , thank you for your kind explanation. Thank you again for helping out with the issue.