tensorflow / models

Models and examples built with TensorFlow

TFLite interpreter of CenterNet ResNet50 V1 FPN keypoints fails #9414

Closed orihash closed 3 years ago

orihash commented 4 years ago

Prerequisites

2. Describe the bug

Not sure if it's a bug or not, but I am going to mark it this way for now.

After downloading the pre-trained CenterNet ResNet50 V1 FPN Keypoints model (http://download.tensorflow.org/models/object_detection/tf2/20200711/centernet_resnet50_v1_fpn_512x512_kpts_coco17_tpu-8.tar.gz), I tried to convert it to TFLite but couldn't. I did some research and ran into a filed issue, https://github.com/tensorflow/tensorflow/issues/43495, which explained that I should re-train the model and use tf-nightly to convert the TFLite model. After I got the TFLite model, I tried interpreting it with the Interpreter interface for TensorFlow Lite models (tf.lite.Interpreter(tflite_model)), but it kept failing, and the error that I am getting is:

ValueError: Did not get operators or tensors in subgraph 1

3. Steps to reproduce

  1. Download tflite model: https://we.tl/t-qnmBmOU6wD
  2. Create a conda environment
  3. Install tf-nightly
  4. Try running the code below:

import tensorflow as tf

# Load the downloaded .tflite model and attempt a bare invocation.
interpreter = tf.lite.Interpreter(model_path=path_to_tflite_model)
interpreter.allocate_tensors()
interpreter.invoke()

4. Expected behavior

I was expecting the code posted above not to fail.

5. Additional context

I opened the TFLite model in the Netron app and it renders fine, so the file is not corrupted.

6. System information

flamxi commented 4 years ago

I tried that model with the C++ API at current master and could reproduce it. I got this output when trying to build it via FlatBufferModel::BuildFromFile:

INFO: Initialized TensorFlow Lite runtime. ERROR: Did not get operators or tensors in subgraph 1.

srjoglekar246 commented 4 years ago

The exporting script does not support CenterNet as of now, but we are looking into it. Will update this bug when it lands.

orihash commented 4 years ago

@srjoglekar246 I have already exported the CenterNet model to TFLite, but I am having trouble interpreting it. I am getting: ValueError: Did not get operators or tensors in subgraph 1

srjoglekar246 commented 4 years ago

The architecture doesn't convert to TFLite as expected, so your exported model is not runnable.

orihash commented 4 years ago

@srjoglekar246 Thanks a lot for your answer! Can I ask if you have any idea when the fix will be released?

srjoglekar246 commented 4 years ago

We are working with the Research team to add support in our conversion tooling, and also train some new models that are mobile-friendly (smaller dimensions, MobileNet backbone instead of ResNet, etc). There was significant work involved in re-writing some parts of the model to make it convertible to TFLite.

Should be landing in a month or two. Sorry about the delay!

gcervantes8 commented 3 years ago

I am getting the same bug when I try to convert my own machine translation model to TFLite using TensorFlow 2.3.1:

Internal error: Cannot create interpreter: Did not get operators or tensors in subgraph 3.

Is there any way to find out which operation in the model is causing this so I can avoid using it?

If I try converting the model using tf-nightly 2.5.0-dev20201111, and use tensorflow-lite:0.0.0-nightly in Android Studio to run the model, then I get this different error:

Cannot create interpreter: Op builtin_code out of range: 127. Are you using old TFLite binary with newer model? Registration failed.

Is there anything I can try doing to fix it?
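
For what it's worth, the "builtin_code out of range" error generally means the runtime binary is older than the converter that produced the model, so matching nightly versions on both sides is the usual fix. And for locating problem ops, newer TF releases (2.9+) ship a model analyzer that prints every operator and tensor per subgraph; a minimal sketch, with the model path as a placeholder:

import tensorflow as tf

# Dump every subgraph's operators and tensors; an empty subgraph like the
# one the interpreter complains about shows up clearly in this listing.
tf.lite.experimental.Analyzer.analyze(model_path='converted_model.tflite')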

msalvaris commented 3 years ago

Bump, any update?

srjoglekar246 commented 3 years ago

Looks like the Research team had other priorities the last couple of months :-(. We have started training a MobileNet version of CenterNet, but that might take some time to land. I tried using our scripts to convert CenterNet, but it looks like the older versions in the detection zoo don't do well with TFLite.

To help move things forward for you in the meantime, can you specify whether you require both the bounding box & keypoint outputs from CenterNet, or just one of them?

msalvaris commented 3 years ago

Thanks @srjoglekar246. At the moment I just care about the bounding box output.

srjoglekar246 commented 3 years ago

You can try the SSD MobileNet V2 FPNLite 640x640 model from the detection zoo. It seems to have a comparable mAP to the CenterNet model, for bounding box output. See this guide for conversion.
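
For reference, the guide's flow boils down to two steps; a sketch with paths assumed from the zoo download (not verbatim from the guide):

# Step 1: export a TFLite-friendly intermediate SavedModel from the checkpoint.
python models/research/object_detection/export_tflite_graph_tf2.py \
  --pipeline_config_path=ssd_mobilenet_v2_fpnlite_640x640/pipeline.config \
  --trained_checkpoint_dir=ssd_mobilenet_v2_fpnlite_640x640/checkpoint \
  --output_directory=ssd_mobilenet_v2_fpnlite_640x640/tflite

# Step 2: convert the intermediate SavedModel to a .tflite flatbuffer.
tflite_convert \
  --saved_model_dir=ssd_mobilenet_v2_fpnlite_640x640/tflite/saved_model \
  --output_file=ssd_mobilenet_v2_fpnlite_640x640/tflite/model.tflite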

msalvaris commented 3 years ago

Thanks @srjoglekar246. Actually MobileDet seems better. Will stick with TF1 for now.

orihash commented 3 years ago

> Looks like the Research team had other priorities the last couple of months :-(. We have started training a MobileNet version of CenterNet, but that might take some time to land. I tried using our scripts to convert CenterNet, but it looks like the older versions in the detection zoo don't do well with TFLite.
>
> To help move things forward for you in the meantime, can you specify whether you require both the bounding box & keypoint outputs from CenterNet, or just one of them?

What about getting only keypoints?

srjoglekar246 commented 3 years ago

@orihash That is being worked on; the model is being trained. The uploads on the detection zoo currently don't have a labels file, which is required for any kind of re-export of the model (unless you use the SavedModel they already provide), so I am pushing for them to include it with the upload this time.

srjoglekar246 commented 3 years ago

Hey @orihash, we now have a CenterNet version with keypoints output that includes a TFLite model. See the "CenterNet MobileNetV2 FPN Keypoints 512x512" entry on the TF2 Detection Zoo. I am working on some documentation for users to better understand how to do pre- & post-processing on the model, but if you know how to work with the TF version, then you can use a visualization tool like Netron to interpret the keypoint outputs accordingly.
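
For reference, a quick way to list a TFLite model's output tensors so they can be matched against what Netron shows (a sketch; the file name is a placeholder, and tensor names/ordering vary per export):

import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path='centernet_mobilenetv2_fpn_kpts.tflite')
interpreter.allocate_tensors()

# Print each output tensor's name, shape, and dtype to map them to the
# keypoint/box/score outputs visible in Netron.
for detail in interpreter.get_output_details():
    print(detail['name'], detail['shape'], detail['dtype'])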

alexdwu13 commented 3 years ago

> Hey @orihash, we now have a CenterNet version with keypoints output that includes a TFLite model. See the "CenterNet MobileNetV2 FPN Keypoints 512x512" entry on the TF2 Detection Zoo. I am working on some documentation for users to better understand how to do pre- & post-processing on the model, but if you know how to work with the TF version, then you can use a visualization tool like Netron to interpret the keypoint outputs accordingly.

@srjoglekar246 Thanks for the update! Does the same hold true for the bounding-box CenterNet model, "CenterNet MobileNetV2 FPN 512x512"? Or is only the keypoints model TFLite-compatible?

srjoglekar246 commented 3 years ago

Yup, you should find pre-exported TFLite models in the downloads.

alexdwu13 commented 3 years ago

@srjoglekar246 Thank you. I ran the TFLite model using the TFLite benchmark tool and got these errors when running with the Android GPU delegate:

./android_aarch64_benchmark_model --graph=centernet_mobilenetv2_fpn_512x512.tflite --use_gpu=true                                                               
STARTING!
Log parameter values verbosely: [0]
Graph: [centernet_mobilenetv2_fpn_512x512.tflite]
Use gpu: [1]
Loaded model centernet_mobilenetv2_fpn_512x512.tflite
INFO: Initialized TensorFlow Lite runtime.
INFO: Created TensorFlow Lite delegate for GPU.
ERROR: Following operations are not supported by GPU delegate:
CAST: Operation is not supported.
FLOOR_DIV: Operation is not supported.
GATHER_ND: Operation is not supported.
GREATER: Operation is not supported.
LESS: Operation is not supported.
MUL: OP is supported, but tensor type isn't matched!
PACK: OP is supported, but tensor type isn't matched!
RESHAPE: OP is supported, but tensor type isn't matched!
SUB: OP is supported, but tensor type isn't matched!
SUM: OP is supported, but tensor type isn't matched!
TOPK_V2: Operation is not supported.
UNPACK: Operation is not supported.
95 operations will run on the GPU, and the remaining 41 operations will run on the CPU.
INFO: Initialized OpenCL-based API.
INFO: Created 1 GPU delegate kernels.
Explicitly applied GPU delegate, and the model graph will be partially executed by the delegate w/ 1 delegate kernels.

srjoglekar246 commented 3 years ago

@alexdwu13 I assume you mean the MobileNet version and not the one you mentioned (CenterNet Resnet50 V1 FPN 512x512).

The 'errors' are just overly verbose logs, and they are expected. The GPU delegate typically supports the heavier ops in the MobileNet backbone, which happens for both the keypoints & OD models. There are a bunch of smaller post-processing ops that run on CPU, since they are not suited for GPU inference.

Hence part of the model runs on the GPU, and the rest on the CPU. As long as there is no crash, things seem fine. Do you see a difference in latency between CPU & GPU?

alexdwu13 commented 3 years ago

@srjoglekar246 Sorry, you are correct. I updated the link to the correct model.

And yes, even though the GPU delegate is not able to execute the full model, it achieves roughly a 4x speedup compared to the CPU. In other words, you are likely right that the heaviest ops are successfully delegated to the GPU.

With that said, CenterNet MobileNet V2 FPN is still running slower than similarly accurate, yet older, models such as SSD-MobileNetV3 (my benchmarking compares models of similar resolution & mAP). Since SSD is an older and less efficient framework, I suspect the primary factor is that SSD-MobileNetV3 runs entirely on the GPU (108 GPU / 1 CPU) while CenterNet MobileNet V2 FPN is only partly compatible (95 GPU / 45 CPU).

Do you know of any plan to refactor the operations to be more accelerator-friendly? It seems like many of these ops should have GPU-compatible counterparts (though I realize the task is not trivial):

CAST: Operation is not supported.
FLOOR_DIV: Operation is not supported.
GATHER_ND: Operation is not supported.
GREATER: Operation is not supported.
LESS: Operation is not supported.
MUL: OP is supported, but tensor type isn't matched!
PACK: OP is supported, but tensor type isn't matched!
RESHAPE: OP is supported, but tensor type isn't matched!
SUB: OP is supported, but tensor type isn't matched!
SUM: OP is supported, but tensor type isn't matched!
TOPK_V2: Operation is not supported.
UNPACK: Operation is not supported.

Thanks for the insight!

srjoglekar246 commented 3 years ago

If we look at the per-op profile, I believe the CPU portion of CenterNet doesn't take that long (since it's mainly just post-processing). CenterNet was targeted mainly at the keypoints and person-detection use case, hence it doesn't perform as well as SSD for vanilla COCO detection (and needs to do more work than the SSD backbone). If some SSD flavor gives you a better mAP than CenterNet, use that one. CenterNet is better for keypoints.
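
That per-op profile can be produced with the same benchmark binary used earlier in this thread, assuming a recent build that includes the op-profiling flag:

./android_aarch64_benchmark_model \
  --graph=centernet_mobilenetv2_fpn_512x512.tflite \
  --use_gpu=true \
  --enable_op_profiling=true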

mayankverk commented 3 years ago

Hey, I trained the CenterNet MobileNet V2 FPN on my tfrecord dataset using your script and tried converting it to TFLite:

I first used export_tflite_graph_tf2.py and got the error:

ValueError: Only fixed_shape_resizer is supported with tflite. Found keep_aspect_ratio_resizer

I worked around that by temporarily changing the pipeline config to a fixed_shape_resizer of 512x512 dims while using the same checkpoint file. I don't think that should cause any errors for conversion purposes.

I then used the TFLite converter with the model and got the following error:

Traceback (most recent call last):
  File "/home/mayank/anaconda3/lib/python3.8/site-packages/tensorflow/python/ops/math_ops.py", line 1214, in binary_op_wrapper
    out = r_op(x)
  File "/home/mayank/anaconda3/lib/python3.8/site-packages/tensorflow/python/ops/math_ops.py", line 1235, in r_binary_op_wrapper
    y, x = maybe_promote_tensors(y, x)
  File "/home/mayank/anaconda3/lib/python3.8/site-packages/tensorflow/python/ops/math_ops.py", line 1173, in maybe_promote_tensors
    result_type = np_dtypes._result_type(
  File "/home/mayank/anaconda3/lib/python3.8/site-packages/tensorflow/python/ops/numpy_ops/np_dtypes.py", line 112, in _result_type
    dtype = np.result_type(*arrays_and_dtypes)
  File "<__array_function__ internals>", line 5, in result_type
TypeError: Cannot interpret '<KerasTensor: shape=(None, None, None, 64) dtype=float32 (created by layer 'conv2d_1')>' as a data type

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "convert_model.py", line 26, in <module>
    detection_model = model_builder.build(
  File "/home/mayank/anaconda3/lib/python3.8/site-packages/object_detection/builders/model_builder.py", line 1116, in build
    return build_func(getattr(model_config, meta_architecture), is_training,
  File "/home/mayank/anaconda3/lib/python3.8/site-packages/object_detection/builders/model_builder.py", line 998, in _build_center_net_model
    feature_extractor = _build_center_net_feature_extractor(
  File "/home/mayank/anaconda3/lib/python3.8/site-packages/object_detection/builders/model_builder.py", line 1078, in _build_center_net_feature_extractor
    return CENTER_NET_EXTRACTOR_FUNCTION_MAP[feature_extractor_config.type](
  File "/home/mayank/anaconda3/lib/python3.8/site-packages/object_detection/models/center_net_mobilenet_v2_fpn_feature_extractor.py", line 167, in mobilenet_v2_fpn_sep_conv
    return CenterNetMobileNetV2FPNFeatureExtractor(
  File "/home/mayank/anaconda3/lib/python3.8/site-packages/object_detection/models/center_net_mobilenet_v2_fpn_feature_extractor.py", line 97, in __init__
    top_down = top_down + residual
  File "/home/mayank/anaconda3/lib/python3.8/site-packages/tensorflow/python/ops/math_ops.py", line 1219, in binary_op_wrapper
    raise e
  File "/home/mayank/anaconda3/lib/python3.8/site-packages/tensorflow/python/ops/math_ops.py", line 1202, in binary_op_wrapper
    x, y = maybe_promote_tensors(x, y, force_same_dtype=False)
  File "/home/mayank/anaconda3/lib/python3.8/site-packages/tensorflow/python/ops/math_ops.py", line 1173, in maybe_promote_tensors
    result_type = np_dtypes._result_type(
  File "/home/mayank/anaconda3/lib/python3.8/site-packages/tensorflow/python/ops/numpy_ops/np_dtypes.py", line 112, in _result_type
    dtype = np.result_type(*arrays_and_dtypes)
  File "<__array_function__ internals>", line 5, in result_type
TypeError: Cannot interpret '<KerasTensor: shape=(None, None, None, 64) dtype=float32 (created by layer 'up_sampling2d')>' as a data type

Is CenterNet not supported by the TFLite converter yet?

srjoglekar246 commented 3 years ago

CenterNet export requires a slightly modified command:

For object detection:

# Export the intermediate SavedModel that outputs 10 detections & takes in an 
# image of dim 320x320.
# Modify these parameters according to your needs.

python models/research/object_detection/export_tflite_graph_tf2.py \
  --pipeline_config_path=centernet_mobilenetv2_fpn_od/pipeline.config \
  --trained_checkpoint_dir=centernet_mobilenetv2_fpn_od/checkpoint \
  --output_directory=centernet_mobilenetv2_fpn_od/tflite \
  --centernet_include_keypoints=false \
  --max_detections=10 \
  --config_override=" \
    model{ \
      center_net { \
        image_resizer { \
          fixed_shape_resizer { \
            height: 320 \
            width: 320 \
          } \
        } \
      } \
    }"

For keypoints:

# Export the intermediate SavedModel that outputs 10 detections & takes in an 
# image of dim 320x320.
# Modify these parameters according to your needs.

python models/research/object_detection/export_tflite_graph_tf2.py \
  --pipeline_config_path=centernet_mobilenetv2_fpn_kpts/pipeline.config \
  --trained_checkpoint_dir=centernet_mobilenetv2_fpn_kpts/checkpoint \
  --output_directory=centernet_mobilenetv2_fpn_kpts/tflite \
  --centernet_include_keypoints=true \
  --keypoint_label_map_path=centernet_mobilenetv2_fpn_kpts/label_map.txt \
  --max_detections=10 \
  --config_override=" \
    model{ \
      center_net { \
        image_resizer { \
          fixed_shape_resizer { \
            height: 320 \
            width: 320 \
          } \
        } \
      } \
    }"

I am writing all this up in a Colab, that's coming :-)

mayankverk commented 3 years ago

As for my error, it was due to model_builder. How should I handle the preprocess_input for CenterNet? The config file doesn't work with the usual model_builder and throws the above traceback.

File "convert_model.py", line 26, in <module>
    detection_model = model_builder.build(model_config=model_config, is_training=False)

Will support for that be added? Also, should the representative dataset consist of the original images, or of images after the detection_model.preprocess function?

srjoglekar246 commented 3 years ago

We are still working on post-training quantization of CenterNet; looks like there are some issues :-(. Will ping back once they are resolved.

For pre-processing, I believe that CenterNet does not require any pre-processing on the part of the user, except resizing to fit the model's dimensions (for the TFLite file). The module for TFLite export does pre-processing within the graph: https://github.com/tensorflow/models/blob/master/research/object_detection/export_tflite_graph_lib_tf2.py#L287
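
Under that assumption, running the converted TFLite model needs only a resize to the input dimensions; a rough sketch (file names are placeholders, and the cast follows whatever dtype the model reports):

import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path='model.tflite')
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]

# Resize only; normalization is folded into the exported graph.
_, height, width, _ = inp['shape']
image = tf.io.decode_image(tf.io.read_file('test.jpg'), channels=3)
image = tf.image.resize(image, (height, width))
image = tf.cast(image, inp['dtype'])[tf.newaxis, ...]

interpreter.set_tensor(inp['index'], image.numpy())
interpreter.invoke()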

flamxi commented 3 years ago

Hey @srjoglekar246, thanks for all the updates! I tried the TFLite keypoint model with the MobileNet backbone on my Android device, and I get these benchmark logs:

INFO: Initialized TensorFlow Lite runtime.
INFO: Created TensorFlow Lite delegate for GPU.
ERROR: Following operations are not supported by GPU delegate:
ADD: OP is supported, but tensor type isn't matched!
ARG_MIN: Operation is not supported.
CAST: Operation is not supported.
FLOOR_DIV: Operation is not supported.
GATHER_ND: Operation is not supported.
GREATER: Operation is not supported.
GREATER_EQUAL: Operation is not supported.
LESS: Operation is not supported.
MUL: OP is supported, but tensor type isn't matched!
NOT_EQUAL: Operation is not supported.
PACK: OP is supported, but tensor type isn't matched!
RESHAPE: OP is supported, but tensor type isn't matched!
SELECT: Operation is not supported.
STRIDED_SLICE: STRIDED_SLICE supports for 3 or 4 dimensional tensors only.
STRIDED_SLICE: Slice does not support shrink_axis_mask parameter. 
SUB: OP is supported, but tensor type isn't matched!
SUM: OP is supported, but tensor type isn't matched!
TILE: Operation is not supported.
TOPK_V2: Operation is not supported.
TRANSPOSE: OP is supported, but tensor type isn't matched!
UNPACK: Operation is not supported.
111 operations will run on the GPU, and the remaining 166 operations will run on the CPU.
INFO: Initialized OpenCL-based API.
INFO: Created 1 GPU delegate kernels.

I saw the answer you gave for the non-keypoints model, that those ops are not suited for the GPU and also don't take much time. But how about this keypoint model? We clearly have more ops on the CPU than the other model. Thanks!

srjoglekar246 commented 3 years ago

@flamxi What difference in latency do you see between GPU & CPU for this case?

mayankverk commented 3 years ago

@srjoglekar246 Would you have an idea of how long it might take to fix post-training quantization on CenterNet? Also, is the regular conversion to TFLite bug-free?

srjoglekar246 commented 3 years ago

Float conversion should be bug-free, yes. Post-training quantization will likely take a couple of weeks.
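
For context, post-training quantization in TFLite is driven by converter flags plus a representative dataset for calibration; a generic sketch (representative_images is a placeholder for your own samples, and per the discussion above this is not yet expected to work for CenterNet):

import tensorflow as tf

def representative_dataset():
    # Yield a few hundred typical inputs so the converter can
    # calibrate the quantization ranges.
    for image in representative_images:  # placeholder iterable of arrays
        yield [image]

converter = tf.lite.TFLiteConverter.from_saved_model('tflite/saved_model')
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
quantized_model = converter.convert()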

flamxi commented 3 years ago

@srjoglekar246 On average: GPU: 220 ms, CPU: 650 ms. I knew it was faster with the current implementation; I wondered whether, if the remaining ops were supported, it would be even faster.

flamxi commented 3 years ago

Also @srjoglekar246, I sometimes get this kind of error at runtime. Do you have any idea about its origin?

tflite::Subgraph::ReportErrorC(TfLiteContext*, char const*, ...)+120)
TfLiteStatus tflite::ops::builtin::reduce::EvalLogic<int>(TfLiteContext*, TfLiteNode*, tflite::ops::builtin::reduce::OpContext*, int, tflite::ops::builtin::reduce::OpContext* (*)(tflite::ops::builtin::reduce::OpContext*, tflite::ops::builtin::reduce::OpContext*))+1668)
_ZN6tflite3ops7builtin6reduce11EvalGenericILNS2_10KernelTypeE0ELNS2_10ReduceTypeE0EEE12TfLiteStatusP13TfLiteContextP10TfLiteNode+240
tflite::ops::builtin::reduce::EvalSum(TfLiteContext*, TfLiteNode*)+240
tflite::Subgraph::Invoke()+1008
tflite::Interpreter::Invoke()+92

srjoglekar246 commented 3 years ago

@flamxi That's usually up to the GPU team; they will probably implement the ops incrementally as we go :-)

About the error, this is strange. You said this only happens sometimes? Maybe there is some pathological input that the model doesn't handle well. Can you try to see what kinds of inputs cause this?
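
A rough way to test that (sketched in Python even though you are on the C++ API; the model path is a placeholder) is to hammer the interpreter with random inputs and see whether the crash reproduces:

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path='model.tflite')
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]

# Repeatedly invoke with random data to see whether the crash is
# tied to particular inputs or happens regardless of the data.
for _ in range(1000):
    dummy = np.random.randint(0, 256, size=inp['shape']).astype(inp['dtype'])
    interpreter.set_tensor(inp['index'], dummy)
    interpreter.invoke()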

flamxi commented 3 years ago

Actually, even with correct input or dummy input it happens! I am using the C++ API. When I run the model inside a binary it is all smooth, but with the same code as a library through JNI for Java apps it crashes with that log error. I am guessing it's some threading issue with Android or something!

srjoglekar246 commented 3 years ago

That could be. TFLite Interpreters are not thread safe, since we reuse intermediate tensor memory for optimization.
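
A common workaround (sketched here in Python; the same one-interpreter-per-thread principle applies to the C++ API through JNI) is to avoid sharing a single interpreter across threads:

import threading
import tensorflow as tf

_local = threading.local()

def get_interpreter(model_path='model.tflite'):
    # Lazily create one interpreter per thread; a single
    # tf.lite.Interpreter must never be invoked concurrently,
    # since intermediate tensor memory is reused across calls.
    if not hasattr(_local, 'interpreter'):
        _local.interpreter = tf.lite.Interpreter(model_path=model_path)
        _local.interpreter.allocate_tensors()
    return _local.interpreter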

mayankverk commented 3 years ago

@srjoglekar246 Are there any updates on the timeline for the post-training quantization of CenterNet-MobileNetV2? Also, is it possible to do quantization-aware training on that model? Another question I have is: are there weights available in the Zoo for values of depth_multiplier < 1 for the backbone?

srjoglekar246 commented 3 years ago

The Colab for CenterNet+TFLite has been added.

I am closing this issue, but feel free to file a new one for the quantization question, @mayankverk. I have pinged the quantization team for updates on some of the issues. Note that post-training quantization works for the detection CenterNet, but not the keypoints one.

GioFic95 commented 3 years ago

> python models/research/object_detection/export_tflite_graph_tf2.py \
>   --pipeline_config_path=centernet_mobilenetv2_fpn_od/pipeline.config \
>   --trained_checkpoint_dir=centernet_mobilenetv2_fpn_od/checkpoint \
>   --output_directory=centernet_mobilenetv2_fpn_od/tflite \
>   --centernet_include_keypoints=false \
>   --max_detections=10 \
>   --config_override=" \
>     model{ \
>       center_net { \
>         image_resizer { \
>           fixed_shape_resizer { \
>             height: 320 \
>             width: 320 \
>           } \
>         } \
>       } \
>     }"
>
> I am writing all this up in a Colab, that's coming :-)

Hi @srjoglekar246, I followed your Colab tutorial but I still can't "generate a TFLite-friendly intermediate SavedModel" from the provided model.

In particular, the pipeline.config file contains mobilenet_v2_fpn_sep_conv as the feature extractor, which isn't available in the model builder. But if I replace it with mobilenet_v2_fpn, I get a mismatch between the described model and the checkpoint: AssertionError: Some Python objects were not bound to checkpointed values, likely due to changes in the Python program.

Do you have any suggestions on how to fix the problem? Thank you in advance.
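
For debugging, one way to check which CenterNet feature extractors the installed object_detection package actually registers is the CENTER_NET_EXTRACTOR_FUNCTION_MAP that appears in the traceback earlier in this thread; a quick sketch:

from object_detection.builders import model_builder

# If 'mobilenet_v2_fpn_sep_conv' is missing from this list, the installed
# models checkout differs from the one that wrote the pipeline.config.
print(sorted(model_builder.CENTER_NET_EXTRACTOR_FUNCTION_MAP.keys()))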

mayankverk commented 3 years ago

I think someone made a commit to the models repo that deleted that particular feature extractor, so it is now incompatible. They need to fix it. Several other models from the zoo are also not working.

srjoglekar246 commented 3 years ago

@mayankverk Let me forward this to the detection team & get back to you. Thanks for flagging!

tft-robert commented 3 years ago

@srjoglekar246 Anything new on this?

srjoglekar246 commented 3 years ago

The CenterNet MobileNet models should be working now. Are you facing an error? The conversion/inference code in this Colab should work in the latest TF version.

Edwardsleo commented 3 years ago

Hi folks... I just need to do everything from scratch for keypoint detection with centernet_mobilenetv2fpn_512x512_coco17_kpts.tar.gz.

I downloaded the COCO 2017 datasets from here: https://cocodataset.org/#download

TFRecords: Generated tfrecords using https://github.com/tensorflow/models/blob/master/research/object_detection/dataset_tools/create_coco_tf_record.py

python create_coco_tf_record.py --logtostderr \
  --train_image_dir="/mypath/train2017" \
  --val_image_dir="/mypath/val2017" \
  --test_image_dir="/mypath/test2017" \
  --train_annotations_file="/mypath/annotations/instances_train2017.json" \
  --val_annotations_file="/mypath/annotations/instances_val2017.json" \
  --train_keypoint_annotations_file="/mypath/annotations/person_keypoints_train2017.json" \
  --val_keypoint_annotations_file="/mypath/annotations/person_keypoints_val2017.json" \
  --testdev_annotations_file="/mypath/image_info/annotations/image_info_test-dev2017.json" \
  --output_dir="mypath/tfrecord2017"

Training: Replaced my actual paths in centernet_mobilenetv2_fpn_kpts/pipeline.config:

keypoint_label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" ==> ../centernet_mobilenetv2_fpn_kpts/label_map.txt

train_input_reader: label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" ==> ../centernet_mobilenetv2_fpn_kpts/label_map.txt

input_path: "PATH_TO_BE_CONFIGURED/train2017-?????-of-00256.tfrecord" ==> tfrecord2017/coco_train.record-?????-of-00100

eval_input_reader: input_path: "PATH_TO_BE_CONFIGURED/val2017-?????-of-00032.tfrecord" ==> tfrecord2017/coco_val.record-?????-of-00050

Triggered training:

python model_main_tf2.py --model_dir=/var/data/annot/test/edward_inst1 \
  --num_train_steps=1000 \
  --sample_1_of_n_eval_examples=1 \
  --pipeline_config_path=/var/data/annot/test/centernet_mobilenetv2_fpn_kpts/pipeline.config

After training I got checkpoints. Using the checkpoints, I am following the Keypoints section code from https://github.com/tensorflow/models/blob/master/research/object_detection/colab_tutorials/centernet_on_device.ipynb, converting checkpoints to a SavedModel and then to TFLite.

My issues are:

  1. On inference, my own generated TFLite model does not produce any keypoint detections on COCO images. The pretrained model that is already available in centernet_mobilenetv2fpn_512x512_coco17_kpts.tar.gz works; only my generated TFLite model is not detecting keypoints.
  2. GPUs are not working; I don't know the issue. For SSD models they work.
  3. In the pipeline config, the training batch size needs to be changed to 4 or 8 instead of 512; otherwise it takes more time for initialization.

Need help, thanks in advance.

srjoglekar246 commented 3 years ago

Hey @Edwardsleo, can you paste the commands you ran to export & convert the TFLite model? There might be some discrepancy between how we generated the model and how you are doing it. Also, I assume you tested your trained TF model and that it does inference fine before conversion to TFLite?

Edwardsleo commented 3 years ago

> Hey @Edwardsleo, can you paste the commands you ran to export & convert the TFLite model? There might be some discrepancy between how we generated the model and how you are doing it. Also, I assume you tested your trained TF model and that it does inference fine before conversion to TFLite?

Hi @srjoglekar246, thanks for your reply.

> Can you paste the commands you ran to export & convert the TFLite model?

These are the commands I used to export the SavedModel and convert it to TFLite:

python export_tflite_graph_tf2.py \
  --pipeline_config_path=/centernet_mobilenetv2_fpn_kpts/pipeline.config \
  --trained_checkpoint_dir=/mypath/edwardcheckpoint \
  --output_directory=/mypath/output \
  --centernet_include_keypoints=true \
  --keypoint_label_map_path=/centernet_mobilenetv2_fpn_kpts/label_map.txt \
  --max_detections=10 \
  --config_override=" \
    model{ \
      center_net { \
        image_resizer { \
          fixed_shape_resizer { \
            height: 320 \
            width: 320 \
          } \
        } \
      } \
    }"

tflite_convert --output_file=/mypath/output/model.tflite \
  --saved_model_dir=/mypath/output/saved_model

> Also, I assume you tested your trained TF model and that it does inference fine before conversion to TFLite?

I am using the pretrained checkpoint from centernet_mobilenetv2_fpn_kpts/checkpoint (centernet_mobilenetv2fpn_512x512_coco17_kpts.tar.gz) in my pipeline config for training the model from scratch.

My modified centernet_mobilenetv2_fpn_kpts/pipeline.config looks like this:

model {
  center_net {
    num_classes: 1
    feature_extractor {
      type: "mobilenet_v2_fpn_sep_conv"
    ......
    keypoint_label_map_path: "/centernet_mobilenetv2_fpn_kpts/label_map.txt"
    .....
train_input_reader {
  label_map_path: "/centernet_mobilenetv2_fpn_kpts/label_map.txt"
  tf_record_input_reader {
    input_path: "/mypath/tfrecord2017/coco_train.record-?????-of-00100"
  }
  filenames_shuffle_buffer_size: 256
  num_keypoints: 17
......
eval_input_reader {
  label_map_path: "/centernet_mobilenetv2_fpn_kpts/label_map.txt"
  shuffle: false
  num_epochs: 1
  tf_record_input_reader {
    input_path: "/mypath/tfrecord2017/coco_val.record-?????-of-00050"
  }
  num_keypoints: 17

Note: I have used the pipeline.config & label.txt that are already available inside centernet_mobilenetv2fpn_512x512_coco17_kpts.tar.gz.

I also added three lines to pipeline.config; it didn't work:

fine_tune_checkpoint_version: V2
fine_tune_checkpoint: "centernet_mobilenetv2_fpn_kpts/checkpoint/cpkt-301"
fine_tune_checkpoint_type: "detection"

I tested my generated checkpoint via inference before exporting the SavedModel/TFLite; I am not getting any keypoint detections. For inference on checkpoints I used the code from https://github.com/tensorflow/models/blob/master/research/object_detection/colab_tutorials/inference_tf2_colab.ipynb

Code excerpt:

pipeline_config = os.path.join('models/research/object_detection/configs/tf2/',
                               model_name + '.config')
model_dir = 'models/research/object_detection/test_data/checkpoint/'
...
viz_utils.visualize_boxes_and_labels_on_image_array(
    image_np_with_detections,
    detections['detection_boxes'][0].numpy(),
    (detections['detection_classes'][0].numpy() + label_id_offset).astype(int),
    detections['detection_scores'][0].numpy(),
    category_index,
    use_normalized_coordinates=True,
    max_boxes_to_draw=200,
    min_score_thresh=.30,
    agnostic_mode=False,
    keypoints=keypoints,
    keypoint_scores=keypoint_scores,
    keypoint_edges=get_keypoint_tuples(configs['eval_config']))

plt.figure(figsize=(12,16))
plt.imshow(image_np_with_detections)
plt.show()

I tried to run inference with the pretrained checkpoint already available inside centernet_mobilenetv2fpn_512x512_coco17_kpts.tar.gz (centernet_mobilenetv2fpn_512x512_coco17_kpts/checkpoint); keypoint detection is not happening. I got this warning message on inference:

WARNING:tensorflow:input_shape is undefined or non-square, or rows is not in [96, 128, 160, 192, 224]. Weights for input shape (224, 224) will be loaded as the default.

The rest of the CenterNet models use ckpt.restore(os.path.join(model_dir, 'ckpt-0')).expect_partial() for inference; for centernet_mobilenetv2fpn_512x512_coco17_kpts it is ckpt.restore(os.path.join(model_dir, 'ckpt-301')).expect_partial().

For the rest of the CenterNet models' pretrained checkpoints, keypoint detection works on inference. The centernet_mobilenetv2fpn_512x512_coco17_kpts checkpoint is not able to detect the keypoints.

Thanks in advance.

srjoglekar246 commented 3 years ago

@Edwardsleo It looks like something is going wrong with your custom model training, so we should probably debug that before looking at TFLite conversion (because conversion won't work until the TF model works as intended). What error do you get when running your model with colab_tutorials/inference_tf2_colab.ipynb?

Edwardsleo commented 3 years ago

> @Edwardsleo It looks like something is going wrong with your custom model training, so we should probably debug that before looking at TFLite conversion (because conversion won't work until the TF model works as intended). What error do you get when running your model with colab_tutorials/inference_tf2_colab.ipynb?

Hi @srjoglekar246, the pretrained checkpoints available in centernet_mobilenetv2fpn_512x512_coco17_kpts are themselves not able to detect the keypoints of an image on inference with colab_tutorials/inference_tf2_colab.ipynb. It does not show any bug, just a warning message.

I am using those pretrained checkpoints for my model training. During training and on inference, I am getting:

WARNING:tensorflow:input_shape is undefined or non-square, or rows is not in [96, 128, 160, 192, 224]. Weights for input shape (224, 224) will be loaded as the default.

Keypoint model link for your reference: http://download.tensorflow.org/models/object_detection/tf2/20210210/centernet_mobilenetv2fpn_512x512_coco17_kpts.tar.gz

I am using the pretrained checkpoint from centernet_mobilenetv2_fpn_kpts/checkpoints (centernet_mobilenetv2fpn_512x512_coco17_kpts.tar.gz) in my pipeline config for training the model from scratch.

Thanks in advance.

Shubham654 commented 1 year ago

@srjoglekar246 Still facing the same problem with the GPU delegate for centernet_mobilenetv2fpn_512x512_coco17_kpts. It's been a couple of months that we have been facing this problem of unsupported ops. Any fixes or solutions for the problem below?

> INFO: Initialized TensorFlow Lite runtime.
> INFO: Created TensorFlow Lite delegate for GPU.
> ERROR: Following operations are not supported by GPU delegate:
> ADD: OP is supported, but tensor type isn't matched!
> ARG_MIN: Operation is not supported.
> CAST: Operation is not supported.
> FLOOR_DIV: Operation is not supported.
> GATHER_ND: Operation is not supported.
> GREATER: Operation is not supported.
> GREATER_EQUAL: Operation is not supported.
> LESS: Operation is not supported.
> MUL: OP is supported, but tensor type isn't matched!
> NOT_EQUAL: Operation is not supported.
> PACK: OP is supported, but tensor type isn't matched!
> RESHAPE: OP is supported, but tensor type isn't matched!
> SELECT: Operation is not supported.
> STRIDED_SLICE: STRIDED_SLICE supports for 3 or 4 dimensional tensors only.
> STRIDED_SLICE: Slice does not support shrink_axis_mask parameter. 
> SUB: OP is supported, but tensor type isn't matched!
> SUM: OP is supported, but tensor type isn't matched!
> TILE: Operation is not supported.
> TOPK_V2: Operation is not supported.
> TRANSPOSE: OP is supported, but tensor type isn't matched!
> UNPACK: Operation is not supported.
> 111 operations will run on the GPU, and the remaining 166 operations will run on the CPU.
> INFO: Initialized OpenCL-based API.
> INFO: Created 1 GPU delegate kernels.