tensorflow / models

Models and examples built with TensorFlow

Export Object detection model (V2) fails on assertion "assert_existing_objects_matched" #8953

Open veonua opened 4 years ago

veonua commented 4 years ago


Please answer the following questions for yourself before submitting an issue.

1. The entire URL of the file you are using


2. Describe the bug

I'm hitting OOM errors on the big models, so I tried to train a dummy model:

faster_rcnn {
    num_classes: 9
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 80
        max_dimension: 100
        pad_to_max_dimension: true
train_config: {
  batch_size: 1
  num_steps: 200

For some reason the resulting checkpoint files are very small: ckpt-1.index = 247 bytes, ckpt-1.data-00000-of-00001 = 864 bytes.

Export of this dummy model fails with an assertion:
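A quick way to confirm the size problem described above is to list the checkpoint files and flag unusually small data shards. This is only a minimal sketch: the helper names and the 1 MB threshold are assumptions for illustration, not part of the Object Detection API.

```python
import os


def checkpoint_file_sizes(model_dir, prefix="ckpt-"):
    """Return {filename: size_in_bytes} for files in model_dir starting with prefix."""
    sizes = {}
    for name in sorted(os.listdir(model_dir)):
        if name.startswith(prefix):
            sizes[name] = os.path.getsize(os.path.join(model_dir, name))
    return sizes


def suspiciously_small(sizes, min_data_bytes=1_000_000):
    """Return names of .data shards below min_data_bytes (threshold is an assumption)."""
    return [name for name, size in sizes.items()
            if ".data-" in name and size < min_data_bytes]


if __name__ == "__main__":
    # "./output" matches the --model_dir used later in this thread.
    if os.path.isdir("./output"):
        sizes = checkpoint_file_sizes("./output")
        print(sizes)
        print("Suspiciously small:", suspiciously_small(sizes))
```

A data shard of a few hundred bytes, as reported here, suggests that almost no model variables were written to the checkpoint.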

raise AssertionError( ("Some Python objects were not bound to checkpointed values, likely due to changes in the Python program: %s") % (list(unused_python_objects),))

3. Steps to reproduce

Train a dummy faster_rcnn model without a fine-tune checkpoint.

4. Expected behavior

The final checkpoint file should be ~100 MB, as in v1, and the export should complete without errors.

5. Additional context

6. System information

ravikyram commented 4 years ago


Please share a complete code snippet or the steps needed to reproduce the issue in our environment. It helps us localize the issue faster. Thanks!

veonua commented 4 years ago

@ravikyram https://gist.github.com/veonua/e4186c92df80b49ad3d813f1219d0727

I'm using the latest master version of the Object Detection API.

object_detection/model_main_tf2.py --model_dir=./output --pipeline_config_path=checkpoint/pipeline.config --num_train_steps=1000

object_detection/exporter_main_v2.py --input_type=image_tensor --trained_checkpoint_dir="./output" --output_directory="./model" --pipeline_config_path=checkpoint/pipeline.config

Please let me know if you need any more information.

aminzg commented 4 years ago

I get the same error with any model. Attached (TF2 Error.txt) is the terminal output with ssd_resnet50_v1_fpn_shared_box_predictor_640x640_coco14_sync_2018_07_03 (I also tried a bunch of other models, including efficientdet_d0_coco17_tpu-32). Here's what I run:

python model_main_tf2.py \
  --pipeline_config_path=training/ssd_resnet50_v1_fpn_shared_box_predictor_640x640_coco14_sync_2018_07_03.config \
  --model_dir=training/ \

I get the AssertionError, followed by tons of warnings related to weight loading.

AssertionError: Some Python objects were not bound to checkpointed values, likely due to changes in the Python program:

The platform I use:

I've also tried on a brand-new machine with a fresh installation; the issue is persistent.

LiuXiaolong19920720 commented 4 years ago

Same Error:

AssertionError: Some Python objects were not bound to checkpointed values, likely due to changes in the Python program:

PetreanuAndi commented 4 years ago

Same error. Bump. It happens to me when loading CenterNet.

veonua commented 4 years ago

As a temporary solution, I've removed


The model seems to be working.

XiangL-Xr commented 4 years ago

Same error. It happens to me when loading CenterNet_ResNet50_v1.

claverru commented 4 years ago

Any updates on this? I'm getting the same error with every TF2 model I've tried.

legacyai commented 4 years ago

This is likely a bug in TF 2.3.0.

midhulavijayan commented 4 years ago

Change the line in pipeline.config

fine_tune_checkpoint_type: "classification" to fine_tune_checkpoint_type: "detection"

khu834 commented 3 years ago

Change the line in pipeline.config

fine_tune_checkpoint_type: "classification" to fine_tune_checkpoint_type: "detection"

For those who thumbed down this answer, can you provide some feedback as to why this is not the solution? I'm guessing if you run fine-tuning training again with this flag, then export the object detection model, it should work.

khu834 commented 3 years ago

I just tested this on the faster_rcnn_resnet50_v1_1024x1024_coco17_tpu-8.config model. On the first try I trained it with fine_tune_checkpoint_type: "classification"; running exporter_main_v2.py then resulted in the AssertionError above (even when I modified the config file to "detection" for export).

Next, I trained the model again, starting from the original pretrained model, with fine_tune_checkpoint_type: "detection". Running exporter_main_v2.py then produced the saved_model correctly (using the "detection" config).
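For anyone applying the config change discussed above, here is a minimal sketch that rewrites the field in the text of pipeline.config. The helper name is hypothetical and it works on plain text with a regular expression; it does not use the Object Detection API's own config utilities.

```python
import re

# Matches the field name plus its quoted value; it will not match the plain
# `fine_tune_checkpoint:` field because the `_type` suffix is required.
_TYPE_FIELD = re.compile(r'(fine_tune_checkpoint_type\s*:\s*)"[^"]*"')


def set_fine_tune_checkpoint_type(config_text, new_type="detection"):
    """Return config_text with every fine_tune_checkpoint_type set to new_type."""
    updated, count = _TYPE_FIELD.subn(
        lambda match: f'{match.group(1)}"{new_type}"', config_text)
    if count == 0:
        raise ValueError("fine_tune_checkpoint_type not found in config text")
    return updated


if __name__ == "__main__":
    sample = 'train_config: {\n  fine_tune_checkpoint_type: "classification"\n}\n'
    print(set_fine_tune_checkpoint_type(sample))
```

Note that, per the test reported above, editing the config only at export time was not enough: the model had to be retrained from the original pretrained checkpoint with the "detection" setting.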

anand08 commented 3 years ago

Hey @khu834, I'm also getting the same error while training the model.

model {
  ssd {
    inplace_batchnorm_update: true
    freeze_batchnorm: false
    num_classes: 1
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
        use_matmul_gather: true
    similarity_calculator {
      iou_similarity {
    encode_background_as_zeros: true
    anchor_generator {
      ssd_anchor_generator {
        num_layers: 6
        min_scale: 0.2
        max_scale: 0.95
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        aspect_ratios: 3.0
        aspect_ratios: 0.3333
    image_resizer {
      fixed_shape_resizer {
        height: 300
        width: 300
    box_predictor {
      convolutional_box_predictor {
        min_depth: 0
        max_depth: 0
        num_layers_before_predictor: 0
        use_dropout: false
        dropout_keep_probability: 0.8
        kernel_size: 1
        box_code_size: 4
        apply_sigmoid_to_scores: false
        class_prediction_bias_init: -4.6
        conv_hyperparams {
          activation: RELU_6,
          regularizer {
            l2_regularizer {
              weight: 0.00004
          initializer {
            random_normal_initializer {
              stddev: 0.01
              mean: 0.0
          batch_norm {
            train: true,
            scale: true,
            center: true,
            decay: 0.97,
            epsilon: 0.001,
    feature_extractor {
      type: 'ssd_mobilenet_v2_keras'
      min_depth: 16
      depth_multiplier: 1.0
      conv_hyperparams {
        activation: RELU_6,
        regularizer {
          l2_regularizer {
            weight: 0.00004
        initializer {
          truncated_normal_initializer {
            stddev: 0.03
            mean: 0.0
        batch_norm {
          train: true,
          scale: true,
          center: true,
          decay: 0.97,
          epsilon: 0.001,
      override_base_feature_extractor_hyperparams: true
    loss {
      classification_loss {
        weighted_sigmoid_focal {
          alpha: 0.75,
          gamma: 2.0
      localization_loss {
        weighted_smooth_l1 {
          delta: 1.0
      classification_weight: 1.0
      localization_weight: 1.0
    normalize_loss_by_num_matches: true
    normalize_loc_loss_by_codesize: true
    post_processing {
      batch_non_max_suppression {
        score_threshold: 1e-8
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 100
      score_converter: SIGMOID

train_config: {
  fine_tune_checkpoint_version: V2
  fine_tune_checkpoint: "mobilenet_v2/mobilenet_v2.ckpt-1"
  fine_tune_checkpoint_type: "detection"
  batch_size: 96
  sync_replicas: true
  startup_delay_steps: 0
  replicas_to_aggregate: 8
  num_steps: 7500
  data_augmentation_options {
    random_horizontal_flip {
  data_augmentation_options {
    ssd_random_crop {
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        cosine_decay_learning_rate {
          learning_rate_base: .8
          total_steps: 50000
          warmup_learning_rate: 0.13333
          warmup_steps: 2000
      momentum_optimizer_value: 0.9
    use_moving_average: false
  max_number_of_boxes: 100
  unpad_groundtruth_tensors: false

train_input_reader: {
  label_map_path: "kaggle_dataset/annotations/label_map.pbtxt"
  tf_record_input_reader {
    input_path: "kaggle_dataset/annotations/train.record"

eval_config: {
  metrics_set: "coco_detection_metrics"
  use_moving_averages: false

eval_input_reader: {
  label_map_path: "kaggle_dataset/annotations/label_map.pbtxt"
  shuffle: false
  num_epochs: 1
  tf_record_input_reader {
    input_path: "kaggle_dataset/annotations/test.record"

Above is the pipeline.config file used to train the model.

Error: AssertionError: Some Python objects were not bound to checkpointed values, likely due to changes in the Python program

Using: TensorFlow 2.4.1, detection model ssd_mobilenet_v2_320x320_coco17_tpu-8.

Can anyone help me fix this? Thanks in advance.

khu834 commented 3 years ago

fine_tune_checkpoint: "mobilenet_v2/mobilenet_v2.ckpt-1"

I have tested ssd_mobilenet_v2 training and export on TF 2.4.0, and it worked fine. Can you try the following?

If the fine_tune_checkpoint and your training config are both in the 'detection' format, the export should work.

JuPasquin commented 3 years ago


I'm also having the same issue. I'm trying to train from scratch by commenting out the fine-tuning parameters, and the same error message occurs when running exporter_main_v2.py.

I'm using:

Modifications made to the original config file:

Also, if I remove status.assert_existing_objects_matched(), I'm able to save the model, but a warning shows up:

WARNING:tensorflow:Skipping full serialization of Keras layer <object_detection.meta_architectures.center_net_meta_arch.CenterNetMetaArch object at 0x7f11e6625f70>, because it is not built.
W0817 09:44:23.985459 139717209786176 save_impl.py:76] Skipping full serialization of Keras layer <object_detection.meta_architectures.center_net_meta_arch.CenterNetMetaArch object at 0x7f11e6625f70>, because it is not built.

If I try to reload the model, I'm not able to retrieve any information because it is loaded as a _UserObject. This is the error when calling model.summary():

AttributeError: '_UserObject' object has no attribute 'summary'
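On the AttributeError above: an object returned by tf.saved_model.load is not a Keras model, so it has no summary() method. A minimal sketch for inspecting it through its serving signatures instead is shown below. The helper name is hypothetical, and whether an export made with the assertion removed contains usable signatures is not confirmed in this thread.

```python
def describe_saved_model(saved_model_dir):
    """Return {signature_name: {"inputs": ..., "outputs": ...}} for a SavedModel."""
    # Imported lazily so this file can be imported without TensorFlow installed.
    import tensorflow as tf

    loaded = tf.saved_model.load(saved_model_dir)
    report = {}
    for name, fn in loaded.signatures.items():
        report[name] = {
            "inputs": fn.structured_input_signature,
            "outputs": fn.structured_outputs,
        }
    return report


if __name__ == "__main__":
    import os

    # "./model/saved_model" is an assumed location under the --output_directory
    # used earlier in this thread.
    path = "./model/saved_model"
    if os.path.isdir(path):
        for name, info in describe_saved_model(path).items():
            print(name, info)
```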

Thank you in advance.