IvanGarcia7 commented 2 years ago

Prerequisites

[X] I am using the latest TensorFlow Model Garden release and TensorFlow 2.
[X] I am reporting the issue to the correct repository. (Model Garden official or research directory)
[x] I checked to make sure that this issue has not already been filed.

1. The entire URL of the file you are using

https://github.com/tensorflow/models/blob/master/research/object_detection/configs/tf2/ssd_efficientdet_d4_1024x1024_coco17_tpu-32.config

2. Describe the bug

I am trying to fine-tune EfficientDet D4 from the TensorFlow 2 Detection Model Zoo (http://download.tensorflow.org/models/object_detection/tf2/20200711/efficientdet_d4_coco17_tpu-32.tar.gz) on my own dataset.

The configuration file I am using is the following:

model {
  ssd {
    inplace_batchnorm_update: true
    freeze_batchnorm: false
    num_classes: 8
    add_background_class: false
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
        use_matmul_gather: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    encode_background_as_zeros: true
    anchor_generator {
      multiscale_anchor_generator {
        min_level: 3
        max_level: 7
        anchor_scale: 4.0
        aspect_ratios: [1.0, 2.0, 0.5]
        scales_per_octave: 3
      }
    }
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 1024
        max_dimension: 1024
        pad_to_max_dimension: true
      }
    }
    box_predictor {
      weight_shared_convolutional_box_predictor {
        depth: 224
        class_prediction_bias_init: -4.6
        conv_hyperparams {
          force_use_bias: true
          activation: SWISH
          regularizer {
            l2_regularizer {
              weight: 0.00004
            }
          }
          initializer {
            random_normal_initializer {
              stddev: 0.01
              mean: 0.0
            }
          }
          batch_norm {
            scale: true
            decay: 0.99
            epsilon: 0.001
          }
        }
        num_layers_before_predictor: 4
        kernel_size: 3
        use_depthwise: true
      }
    }
    feature_extractor {
      type: 'ssd_efficientnet-b4_bifpn_keras'
      bifpn {
        min_level: 3
        max_level: 7
        num_iterations: 7
        num_filters: 224
      }
      conv_hyperparams {
        force_use_bias: true
        activation: SWISH
        regularizer {
          l2_regularizer {
            weight: 0.00004
          }
        }
        initializer {
          truncated_normal_initializer {
            stddev: 0.03
            mean: 0.0
          }
        }
        batch_norm {
          scale: true,
          decay: 0.99,
          epsilon: 0.001,
        }
      }
    }
    loss {
      classification_loss {
        weighted_sigmoid_focal {
          alpha: 0.25
          gamma: 1.5
        }
      }
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    normalize_loss_by_num_matches: true
    normalize_loc_loss_by_codesize: true
    post_processing {
      batch_non_max_suppression {
        score_threshold: 1e-8
        iou_threshold: 0.5
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SIGMOID
    }
  }
}

train_config: {
  fine_tune_checkpoint: "/home/models/efd4/checkpoint/ckpt-0"
  fine_tune_checkpoint_version: V2
  fine_tune_checkpoint_type: "detection"
  batch_size: 1
  sync_replicas: true
  startup_delay_steps: 0
  replicas_to_aggregate: 8
  use_bfloat16: true
  num_steps: 2000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    random_scale_crop_and_pad_to_square {
      output_size: 1024
      scale_min: 0.1
      scale_max: 2.0
    }
  }
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        cosine_decay_learning_rate {
          learning_rate_base: 0.002
          total_steps: 2000
          warmup_learning_rate: .0001
          warmup_steps: 500
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  max_number_of_boxes: 100
  unpad_groundtruth_tensors: false
}

train_input_reader: {
  label_map_path: "/home/labels/label_map.txt"
  tf_record_input_reader {
    input_path: "/home/records/train.tfrecord"
  }
}

eval_config: {
  metrics_set: "coco_detection_metrics"
  use_moving_averages: false
  batch_size: 1;
}

eval_input_reader: {
  label_map_path: "/home/labels/label_map.txt"
  shuffle: false
  num_epochs: 1
  tf_record_input_reader {
    input_path: "/home/records/validation.tfrecord"
  }
}
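For reference, here is a minimal Python sketch of the learning-rate curve that the `cosine_decay_learning_rate` block above should produce. It assumes a linear warmup from `warmup_learning_rate` to `learning_rate_base` over `warmup_steps`, followed by cosine decay to zero at `total_steps`. It is a standalone re-implementation written for illustration, not the TensorFlow Object Detection API code itself, and the function name is hypothetical.

```python
import math


def cosine_decay_with_warmup(step, learning_rate_base=0.002, total_steps=2000,
                             warmup_learning_rate=0.0001, warmup_steps=500):
    """Learning rate at `step`, assuming linear warmup then cosine decay.

    Defaults are the values from the optimizer block of the pipeline config.
    """
    if step > total_steps:
        return 0.0
    if step < warmup_steps:
        # Linear ramp from warmup_learning_rate up to learning_rate_base.
        slope = (learning_rate_base - warmup_learning_rate) / warmup_steps
        return warmup_learning_rate + slope * step
    # Cosine decay from learning_rate_base down to 0 at total_steps.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * learning_rate_base * (1.0 + math.cos(math.pi * progress))


for step in range(0, 2001, 250):
    print(f"step {step:4d}: lr = {cosine_decay_with_warmup(step):.6f}")
```

The printed values can be compared with the `learning_rate` field in the training log below, which reports 0.0002 for steps 100 to 900.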

When I use model_main_tf2.py to start training, no error appears. However, when I evaluate the model, it detects almost nothing:

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.001
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.002
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.021
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.001
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.012
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.003
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.012

I have tried modifying parameters such as the learning rate and the number of epochs, but it does not help.

3. Steps to reproduce

To fine-tune this model, I followed the steps in this guide (https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/training.html) and launched training with:

python /home/drive/MyDrive/VISDRONE/model_main_tf2.py \
    --pipeline_config_path={pipeline_file} \
    --model_dir={model_dir} \
    --alsologtostderr \
    --num_train_steps={num_steps} \
    --sample_1_of_n_eval_examples=1 \
    --num_eval_steps={num_eval_steps}
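The `{pipeline_file}`, `{model_dir}`, `{num_steps}` and `{num_eval_steps}` placeholders in the command above appear to be filled in from Python variables. Below is a minimal sketch of building the same argument list in Python, using hypothetical paths; the helper name `build_command` is illustrative only. It assumes, as in the official `model_main_tf2.py`, that passing `--checkpoint_dir` makes the script evaluate the checkpoints in that directory instead of training.

```python
import shlex


def build_command(pipeline_file, model_dir, num_steps, num_eval_steps,
                  script="model_main_tf2.py", checkpoint_dir=None):
    """Return the argv list for model_main_tf2.py (hypothetical helper)."""
    cmd = [
        "python", script,
        f"--pipeline_config_path={pipeline_file}",
        f"--model_dir={model_dir}",
        "--alsologtostderr",
        f"--num_train_steps={num_steps}",
        "--sample_1_of_n_eval_examples=1",
        f"--num_eval_steps={num_eval_steps}",
    ]
    if checkpoint_dir is not None:
        # With --checkpoint_dir set, the script evaluates checkpoints
        # found in that directory rather than training.
        cmd.append(f"--checkpoint_dir={checkpoint_dir}")
    return cmd


# Hypothetical paths and step counts, for illustration only.
train_cmd = build_command("pipeline.config", "training/", 2000, 100)
eval_cmd = build_command("pipeline.config", "training/", 2000, 100,
                         checkpoint_dir="training/")
# shlex.quote keeps this compatible with Python 3.7 (shlex.join needs 3.8+).
print(" ".join(shlex.quote(arg) for arg in train_cmd))
print(" ".join(shlex.quote(arg) for arg in eval_cmd))
# subprocess.run(train_cmd, check=True) would launch the training job.
```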

2022-03-24 14:56:39.530945: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:39] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
I0324 14:56:39.539781 140467518502784 mirrored_strategy.py:374] Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
INFO:tensorflow:Maybe overwriting train_steps: 2000
I0324 14:56:39.543960 140467518502784 config_util.py:552] Maybe overwriting train_steps: 2000
INFO:tensorflow:Maybe overwriting use_bfloat16: False
I0324 14:56:39.544119 140467518502784 config_util.py:552] Maybe overwriting use_bfloat16: False
I0324 14:56:39.553249 140467518502784 ssd_efficientnet_bifpn_feature_extractor.py:146] EfficientDet EfficientNet backbone version: efficientnet-b4
I0324 14:56:39.553378 140467518502784 ssd_efficientnet_bifpn_feature_extractor.py:147] EfficientDet BiFPN num filters: 224
I0324 14:56:39.553517 140467518502784 ssd_efficientnet_bifpn_feature_extractor.py:149] EfficientDet BiFPN num iterations: 7
I0324 14:56:39.558310 140467518502784 efficientnet_model.py:144] round_filter input=32 output=48
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I0324 14:56:39.580137 140467518502784 cross_device_ops.py:618] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I0324 14:56:39.582051 140467518502784 cross_device_ops.py:618] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I0324 14:56:39.584519 140467518502784 cross_device_ops.py:618] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I0324 14:56:39.585638 140467518502784 cross_device_ops.py:618] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I0324 14:56:39.592988 140467518502784 cross_device_ops.py:618] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I0324 14:56:39.597373 140467518502784 cross_device_ops.py:618] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I0324 14:56:39.603657 140467518502784 efficientnet_model.py:144] round_filter input=32 output=48
I0324 14:56:39.603788 140467518502784 efficientnet_model.py:144] round_filter input=16 output=24
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I0324 14:56:39.619617 140467518502784 cross_device_ops.py:618] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I0324 14:56:39.620819 140467518502784 cross_device_ops.py:618] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I0324 14:56:39.623020 140467518502784 cross_device_ops.py:618] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I0324 14:56:39.624058 140467518502784 cross_device_ops.py:618] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I0324 14:56:39.829434 140467518502784 efficientnet_model.py:144] round_filter input=16 output=24
I0324 14:56:39.829590 140467518502784 efficientnet_model.py:144] round_filter input=24 output=32
I0324 14:56:40.442389 140467518502784 efficientnet_model.py:144] round_filter input=24 output=32
I0324 14:56:40.442584 140467518502784 efficientnet_model.py:144] round_filter input=40 output=56
I0324 14:56:41.058132 140467518502784 efficientnet_model.py:144] round_filter input=40 output=56
I0324 14:56:41.058324 140467518502784 efficientnet_model.py:144] round_filter input=80 output=112
I0324 14:56:41.971299 140467518502784 efficientnet_model.py:144] round_filter input=80 output=112
I0324 14:56:41.971578 140467518502784 efficientnet_model.py:144] round_filter input=112 output=160
I0324 14:56:42.896141 140467518502784 efficientnet_model.py:144] round_filter input=112 output=160
I0324 14:56:42.896331 140467518502784 efficientnet_model.py:144] round_filter input=192 output=272
I0324 14:56:44.146403 140467518502784 efficientnet_model.py:144] round_filter input=192 output=272
I0324 14:56:44.146590 140467518502784 efficientnet_model.py:144] round_filter input=320 output=448
I0324 14:56:44.446191 140467518502784 efficientnet_model.py:144] round_filter input=1280 output=1792
I0324 14:56:44.504505 140467518502784 efficientnet_model.py:454] Building model efficientnet with params ModelConfig(width_coefficient=1.4, depth_coefficient=1.8, resolution=380, dropout_rate=0.4, blocks=(BlockConfig(input_filters=32, output_filters=16, kernel_size=3, num_repeat=1, expand_ratio=1, strides=(1, 1), se_ratio=0.25, id_skip=True, fused_conv=False, conv_type='depthwise'), BlockConfig(input_filters=16, output_filters=24, kernel_size=3, num_repeat=2, expand_ratio=6, strides=(2, 2), se_ratio=0.25, id_skip=True, fused_conv=False, conv_type='depthwise'), BlockConfig(input_filters=24, output_filters=40, kernel_size=5, num_repeat=2, expand_ratio=6, strides=(2, 2), se_ratio=0.25, id_skip=True, fused_conv=False, conv_type='depthwise'), BlockConfig(input_filters=40, output_filters=80, kernel_size=3, num_repeat=3, expand_ratio=6, strides=(2, 2), se_ratio=0.25, id_skip=True, fused_conv=False, conv_type='depthwise'), BlockConfig(input_filters=80, output_filters=112, kernel_size=5, num_repeat=3, expand_ratio=6, strides=(1, 1), se_ratio=0.25, id_skip=True, fused_conv=False, conv_type='depthwise'), BlockConfig(input_filters=112, output_filters=192, kernel_size=5, num_repeat=4, expand_ratio=6, strides=(2, 2), se_ratio=0.25, id_skip=True, fused_conv=False, conv_type='depthwise'), BlockConfig(input_filters=192, output_filters=320, kernel_size=3, num_repeat=1, expand_ratio=6, strides=(1, 1), se_ratio=0.25, id_skip=True, fused_conv=False, conv_type='depthwise')), stem_base_filters=32, top_base_filters=1280, activation='simple_swish', batch_norm='default', bn_momentum=0.99, bn_epsilon=0.001, weight_decay=5e-06, drop_connect_rate=0.2, depth_divisor=8, min_depth=None, use_se=True, input_channels=3, num_classes=1000, model_name='efficientnet', rescale_input=False, data_format='channels_last', dtype='float32')
WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/object_detection/model_lib_v2.py:564: StrategyBase.experimental_distribute_datasets_from_function (from tensorflow.python.distribute.distribute_lib) is deprecated and will be removed in a future version.
Instructions for updating:
rename to distribute_datasets_from_function
W0324 14:56:44.738715 140467518502784 deprecation.py:343] From /usr/local/lib/python3.7/dist-packages/object_detection/model_lib_v2.py:564: StrategyBase.experimental_distribute_datasets_from_function (from tensorflow.python.distribute.distribute_lib) is deprecated and will be removed in a future version.
Instructions for updating:
rename to distribute_datasets_from_function
INFO:tensorflow:Reading unweighted datasets: ['/content/drive/MyDrive/VISDRONE/train.record']
I0324 14:56:44.751177 140467518502784 dataset_builder.py:163] Reading unweighted datasets: ['/content/drive/MyDrive/VISDRONE/train.record']
INFO:tensorflow:Reading record datasets for input file: ['/content/drive/MyDrive/VISDRONE/train.record']
I0324 14:56:44.751728 140467518502784 dataset_builder.py:80] Reading record datasets for input file: ['/content/drive/MyDrive/VISDRONE/train.record']
INFO:tensorflow:Number of filenames to read: 1
I0324 14:56:44.751873 140467518502784 dataset_builder.py:81] Number of filenames to read: 1
WARNING:tensorflow:num_readers has been reduced to 1 to match input file shards.
W0324 14:56:44.752046 140467518502784 dataset_builder.py:88] num_readers has been reduced to 1 to match input file shards.
WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/object_detection/builders/dataset_builder.py:105: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.deterministic`.
W0324 14:56:44.754448 140467518502784 deprecation.py:343] From /usr/local/lib/python3.7/dist-packages/object_detection/builders/dataset_builder.py:105: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.deterministic`.
WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/object_detection/builders/dataset_builder.py:237: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map()
W0324 14:56:44.776529 140467518502784 deprecation.py:343] From /usr/local/lib/python3.7/dist-packages/object_detection/builders/dataset_builder.py:237: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map()
WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow/python/util/dispatch.py:1082: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
W0324 14:56:49.483746 140467518502784 deprecation.py:343] From /usr/local/lib/python3.7/dist-packages/tensorflow/python/util/dispatch.py:1082: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow/python/util/dispatch.py:1082: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
W0324 14:56:52.317593 140467518502784 deprecation.py:343] From /usr/local/lib/python3.7/dist-packages/tensorflow/python/util/dispatch.py:1082: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
/usr/local/lib/python3.7/dist-packages/keras/backend.py:450: UserWarning: `tf.keras.backend.set_learning_phase` is deprecated and will be removed after 2020-10-11. To update it, simply pass a True/False value to the `training` argument of the `__call__` method of your layer or model.
  warnings.warn('`tf.keras.backend.set_learning_phase` is deprecated and '
WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow/python/util/deprecation.py:616: calling map_fn_v2 (from tensorflow.python.ops.map_fn) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Use fn_output_signature instead
W0324 14:57:59.473496 140462682519296 deprecation.py:547] From /usr/local/lib/python3.7/dist-packages/tensorflow/python/util/deprecation.py:616: calling map_fn_v2 (from tensorflow.python.ops.map_fn) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Use fn_output_signature instead
WARNING:tensorflow:Gradients do not exist for variables ['stack_6/block_1/expand_bn/gamma:0', 'stack_6/block_1/expand_bn/beta:0', 'stack_6/block_1/depthwise_conv2d/depthwise_kernel:0', 'stack_6/block_1/depthwise_bn/gamma:0', 'stack_6/block_1/depthwise_bn/beta:0', 'stack_6/block_1/project_bn/gamma:0', 'stack_6/block_1/project_bn/beta:0', 'top_bn/gamma:0', 'top_bn/beta:0'] when minimizing the loss. If you're using `model.compile()`, did you forget to provide a `loss`argument?
W0324 14:58:17.434093 140462682519296 utils.py:80] Gradients do not exist for variables ['stack_6/block_1/expand_bn/gamma:0', 'stack_6/block_1/expand_bn/beta:0', 'stack_6/block_1/depthwise_conv2d/depthwise_kernel:0', 'stack_6/block_1/depthwise_bn/gamma:0', 'stack_6/block_1/depthwise_bn/beta:0', 'stack_6/block_1/project_bn/gamma:0', 'stack_6/block_1/project_bn/beta:0', 'top_bn/gamma:0', 'top_bn/beta:0'] when minimizing the loss. If you're using `model.compile()`, did you forget to provide a `loss`argument?
WARNING:tensorflow:Gradients do not exist for variables ['stack_6/block_1/expand_bn/gamma:0', 'stack_6/block_1/expand_bn/beta:0', 'stack_6/block_1/depthwise_conv2d/depthwise_kernel:0', 'stack_6/block_1/depthwise_bn/gamma:0', 'stack_6/block_1/depthwise_bn/beta:0', 'stack_6/block_1/project_bn/gamma:0', 'stack_6/block_1/project_bn/beta:0', 'top_bn/gamma:0', 'top_bn/beta:0'] when minimizing the loss. If you're using `model.compile()`, did you forget to provide a `loss`argument?
W0324 14:58:42.918556 140462682519296 utils.py:80] Gradients do not exist for variables ['stack_6/block_1/expand_bn/gamma:0', 'stack_6/block_1/expand_bn/beta:0', 'stack_6/block_1/depthwise_conv2d/depthwise_kernel:0', 'stack_6/block_1/depthwise_bn/gamma:0', 'stack_6/block_1/depthwise_bn/beta:0', 'stack_6/block_1/project_bn/gamma:0', 'stack_6/block_1/project_bn/beta:0', 'top_bn/gamma:0', 'top_bn/beta:0'] when minimizing the loss. If you're using `model.compile()`, did you forget to provide a `loss`argument?
WARNING:tensorflow:Gradients do not exist for variables ['stack_6/block_1/expand_bn/gamma:0', 'stack_6/block_1/expand_bn/beta:0', 'stack_6/block_1/depthwise_conv2d/depthwise_kernel:0', 'stack_6/block_1/depthwise_bn/gamma:0', 'stack_6/block_1/depthwise_bn/beta:0', 'stack_6/block_1/project_bn/gamma:0', 'stack_6/block_1/project_bn/beta:0', 'top_bn/gamma:0', 'top_bn/beta:0'] when minimizing the loss. If you're using `model.compile()`, did you forget to provide a `loss`argument?
W0324 14:59:06.517044 140462682519296 utils.py:80] Gradients do not exist for variables ['stack_6/block_1/expand_bn/gamma:0', 'stack_6/block_1/expand_bn/beta:0', 'stack_6/block_1/depthwise_conv2d/depthwise_kernel:0', 'stack_6/block_1/depthwise_bn/gamma:0', 'stack_6/block_1/depthwise_bn/beta:0', 'stack_6/block_1/project_bn/gamma:0', 'stack_6/block_1/project_bn/beta:0', 'top_bn/gamma:0', 'top_bn/beta:0'] when minimizing the loss. If you're using `model.compile()`, did you forget to provide a `loss`argument?
WARNING:tensorflow:Gradients do not exist for variables ['stack_6/block_1/expand_bn/gamma:0', 'stack_6/block_1/expand_bn/beta:0', 'stack_6/block_1/depthwise_conv2d/depthwise_kernel:0', 'stack_6/block_1/depthwise_bn/gamma:0', 'stack_6/block_1/depthwise_bn/beta:0', 'stack_6/block_1/project_bn/gamma:0', 'stack_6/block_1/project_bn/beta:0', 'top_bn/gamma:0', 'top_bn/beta:0'] when minimizing the loss. If you're using `model.compile()`, did you forget to provide a `loss`argument?
W0324 14:59:31.055212 140462682519296 utils.py:80] Gradients do not exist for variables ['stack_6/block_1/expand_bn/gamma:0', 'stack_6/block_1/expand_bn/beta:0', 'stack_6/block_1/depthwise_conv2d/depthwise_kernel:0', 'stack_6/block_1/depthwise_bn/gamma:0', 'stack_6/block_1/depthwise_bn/beta:0', 'stack_6/block_1/project_bn/gamma:0', 'stack_6/block_1/project_bn/beta:0', 'top_bn/gamma:0', 'top_bn/beta:0'] when minimizing the loss. If you're using `model.compile()`, did you forget to provide a `loss`argument?
INFO:tensorflow:Step 100 per-step time 4.057s
I0324 15:04:44.796877 140467518502784 model_lib_v2.py:707] Step 100 per-step time 4.057s
INFO:tensorflow:{'Loss/classification_loss': 1.0777053,
 'Loss/localization_loss': 0.71329135,
 'Loss/regularization_loss': 0.048915524,
 'Loss/total_loss': 1.8399122,
 'learning_rate': 0.0002}
I0324 15:04:44.797298 140467518502784 model_lib_v2.py:708] {'Loss/classification_loss': 1.0777053,
 'Loss/localization_loss': 0.71329135,
 'Loss/regularization_loss': 0.048915524,
 'Loss/total_loss': 1.8399122,
 'learning_rate': 0.0002}
INFO:tensorflow:Step 200 per-step time 2.467s
I0324 15:08:51.406273 140467518502784 model_lib_v2.py:707] Step 200 per-step time 2.467s
INFO:tensorflow:{'Loss/classification_loss': 1.1751853,
 'Loss/localization_loss': 0.7252056,
 'Loss/regularization_loss': 0.04891498,
 'Loss/total_loss': 1.949306,
 'learning_rate': 0.0002}
I0324 15:08:51.406655 140467518502784 model_lib_v2.py:708] {'Loss/classification_loss': 1.1751853,
 'Loss/localization_loss': 0.7252056,
 'Loss/regularization_loss': 0.04891498,
 'Loss/total_loss': 1.949306,
 'learning_rate': 0.0002}
INFO:tensorflow:Step 300 per-step time 2.476s
I0324 15:12:58.999202 140467518502784 model_lib_v2.py:707] Step 300 per-step time 2.476s
INFO:tensorflow:{'Loss/classification_loss': 1.1473204,
 'Loss/localization_loss': 0.696468,
 'Loss/regularization_loss': 0.048914447,
 'Loss/total_loss': 1.8927028,
 'learning_rate': 0.0002}
I0324 15:12:58.999642 140467518502784 model_lib_v2.py:708] {'Loss/classification_loss': 1.1473204,
 'Loss/localization_loss': 0.696468,
 'Loss/regularization_loss': 0.048914447,
 'Loss/total_loss': 1.8927028,
 'learning_rate': 0.0002}
INFO:tensorflow:Step 400 per-step time 2.472s
I0324 15:17:06.168886 140467518502784 model_lib_v2.py:707] Step 400 per-step time 2.472s
INFO:tensorflow:{'Loss/classification_loss': 1.3694557,
 'Loss/localization_loss': 0.66213036,
 'Loss/regularization_loss': 0.048913937,
 'Loss/total_loss': 2.0805001,
 'learning_rate': 0.0002}
I0324 15:17:06.169253 140467518502784 model_lib_v2.py:708] {'Loss/classification_loss': 1.3694557,
 'Loss/localization_loss': 0.66213036,
 'Loss/regularization_loss': 0.048913937,
 'Loss/total_loss': 2.0805001,
 'learning_rate': 0.0002}
INFO:tensorflow:Step 500 per-step time 2.470s
I0324 15:21:13.213133 140467518502784 model_lib_v2.py:707] Step 500 per-step time 2.470s
INFO:tensorflow:{'Loss/classification_loss': 1.1455597,
 'Loss/localization_loss': 0.6874537,
 'Loss/regularization_loss': 0.048913423,
 'Loss/total_loss': 1.8819268,
 'learning_rate': 0.0002}
I0324 15:21:13.213557 140467518502784 model_lib_v2.py:708] {'Loss/classification_loss': 1.1455597,
 'Loss/localization_loss': 0.6874537,
 'Loss/regularization_loss': 0.048913423,
 'Loss/total_loss': 1.8819268,
 'learning_rate': 0.0002}
INFO:tensorflow:Step 600 per-step time 2.472s
I0324 15:25:20.460671 140467518502784 model_lib_v2.py:707] Step 600 per-step time 2.472s
INFO:tensorflow:{'Loss/classification_loss': 1.1520298,
 'Loss/localization_loss': 0.7305322,
 'Loss/regularization_loss': 0.04891293,
 'Loss/total_loss': 1.9314749,
 'learning_rate': 0.0002}
I0324 15:25:20.461070 140467518502784 model_lib_v2.py:708] {'Loss/classification_loss': 1.1520298,
 'Loss/localization_loss': 0.7305322,
 'Loss/regularization_loss': 0.04891293,
 'Loss/total_loss': 1.9314749,
 'learning_rate': 0.0002}
INFO:tensorflow:Step 700 per-step time 2.472s
I0324 15:29:27.649186 140467518502784 model_lib_v2.py:707] Step 700 per-step time 2.472s
INFO:tensorflow:{'Loss/classification_loss': 1.1544534,
 'Loss/localization_loss': 0.716094,
 'Loss/regularization_loss': 0.048912443,
 'Loss/total_loss': 1.9194598,
 'learning_rate': 0.0002}
I0324 15:29:27.649561 140467518502784 model_lib_v2.py:708] {'Loss/classification_loss': 1.1544534,
 'Loss/localization_loss': 0.716094,
 'Loss/regularization_loss': 0.048912443,
 'Loss/total_loss': 1.9194598,
 'learning_rate': 0.0002}
INFO:tensorflow:Step 800 per-step time 2.460s
I0324 15:33:33.606482 140467518502784 model_lib_v2.py:707] Step 800 per-step time 2.460s
INFO:tensorflow:{'Loss/classification_loss': 1.144963,
 'Loss/localization_loss': 0.68470913,
 'Loss/regularization_loss': 0.048911978,
 'Loss/total_loss': 1.8785841,
 'learning_rate': 0.0002}
I0324 15:33:33.606925 140467518502784 model_lib_v2.py:708] {'Loss/classification_loss': 1.144963,
 'Loss/localization_loss': 0.68470913,
 'Loss/regularization_loss': 0.048911978,
 'Loss/total_loss': 1.8785841,
 'learning_rate': 0.0002}
INFO:tensorflow:Step 900 per-step time 2.463s
I0324 15:37:39.929426 140467518502784 model_lib_v2.py:707] Step 900 per-step time 2.463s
INFO:tensorflow:{'Loss/classification_loss': 1.0874641,
 'Loss/localization_loss': 0.70858914,
 'Loss/regularization_loss': 0.048911538,
 'Loss/total_loss': 1.8449647,
 'learning_rate': 0.0002}
I0324 15:37:39.929794 140467518502784 model_lib_v2.py:708] {'Loss/classification_loss': 1.0874641,
 'Loss/localization_loss': 0.70858914,
 'Loss/regularization_loss': 0.048911538,
 'Loss/total_loss': 1.8449647,
 'learning_rate': 0.0002}
INFO:tensorflow:Step 1000 per-step time 2.466s
I0324 15:41:46.560392 140467518502784 model_lib_v2.py:707] Step 1000 per-step time 2.466s
INFO:tensorflow:{'Loss/classification_loss': 1.1569183,
 'Loss/localization_loss': 0.7344822,
 'Loss/regularization_loss': 0.048911124,
 'Loss/total_loss': 1.9403117,
 'learning_rate': 2e-05}
I0324 15:41:46.560784 140467518502784 model_lib_v2.py:708] {'Loss/classification_loss': 1.1569183,
 'Loss/localization_loss': 0.7344822,
 'Loss/regularization_loss': 0.048911124,
 'Loss/total_loss': 1.9403117,
 'learning_rate': 2e-05}
INFO:tensorflow:Step 1100 per-step time 2.506s
I0324 15:45:57.157139 140467518502784 model_lib_v2.py:707] Step 1100 per-step time 2.506s
INFO:tensorflow:{'Loss/classification_loss': 1.170136,
 'Loss/localization_loss': 0.73082036,
 'Loss/regularization_loss': 0.04891106,
 'Loss/total_loss': 1.9498675,
 'learning_rate': 2e-05}
I0324 15:45:57.157560 140467518502784 model_lib_v2.py:708] {'Loss/classification_loss': 1.170136,
 'Loss/localization_loss': 0.73082036,
 'Loss/regularization_loss': 0.04891106,
 'Loss/total_loss': 1.9498675,
 'learning_rate': 2e-05}
INFO:tensorflow:Step 1200 per-step time 2.481s
I0324 15:50:05.257278 140467518502784 model_lib_v2.py:707] Step 1200 per-step time 2.481s
INFO:tensorflow:{'Loss/classification_loss': 1.1441879,
 'Loss/localization_loss': 0.6731573,
 'Loss/regularization_loss': 0.04891104,
 'Loss/total_loss': 1.8662562,
 'learning_rate': 2e-05}
I0324 15:50:05.257679 140467518502784 model_lib_v2.py:708] {'Loss/classification_loss': 1.1441879,
 'Loss/localization_loss': 0.6731573,
 'Loss/regularization_loss': 0.04891104,
 'Loss/total_loss': 1.8662562,
 'learning_rate': 2e-05}
INFO:tensorflow:Step 1300 per-step time 2.476s
I0324 15:54:12.866213 140467518502784 model_lib_v2.py:707] Step 1300 per-step time 2.476s
INFO:tensorflow:{'Loss/classification_loss': 1.0104654,
 'Loss/localization_loss': 0.72844565,
 'Loss/regularization_loss': 0.048911005,
 'Loss/total_loss': 1.787822,
 'learning_rate': 2e-05}
I0324 15:54:12.866582 140467518502784 model_lib_v2.py:708] {'Loss/classification_loss': 1.0104654,
 'Loss/localization_loss': 0.72844565,
 'Loss/regularization_loss': 0.048911005,
 'Loss/total_loss': 1.787822,
 'learning_rate': 2e-05}
INFO:tensorflow:Step 1400 per-step time 2.480s
I0324 15:58:20.906429 140467518502784 model_lib_v2.py:707] Step 1400 per-step time 2.480s
INFO:tensorflow:{'Loss/classification_loss': 1.1930686,
 'Loss/localization_loss': 0.6976074,
 'Loss/regularization_loss': 0.048910983,
 'Loss/total_loss': 1.939587,
 'learning_rate': 2e-05}
I0324 15:58:20.906798 140467518502784 model_lib_v2.py:708] {'Loss/classification_loss': 1.1930686,
 'Loss/localization_loss': 0.6976074,
 'Loss/regularization_loss': 0.048910983,
 'Loss/total_loss': 1.939587,
 'learning_rate': 2e-05}
INFO:tensorflow:Step 1500 per-step time 2.472s
I0324 16:02:28.107308 140467518502784 model_lib_v2.py:707] Step 1500 per-step time 2.472s
INFO:tensorflow:{'Loss/classification_loss': 1.1081508,
 'Loss/localization_loss': 0.663561,
 'Loss/regularization_loss': 0.048910964,
 'Loss/total_loss': 1.8206228,
 'learning_rate': 2e-06}
I0324 16:02:28.107687 140467518502784 model_lib_v2.py:708] {'Loss/classification_loss': 1.1081508,
 'Loss/localization_loss': 0.663561,
 'Loss/regularization_loss': 0.048910964,
 'Loss/total_loss': 1.8206228,
 'learning_rate': 2e-06}
INFO:tensorflow:Step 1600 per-step time 2.473s
I0324 16:06:35.467271 140467518502784 model_lib_v2.py:707] Step 1600 per-step time 2.473s
INFO:tensorflow:{'Loss/classification_loss': 1.2835001,
 'Loss/localization_loss': 0.915443,
 'Loss/regularization_loss': 0.048910964,
 'Loss/total_loss': 2.247854,
 'learning_rate': 2e-06}
I0324 16:06:35.467811 140467518502784 model_lib_v2.py:708] {'Loss/classification_loss': 1.2835001,
 'Loss/localization_loss': 0.915443,
 'Loss/regularization_loss': 0.048910964,
 'Loss/total_loss': 2.247854,
 'learning_rate': 2e-06}
INFO:tensorflow:Step 1700 per-step time 2.477s
I0324 16:10:43.180331 140467518502784 model_lib_v2.py:707] Step 1700 per-step time 2.477s
INFO:tensorflow:{'Loss/classification_loss': 1.146557,
 'Loss/localization_loss': 0.66494846,
 'Loss/regularization_loss': 0.048910964,
 'Loss/total_loss': 1.8604164,
 'learning_rate': 2e-06}
I0324 16:10:43.180699 140467518502784 model_lib_v2.py:708] {'Loss/classification_loss': 1.146557,
 'Loss/localization_loss': 0.66494846,
 'Loss/regularization_loss': 0.048910964,
 'Loss/total_loss': 1.8604164,
 'learning_rate': 2e-06}
INFO:tensorflow:Step 1800 per-step time 2.473s
I0324 16:14:50.469713 140467518502784 model_lib_v2.py:707] Step 1800 per-step time 2.473s
INFO:tensorflow:{'Loss/classification_loss': 1.1496946,
 'Loss/localization_loss': 0.6987976,
 'Loss/regularization_loss': 0.04891097,
 'Loss/total_loss': 1.8974031,
 'learning_rate': 2e-06}
I0324 16:14:50.470112 140467518502784 model_lib_v2.py:708] {'Loss/classification_loss': 1.1496946,
 'Loss/localization_loss': 0.6987976,
 'Loss/regularization_loss': 0.04891097,
 'Loss/total_loss': 1.8974031,
 'learning_rate': 2e-06}
INFO:tensorflow:Step 1900 per-step time 2.469s
I0324 16:18:57.363423 140467518502784 model_lib_v2.py:707] Step 1900 per-step time 2.469s
INFO:tensorflow:{'Loss/classification_loss': 1.2016695,
 'Loss/localization_loss': 0.74829096,
 'Loss/regularization_loss': 0.04891097,
 'Loss/total_loss': 1.9988713,
 'learning_rate': 2e-06}
I0324 16:18:57.363805 140467518502784 model_lib_v2.py:708] {'Loss/classification_loss': 1.2016695,
 'Loss/localization_loss': 0.74829096,
 'Loss/regularization_loss': 0.04891097,
 'Loss/total_loss': 1.9988713,
 'learning_rate': 2e-06}
INFO:tensorflow:Step 2000 per-step time 2.463s
I0324 16:23:03.656679 140467518502784 model_lib_v2.py:707] Step 2000 per-step time 2.463s
INFO:tensorflow:{'Loss/classification_loss': 1.1624724,
 'Loss/localization_loss': 0.64188105,
 'Loss/regularization_loss': 0.04891097,
 'Loss/total_loss': 1.8532643,
 'learning_rate': 2e-06}
I0324 16:23:03.657085 140467518502784 model_lib_v2.py:708] {'Loss/classification_loss': 1.1624724,
 'Loss/localization_loss': 0.64188105,
 'Loss/regularization_loss': 0.04891097,
 'Loss/total_loss': 1.8532643,
 'learning_rate': 2e-06}

import re
import numpy as np

output_directory = '/home/fine_tuned_model'

# Directory containing the checkpoint you would like to export
last_model_path = '/home/training/'
print(last_model_path)
!python /content/models/research/object_detection/exporter_main_v2.py \
    --trained_checkpoint_dir {last_model_path} \
    --output_directory {output_directory} \
    --pipeline_config_path {pipeline_file}

2022-03-24 16:24:11.513483: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:39] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
I0324 16:24:11.523551 140297703389056 ssd_efficientnet_bifpn_feature_extractor.py:146] EfficientDet EfficientNet backbone version: efficientnet-b4
I0324 16:24:11.523777 140297703389056 ssd_efficientnet_bifpn_feature_extractor.py:147] EfficientDet BiFPN num filters: 224
I0324 16:24:11.523884 140297703389056 ssd_efficientnet_bifpn_feature_extractor.py:149] EfficientDet BiFPN num iterations: 7
I0324 16:24:11.528325 140297703389056 efficientnet_model.py:144] round_filter input=32 output=48
I0324 16:24:11.561406 140297703389056 efficientnet_model.py:144] round_filter input=32 output=48
I0324 16:24:11.561547 140297703389056 efficientnet_model.py:144] round_filter input=16 output=24
I0324 16:24:11.716008 140297703389056 efficientnet_model.py:144] round_filter input=16 output=24
I0324 16:24:11.716274 140297703389056 efficientnet_model.py:144] round_filter input=24 output=32
I0324 16:24:12.094801 140297703389056 efficientnet_model.py:144] round_filter input=24 output=32
I0324 16:24:12.095036 140297703389056 efficientnet_model.py:144] round_filter input=40 output=56
I0324 16:24:12.588015 140297703389056 efficientnet_model.py:144] round_filter input=40 output=56
I0324 16:24:12.588222 140297703389056 efficientnet_model.py:144] round_filter input=80 output=112
I0324 16:24:13.161465 140297703389056 efficientnet_model.py:144] round_filter input=80 output=112
I0324 16:24:13.161654 140297703389056 efficientnet_model.py:144] round_filter input=112 output=160
I0324 16:24:13.751483 140297703389056 efficientnet_model.py:144] round_filter input=112 output=160
I0324 16:24:13.751679 140297703389056 efficientnet_model.py:144] round_filter input=192 output=272
I0324 16:24:14.531807 140297703389056 efficientnet_model.py:144] round_filter input=192 output=272
I0324 16:24:14.532021 140297703389056 efficientnet_model.py:144] round_filter input=320 output=448
I0324 16:24:14.732889 140297703389056 efficientnet_model.py:144] round_filter input=1280 output=1792
I0324 16:24:14.775294 140297703389056 efficientnet_model.py:454] Building model efficientnet with params ModelConfig(width_coefficient=1.4, depth_coefficient=1.8, resolution=380, dropout_rate=0.4, blocks=(BlockConfig(input_filters=32, output_filters=16, kernel_size=3, num_repeat=1, expand_ratio=1, strides=(1, 1), se_ratio=0.25, id_skip=True, fused_conv=False, conv_type='depthwise'), BlockConfig(input_filters=16, output_filters=24, kernel_size=3, num_repeat=2, expand_ratio=6, strides=(2, 2), se_ratio=0.25, id_skip=True, fused_conv=False, conv_type='depthwise'), BlockConfig(input_filters=24, output_filters=40, kernel_size=5, num_repeat=2, expand_ratio=6, strides=(2, 2), se_ratio=0.25, id_skip=True, fused_conv=False, conv_type='depthwise'), BlockConfig(input_filters=40, output_filters=80, kernel_size=3, num_repeat=3, expand_ratio=6, strides=(2, 2), se_ratio=0.25, id_skip=True, fused_conv=False, conv_type='depthwise'), BlockConfig(input_filters=80, output_filters=112, kernel_size=5, num_repeat=3, expand_ratio=6, strides=(1, 1), se_ratio=0.25, id_skip=True, fused_conv=False, conv_type='depthwise'), BlockConfig(input_filters=112, output_filters=192, kernel_size=5, num_repeat=4, expand_ratio=6, strides=(2, 2), se_ratio=0.25, id_skip=True, fused_conv=False, conv_type='depthwise'), BlockConfig(input_filters=192, output_filters=320, kernel_size=3, num_repeat=1, expand_ratio=6, strides=(1, 1), se_ratio=0.25, id_skip=True, fused_conv=False, conv_type='depthwise')), stem_base_filters=32, top_base_filters=1280, activation='simple_swish', batch_norm='default', bn_momentum=0.99, bn_epsilon=0.001, weight_decay=5e-06, drop_connect_rate=0.2, depth_divisor=8, min_depth=None, use_se=True, input_channels=3, num_classes=1000, model_name='efficientnet', rescale_input=False, data_format='channels_last', dtype='float32')
WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow/python/autograph/impl/api.py:458: calling map_fn_v2 (from tensorflow.python.ops.map_fn) with back_prop=False is deprecated and will be removed in a future version.
Instructions for updating:
back_prop=False is deprecated. Consider using tf.stop_gradient instead.
Instead of:
results = tf.map_fn(fn, elems, back_prop=False)
Use:
results = tf.nest.map_structure(tf.stop_gradient, tf.map_fn(fn, elems))
W0324 16:24:23.994653 140297703389056 deprecation.py:615] From /usr/local/lib/python3.7/dist-packages/tensorflow/python/autograph/impl/api.py:458: calling map_fn_v2 (from tensorflow.python.ops.map_fn) with back_prop=False is deprecated and will be removed in a future version.
Instructions for updating:
back_prop=False is deprecated. Consider using tf.stop_gradient instead.
Instead of:
results = tf.map_fn(fn, elems, back_prop=False)
Use:
results = tf.nest.map_structure(tf.stop_gradient, tf.map_fn(fn, elems))
2022-03-24 16:24:54.173274: W tensorflow/python/util/util.cc:368] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
WARNING:tensorflow:Skipping full serialization of Keras layer <object_detection.meta_architectures.ssd_meta_arch.SSDMetaArch object at 0x7f9891552510>, because it is not built.
W0324 16:25:03.157266 140297703389056 save_impl.py:72] Skipping full serialization of Keras layer <object_detection.meta_architectures.ssd_meta_arch.SSDMetaArch object at 0x7f9891552510>, because it is not built.
W0324 16:26:53.641512 140297703389056 save.py:265] Found untraced functions such as WeightSharedConvolutionalBoxPredictor_layer_call_fn, WeightSharedConvolutionalBoxPredictor_layer_call_and_return_conditional_losses, WeightSharedConvolutionalBoxHead_layer_call_fn, WeightSharedConvolutionalBoxHead_layer_call_and_return_conditional_losses, WeightSharedConvolutionalClassHead_layer_call_fn while saving (showing 5 of 582). These functions will not be directly callable after loading.
INFO:tensorflow:Assets written to: /home/fine_tuned_model/saved_model/assets
I0324 16:28:18.870657 140297703389056 builder_impl.py:780] Assets written to: /home/fine_tuned_model/saved_model/assets
INFO:tensorflow:Writing pipeline config file to /home/fine_tuned_model/pipeline.config
I0324 16:28:22.083487 140297703389056 config_util.py:254] Writing pipeline config file to /home/fine_tuned_model/pipeline.config

TensorBoard: (screenshot not available)

When I run inference on a single image, this is the detections['detection_boxes'] output:

tf.Tensor( [[[0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.] [0. 0. 0. 0.]]], shape=(1, 100, 4), dtype=float32)
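A quick way to confirm that the exported model produces no usable detections at all (rather than merely low-scoring ones) is to count boxes that have a non-zero area and a score above a threshold. The sketch below works on plain Python lists, such as the result of `.numpy().tolist()` on the `detection_boxes` and `detection_scores` tensors with the batch dimension removed; the helper name `count_valid_detections` and the 0.3 threshold are assumptions for illustration.

```python
def count_valid_detections(boxes, scores, score_threshold=0.3):
    """Count detections with a positive-area box and a score above the threshold.

    boxes: list of normalized [ymin, xmin, ymax, xmax] boxes (one image).
    scores: list of floats, one per box.
    """
    valid = 0
    for (ymin, xmin, ymax, xmax), score in zip(boxes, scores):
        has_area = (ymax - ymin) > 0 and (xmax - xmin) > 0
        if has_area and score >= score_threshold:
            valid += 1
    return valid


# 100 all-zero boxes, like the output above (scores assumed to be 0 here).
zero_boxes = [[0.0, 0.0, 0.0, 0.0]] * 100
zero_scores = [0.0] * 100
print(count_valid_detections(zero_boxes, zero_scores))  # 0
```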

4. Expected behavior

Since I am re-training on classes that already exist in the pre-trained model, and both the learning rate and the number of steps are low, the mAP should not drop to 0. The same error occurs with other models of the EfficientDet family, and I get similar results with a different dataset.

Following the same process with other models, such as CenterNet, the problem does not appear.

5. System information

OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 21.04
TensorFlow version (use command below): v2.8.0-rc1-32-g3f878cff5b6 2.8.0
Python version: 3.10.2
CUDA/cuDNN version: 11.6/8.3
GPU model and memory: NVIDIA RTX 3080 Ti 12GB

google-ml-butler[bot] commented 2 years ago

Are you satisfied with the resolution of your issue?

birdman9391 commented 2 years ago

@IvanGarcia7 Hello. Did you solve this problem? I have encountered a similar issue in a video classification task.

IvanGarcia7 commented 2 years ago

Good morning @birdman9391

Yes, I was able to fix the issue, although I was applying the process described here to object detection. In my case, the problem was caused by the label map file required for fine-tuning, named label_map.txt. Make sure that the ids of the classes you want to detect when re-training start from 1 (in the Object Detection API, id 0 is reserved for the background class). For example, in your case, if you wanted to detect person and car, the label map should look like this:

item {
  id: 1
  name: 'person'
}
item {
  id: 2
  name: 'car'
}
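To catch this mistake before training starts, the ids can be checked programmatically. The sketch below uses a hypothetical helper, `check_label_map_ids` (not part of the Object Detection API), that extracts the `id` fields from the label map text with a regular expression and verifies that they are unique and start at 1. It assumes the simple `item { id: N name: '...' }` layout and is not a full protobuf text-format parser; the API's own `object_detection.utils.label_map_util` remains the authoritative loader.

```python
import re


def check_label_map_ids(label_map_text):
    """Return the sorted class ids, raising ValueError if they look wrong.

    Assumes every entry contains an 'id: <int>' field; regex-based sketch,
    not a full protobuf text-format parser.
    """
    ids = [int(m) for m in re.findall(r"\bid\s*:\s*(\d+)", label_map_text)]
    if not ids:
        raise ValueError("no ids found in label map")
    if len(ids) != len(set(ids)):
        raise ValueError(f"duplicate ids: {ids}")
    if min(ids) != 1:
        # id 0 is reserved for the background class.
        raise ValueError(f"ids should start at 1, got {sorted(ids)}")
    return sorted(ids)


good = """
item { id: 1 name: 'person' }
item { id: 2 name: 'car' }
"""
print(check_label_map_ids(good))  # [1, 2]
```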

That, at least, solved my problem: after re-training, the model detected the objects correctly.

Best regards. Iván García

birdman9391 commented 2 years ago

Thanks @IvanGarcia7

Sadly, my issue seems to come from something else T.T My model reaches about 30% accuracy in the validation stage, but when I evaluate it with the same script after restoring the checkpoint, it gives a broken result of around 0% accuracy.

For example:

Prediction Label (by argmax): [10, 35, 69, 3]
GT Label : [93, 53, 44, 153]

Since I'm using exactly the same TFRecord for validation, I think I've missed something else... But thanks for your kind reply :D Have a nice day!

IvanGarcia7 commented 2 years ago

Hello again @birdman9391

Since your model detects objects, I would say the problem lies in how the JSON used to compute the mAP is defined.

I assume you are using the COCO evaluator, so be careful with how each bbox is defined. COCO expects [x, y, width, height], that is:

[xmin, ymin, xmax-xmin, ymax-ymin]
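For instance, the Object Detection API returns `detection_boxes` as normalized `[ymin, xmin, ymax, xmax]`, while a COCO results file expects `[x, y, width, height]` in absolute pixels. A minimal conversion sketch (the function name `tf_box_to_coco` is hypothetical):

```python
def tf_box_to_coco(box, image_width, image_height):
    """Convert a normalized [ymin, xmin, ymax, xmax] box to COCO [x, y, w, h] in pixels."""
    ymin, xmin, ymax, xmax = box
    x = xmin * image_width
    y = ymin * image_height
    w = (xmax - xmin) * image_width
    h = (ymax - ymin) * image_height
    return [x, y, w, h]


print(tf_box_to_coco([0.25, 0.5, 0.75, 1.0], image_width=640, image_height=480))
# [320.0, 120.0, 320.0, 240.0]
```

Swapping the x and y order, or passing `xmax`/`ymax` where width and height are expected, is enough to make every prediction miss its ground truth and push the mAP towards 0.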

You can see an example of how I create these annotations in my repo:

https://github.com/IvanGarcia7/ALAF/blob/main/ALAF/DEMO.ipynb

Another possible problem is the image ids. Tools like CVAT may assign ids non-sequentially (I have seen this in some of my projects), so you will need a dictionary that maps each id to its image. See the section "LOAD THE JSON WITH THE GT ANNOTATIONS" in the repo linked above.
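A minimal sketch of such a mapping, assuming the ground-truth file follows the standard COCO layout with an `images` list whose entries carry `id` and `file_name` keys. The helper name is hypothetical, and this is not the code from the linked notebook.

```python
import json


def build_filename_to_id(coco_gt):
    """Map each image file name to its COCO image id.

    coco_gt: dict loaded from a COCO-format ground-truth JSON file.
    Ids need not be sequential; predictions must reuse these exact ids.
    """
    return {img["file_name"]: img["id"] for img in coco_gt["images"]}


gt = json.loads(
    '{"images": [{"id": 17, "file_name": "a.jpg"}, {"id": 3, "file_name": "b.jpg"}]}'
)
mapping = build_filename_to_id(gt)
print(mapping["a.jpg"])  # 17

# Each prediction entry then uses the mapped id, e.g.
# {"image_id": mapping["a.jpg"], "category_id": 1, "bbox": [x, y, w, h], "score": 0.9}
```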

I hope this solves your problem. Have a nice day too!

birdman9391 commented 2 years ago

Hello @IvanGarcia7

My task is video classification, so I don't have a label map. I was worried that there might be an issue with saving the checkpoint file and hoped you had already found the cause, but our problems seem to come from different places.

I'm now debugging the internal code in orbit.controller._maybe_save_checkpoint(), and I found that the model weights before saving the checkpoint and the model weights after restoring it have different values.
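One way to narrow this down is to snapshot the variables before saving and after restoring, then list exactly which ones differ. The sketch below is framework-agnostic: it assumes each snapshot is a dict from variable name to a flat list of floats (for example built with `{v.name: v.numpy().ravel().tolist() for v in model.variables}`). The helper name `diff_weights` and the tolerances are assumptions.

```python
import math


def diff_weights(before, after, rel_tol=1e-6, abs_tol=1e-8):
    """Return names of variables that are missing from one snapshot or whose values changed.

    before, after: dicts mapping variable name -> flat list of floats.
    """
    changed = []
    for name in sorted(set(before) | set(after)):
        a, b = before.get(name), after.get(name)
        if a is None or b is None or len(a) != len(b):
            changed.append(name)
            continue
        if not all(math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol) for x, y in zip(a, b)):
            changed.append(name)
    return changed


before = {"dense/kernel:0": [0.1, 0.2], "dense/bias:0": [0.0]}
after = {"dense/kernel:0": [0.1, 0.2], "dense/bias:0": [0.5]}
print(diff_weights(before, after))  # ['dense/bias:0']
```

If only a few variables show up (for example batch-norm statistics or optimizer slots), that points to which objects are missing from the checkpoint, rather than to a general save/restore failure.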

I don't know why yet, but I hope that addressing this will fix the issue.

Thanks for your kind reply again :D

Im-JimmyHu commented 2 years ago

I have a similar error. Have you solved it? Maybe my setup is not perfect, but I think the model converges, and I once got good inference results with YOLOv5, so the dataset ought not to contain bad data. If it gets solved, please @ me, thanks.

asparmar14 commented 1 year ago

How can I get the model's accuracy and recall? I can't even get the TensorBoard plots.

EfficientDet models - No detections after training on custom dataset #10555
