No GPU usage - centernet_hg104 - TF object detection API

[ v ] I am using the latest TensorFlow Model Garden release and TensorFlow 2.
[ v ] I am reporting the issue to the correct repository. (Model Garden official or research directory)
[ v ] I checked to make sure that this issue has not already been filed.

1. The entire URL of the file you are using

http://download.tensorflow.org/models/object_detection/tf2/20200713/centernet_hg104_512x512_coco17_tpu-8.tar.gz http://download.tensorflow.org/models/object_detection/tf2/20200711/efficientdet_d2_coco17_tpu-32.tar.gz http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8.tar.gz

2. Describe the bug

When training centernet model with TF object detection API, CPU and Ram usage is very high while GPU usage is basically 0%. However, this doesn't happen when training efficientdet_d2 and ssd_resnet50 on the same dataset, where CPU, RAM and GPU are all used: see screenshots below. (Note that the models are being trained on the same image dataset)

centernet efficient_batch2 ssd

3. Steps to reproduce

Train the centernet model from TF OD API with the following pipeline.config file:

model {
  center_net {
    num_classes: 1
    feature_extractor {
      type: "hourglass_104"
      channel_means: 104.01361846923828
      channel_means: 114.03422546386719
      channel_means: 119.91659545898438
      channel_stds: 73.60276794433594
      channel_stds: 69.89082336425781
      channel_stds: 70.91507720947266
      bgr_ordering: true
    }
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 512
        max_dimension: 512
        pad_to_max_dimension: true
      }
    }
    object_detection_task {
      task_loss_weight: 1.0
      offset_loss_weight: 1.0
      scale_loss_weight: 0.10000000149011612
      localization_loss {
        l1_localization_loss {
        }
      }
    }
    object_center_params {
      object_center_loss_weight: 1.0
      classification_loss {
        penalty_reduced_logistic_focal_loss {
          alpha: 2.0
          beta: 4.0
        }
      }
      min_box_overlap_iou: 0.6
      max_box_predictions: 50
    }
  }
}
train_config {
  batch_size: 2
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    random_crop_image {
      min_aspect_ratio: 0.5
      max_aspect_ratio: 1.7000000476837158
      random_coef: 0.25
    }
  }
  data_augmentation_options {
    random_adjust_hue {
    }
  }
  data_augmentation_options {
    random_adjust_contrast {
    }
  }
  data_augmentation_options {
    random_adjust_saturation {
    }
  }
  data_augmentation_options {
    random_adjust_brightness {
    }
  }
  data_augmentation_options {
    random_absolute_pad_image {
      max_height_padding: 200
      max_width_padding: 200
      pad_color: 0.0
      pad_color: 0.0
      pad_color: 0.0
    }
  }
  optimizer {
    adam_optimizer {
      learning_rate {
        manual_step_learning_rate {
          initial_learning_rate: 0.0010000000474974513
          schedule {
            step: 1000
            learning_rate: 9.999999747378752e-05
          }
          schedule {
            step: 5000
            learning_rate: 9.999999747378752e-06
          }
        }
      }
      epsilon: 1.0000000116860974e-07
    }
    use_moving_average: false
  }
  fine_tune_checkpoint: "pre-trained-models/centernet_hg104_512x512_coco17_tpu-8/checkpoint/ckpt-0"
  num_steps: 5000
  max_number_of_boxes: 50
  unpad_groundtruth_tensors: false
  fine_tune_checkpoint_type: "detection"
  fine_tune_checkpoint_version: V2
}
train_input_reader {
  label_map_path: "annotations/label_map.pbtxt"
  tf_record_input_reader {
    input_path: "annotations/train.record"
  }
}
eval_config {
  metrics_set: "coco_detection_metrics"
  use_moving_averages: false
  batch_size: 1
}
eval_input_reader {
  label_map_path: "annotations/label_map.pbtxt"
  shuffle: false
  num_epochs: 1
  tf_record_input_reader {
    input_path: "annotations/test.record"
  }
}

4. Expected behavior

I would have expected to see reasonably high GPU usage in the centernet training as well.

5. Additional context

Include any logs that would be helpful to diagnose the problem.

6. System information

Windows 10 CPU: i9-10980HK ram: 32GB GPU: GTX3080 8GB dedicated memory tensorflow = 2.5 CUDA = 11.3.1 cuDNN = 8.2.1.32

tensorflow / models