tensorflow / tensorflow

An Open Source Machine Learning Framework for Everyone
https://tensorflow.org
Apache License 2.0

"Loss/regularization_loss" and "Loss/classification_loss" are high during the training of an object detection model with "ssd_mobilenet_v2_320x320_coco17_tpu-8" #63602

Closed Supriob9 closed 5 months ago

Supriob9 commented 6 months ago

Issue type

Performance

Have you reproduced the bug with TensorFlow Nightly?

No

Source

source

TensorFlow version

2.10

Custom code

Yes

OS platform and distribution

Windows 10

Mobile device

No response

Python version

3.9

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

11.8, 8.1

GPU model and memory

NVIDIA Quadro P3200

Current behavior?

"Loss/regularization_loss" is high during the training of an object detection model with "ssd_mobilenet_v2_320x320_coco17_tpu-8". [Screenshot 2024-03-13 120121] How can I reduce the loss? The regularization loss and the classification loss are especially high. I have tried reducing the learning rate to 0.08 and increasing the batch size to 24. I have added my configuration here.

Standalone code to reproduce the issue

# SSD with Mobilenet v2
# Trained on COCO17, initialized from Imagenet classification checkpoint
# Train on TPU-8
#
# Achieves 22.2 mAP on COCO17 Val

model {
  ssd {
    inplace_batchnorm_update: true
    freeze_batchnorm: false
    num_classes: 3
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
        use_matmul_gather: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    encode_background_as_zeros: true
    anchor_generator {
      ssd_anchor_generator {
        num_layers: 6
        min_scale: 0.2
        max_scale: 0.95
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        aspect_ratios: 3.0
        aspect_ratios: 0.3333
      }
    }
    image_resizer {
      fixed_shape_resizer {
        height: 300
        width: 300
      }
    }
    box_predictor {
      convolutional_box_predictor {
        min_depth: 0
        max_depth: 0
        num_layers_before_predictor: 0
        use_dropout: false
        dropout_keep_probability: 0.8
        kernel_size: 1
        box_code_size: 4
        apply_sigmoid_to_scores: false
        class_prediction_bias_init: -4.6
        conv_hyperparams {
          activation: RELU_6,
          regularizer {
            l2_regularizer {
              weight: 0.00004
            }
          }
          initializer {
            random_normal_initializer {
              stddev: 0.01
              mean: 0.0
            }
          }
          batch_norm {
            train: true,
            scale: true,
            center: true,
            decay: 0.97,
            epsilon: 0.001,
          }
        }
      }
    }
    feature_extractor {
      type: 'ssd_mobilenet_v2_keras'
      min_depth: 16
      depth_multiplier: 1.0
      conv_hyperparams {
        activation: RELU_6,
        regularizer {
          l2_regularizer {
            weight: 0.00004
          }
        }
        initializer {
          truncated_normal_initializer {
            stddev: 0.03
            mean: 0.0
          }
        }
        batch_norm {
          train: true,
          scale: true,
          center: true,
          decay: 0.97,
          epsilon: 0.001,
        }
      }
      override_base_feature_extractor_hyperparams: true
    }
    loss {
      classification_loss {
        weighted_sigmoid_focal {
          alpha: 0.75,
          gamma: 2.0
        }
      }
      localization_loss {
        weighted_smooth_l1 {
          delta: 1.0
        }
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    normalize_loss_by_num_matches: true
    normalize_loc_loss_by_codesize: true
    post_processing {
      batch_non_max_suppression {
        score_threshold: 1e-8
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SIGMOID
    }
  }
}

train_config: {
  fine_tune_checkpoint_version: V2
  fine_tune_checkpoint: "ssd_mobilenet_v2_320x320_coco17_tpu-8/checkpoint/ckpt-0"
  fine_tune_checkpoint_type: "detection"
  batch_size: 24
  sync_replicas: true
  startup_delay_steps: 0
  replicas_to_aggregate: 8
  num_steps: 50000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    ssd_random_crop {
    }
  }
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        cosine_decay_learning_rate {
          learning_rate_base: .08
          total_steps: 50000
          warmup_learning_rate: 0.02666
          warmup_steps: 5000
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  max_number_of_boxes: 100
  unpad_groundtruth_tensors: false
}

train_input_reader: {
  label_map_path: "labelmap.pbtxt"
  tf_record_input_reader {
    input_path: "train.record"
  }
}

eval_config: {
  metrics_set: "coco_detection_metrics"
  use_moving_averages: false
}

eval_input_reader: {
  label_map_path: "labelmap.pbtxt"
  shuffle: false
  num_epochs: 1
  tf_record_input_reader {
    input_path: "test.record"
  }
}

Relevant log output

I0313 12:02:59.305537  3000 model_lib_v2.py:705] Step 1300 per-step time 2.458s
INFO:tensorflow:{'Loss/classification_loss': 66547.22,
 'Loss/localization_loss': 3.502803,
 'Loss/regularization_loss': 2212851.2,
 'Loss/total_loss': 2279402.0,
 'learning_rate': 0.0405284}
I0313 12:02:59.344294  3000 model_lib_v2.py:708] {'Loss/classification_loss': 66547.22,
 'Loss/localization_loss': 3.502803,
 'Loss/regularization_loss': 2212851.2,
 'Loss/total_loss': 2279402.0,
 'learning_rate': 0.0405284}
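
As a rough reading of the log output above, the short Python sketch below adds up the three logged components and shows how much of the total each one contributes. It also estimates the sum of squared weights implied by the regularization term. That estimate is an assumption: it supposes the penalty has the form weight * sum(w**2) with the `weight: 0.00004` from the config, and some implementations include an extra factor of 1/2, which would double the figure. The helper names are hypothetical and are not part of the Object Detection API.

```python
# Values copied from the step-1300 log entry.
losses = {
    "classification_loss": 66547.22,
    "localization_loss": 3.502803,
    "regularization_loss": 2212851.2,
}
logged_total = 2279402.0

# Assumption: L2 weight taken from the config, penalty = weight * sum(w**2).
L2_WEIGHT = 0.00004


def loss_shares(components):
    """Return each component's fraction of the summed loss."""
    total = sum(components.values())
    return {name: value / total for name, value in components.items()}


def implied_sum_of_squares(reg_loss, l2_weight):
    """Invert reg_loss = l2_weight * sum(w**2) to estimate sum(w**2)."""
    return reg_loss / l2_weight


summed = sum(losses.values())
shares = loss_shares(losses)
implied = implied_sum_of_squares(losses["regularization_loss"], L2_WEIGHT)

print(f"sum of components: {summed:.1f} (logged total: {logged_total:.1f})")
for name, share in shares.items():
    print(f"{name}: {share:.2%} of total")
print(f"implied sum of squared weights: {implied:.3e}")
```

Under that assumption the regularization term accounts for roughly 97% of the total, and the implied sum of squared weights is on the order of 10^10. This would suggest that the weights themselves have grown very large, for example because the updates are diverging, rather than that the regularization coefficient is set too high. That is an inference from the numbers and was not confirmed in this thread.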
Venkat6871 commented 6 months ago

Hi @Supriob9 ,

Sorry for the delay. Here are a few things to try:

Experiment with different learning rates. Instead of fixing the base learning rate at 0.08, try a range of values and monitor the effect on the loss.

Increase the variety and intensity of data augmentation. This can help the model generalize better and reduce overfitting, which may in turn reduce the regularization loss.

Experiment with different base feature extractors or backbone architectures, such as variations of MobileNet or EfficientNet, to see whether they improve performance.

Adjust the hyperparameters of the feature extractor, such as the depth multiplier and minimum depth, to find settings that suit your dataset.

Thank you!
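
When trying different learning rates as suggested above, it can help to see which rate the schedule actually produces at a given step. The following is a minimal Python sketch of a linear-warmup plus cosine-decay schedule, using the values from the posted `cosine_decay_learning_rate` block. The formula is an assumption about how the Object Detection API implements this schedule, and the function name is hypothetical. It does reproduce the rate logged at step 1300 (about 0.0405284).

```python
import math


def cosine_decay_with_warmup(step, base_lr, total_steps, warmup_lr, warmup_steps):
    """Linear warmup from warmup_lr to base_lr, then cosine decay towards zero."""
    if step < warmup_steps:
        slope = (base_lr - warmup_lr) / warmup_steps
        return warmup_lr + slope * step
    # Fraction of the post-warmup phase completed, clamped to [0, 1].
    progress = min((step - warmup_steps) / (total_steps - warmup_steps), 1.0)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


# Values from the posted train_config.
CONFIG = dict(base_lr=0.08, total_steps=50000, warmup_lr=0.02666, warmup_steps=5000)

for step in (0, 1300, 5000, 27500, 50000):
    lr = cosine_decay_with_warmup(step, **CONFIG)
    print(f"step {step:>5}: learning_rate = {lr:.7f}")
```

To compare candidate settings before a full training run, change `base_lr` and `warmup_lr` in `CONFIG` and re-run the loop. In this configuration the rate is still ramping up at step 1300 and does not reach its 0.08 peak until step 5000.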

github-actions[bot] commented 5 months ago

This issue is stale because it has been open for 7 days with no activity. It will be closed if no further activity occurs. Thank you.

github-actions[bot] commented 5 months ago

This issue was closed because it has been inactive for 7 days since being marked as stale. Please reopen if you'd like to work on this further.
