tensorflow / models

Models and examples built with TensorFlow

poor performance with ssd mobilenet v2 QAT retrain #8982

Open wuchichung opened 4 years ago

wuchichung commented 4 years ago


1. The entire URL of the file you are using

https://github.com/tensorflow/models/tree/master/research/object_detection

2. Describe the bug

Precision and recall are low. My dataset has only one class. The pretrained model is ssd_mobilenet_v2_quantized_coco. I run the training on CPU, as suggested by the Coral tutorial. I am trying to train a person detector for crowded scenes.

[two attached images]

Validation metrics during training: [attached image]

3. Steps to reproduce

4. Expected behavior

Better performance

5. Additional context

pipeline config

model {
  ssd {
    num_classes: 1
    image_resizer {
      fixed_shape_resizer {
        height: 300
        width: 300
      }
    }
    feature_extractor {
      type: "ssd_mobilenet_v2"
      depth_multiplier: 1.0
      min_depth: 16
      conv_hyperparams {
        regularizer {
          l2_regularizer {
            weight: 3.99999989895e-05
          }
        }
        initializer {
          random_normal_initializer {
            mean: 0.0
            stddev: 0.00999999977648
          }
        }
        activation: RELU_6
        batch_norm {
          decay: 0.97000002861
          center: true
          scale: true
          epsilon: 0.0010000000475
        }
      }
      override_base_feature_extractor_hyperparams: true
    }
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
        use_matmul_gather: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    box_predictor {
      convolutional_box_predictor {
        conv_hyperparams {
          regularizer {
            l2_regularizer {
              weight: 3.99999989895e-05
            }
          }
          initializer {
            random_normal_initializer {
              mean: 0.0
              stddev: 0.00999999977648
            }
          }
          activation: RELU_6
          batch_norm {
            decay: 0.97000002861
            center: true
            scale: true
            epsilon: 0.0010000000475
          }
        }
        min_depth: 0
        max_depth: 0
        num_layers_before_predictor: 0
        use_dropout: false
        dropout_keep_probability: 0.800000011921
        kernel_size: 1
        box_code_size: 4
        apply_sigmoid_to_scores: false
        class_prediction_bias_init: -4.59999990463
      }
    }
    anchor_generator {
      ssd_anchor_generator {
        num_layers: 6
        min_scale: 0.20000000298
        max_scale: 0.949999988079
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        aspect_ratios: 3.0
        aspect_ratios: 0.333299994469
      }
    }
    post_processing {
      batch_non_max_suppression {
        score_threshold: 9.99999993923e-09
        iou_threshold: 0.600000023842
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SIGMOID
    }
    normalize_loss_by_num_matches: true
    loss {
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      classification_loss {
        weighted_sigmoid_focal {
          gamma: 2.0
          alpha: 0.75
        }
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    encode_background_as_zeros: true
    normalize_loc_loss_by_codesize: true
    inplace_batchnorm_update: true
    freeze_batchnorm: false
  }
}
train_config {
  batch_size: 128
  sync_replicas: true
  optimizer {
    momentum_optimizer {
      learning_rate {
        cosine_decay_learning_rate {
          learning_rate_base: 0.20000000298
          total_steps: 1000
          warmup_learning_rate: 0.0599999986589
          warmup_steps: 100
        }
      }
      momentum_optimizer_value: 0.899999976158
    }
    use_moving_average: false
  }
  fine_tune_checkpoint: "/pretrain/ssd_mobilenet_v2_quantized_300x300_coco_2019_01_03/model.ckpt"
  from_detection_checkpoint: true
  load_all_detection_checkpoint_vars: true
  num_steps: 50000
  startup_delay_steps: 0.0
  replicas_to_aggregate: 8
  max_number_of_boxes: 100
  unpad_groundtruth_tensors: false
  freeze_variables: [
    'FeatureExtractor/MobilenetV2/Conv/',
    'FeatureExtractor/MobilenetV2/expanded_conv/',
    'FeatureExtractor/MobilenetV2/expanded_conv_1/',
    'FeatureExtractor/MobilenetV2/expanded_conv_2/',
    'FeatureExtractor/MobilenetV2/expanded_conv_3/',
    'FeatureExtractor/MobilenetV2/expanded_conv_4/',
    'FeatureExtractor/MobilenetV2/expanded_conv_5/',
    'FeatureExtractor/MobilenetV2/expanded_conv_6/',
    'FeatureExtractor/MobilenetV2/expanded_conv_7/']
}
train_input_reader {
  label_map_path: "/config/label_map.pbtxt"
  tf_record_input_reader {
    input_path: "/vol/tf-records/20200727_train.records"
  }
}
eval_config {
  num_examples: 50
  metrics_set: "coco_detection_metrics"
  use_moving_averages: false
}
eval_input_reader {
  label_map_path: "/config/label_map.pbtxt"
  shuffle: false
  num_readers: 1
  tf_record_input_reader {
    input_path: "/vol/tf-records/20200727_val.records"
  }
}
graph_rewriter {
  quantization {
    delay: 0
    weight_bits: 8
    activation_bits: 8
  }
}
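One detail in the config above may be worth checking: the cosine schedule uses total_steps: 1000, while num_steps is 50000. The sketch below evaluates a generic cosine decay with linear warmup using the values from the config. The function `cosine_lr` is hypothetical and is not the Object Detection API's own code; in particular, returning zero once the step passes total_steps is an assumption that should be verified against the library before drawing conclusions.

```python
import math


def cosine_lr(step, base_lr, total_steps, warmup_lr, warmup_steps):
    """Generic cosine decay with linear warmup (illustrative, not library code)."""
    if step < warmup_steps:
        # Linear ramp from warmup_lr up to base_lr.
        slope = (base_lr - warmup_lr) / warmup_steps
        return warmup_lr + slope * step
    if step > total_steps:
        # ASSUMPTION: the rate is held at zero after the schedule ends.
        return 0.0
    # Cosine decay from base_lr down to zero over the remaining steps.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


# Values copied (rounded) from the pipeline config in this issue.
CONFIG = dict(base_lr=0.2, total_steps=1000, warmup_lr=0.06, warmup_steps=100)

if __name__ == "__main__":
    for step in (0, 100, 550, 1000, 5000, 50000):
        print(step, cosine_lr(step, **CONFIG))
```

If the library behaves like this sketch, the learning rate would reach zero around step 1000, long before num_steps, so setting total_steps closer to num_steps could be one thing to try.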

Include any logs that would be helpful to diagnose the problem.

6. System information

kylelindgren commented 4 years ago

I solved this issue by checking that the names in label_map.pbtxt match the category names written into the TFRecords by object_detection/dataset_tools/create_coco_tf_record.py, and aligning them where they differed.
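A minimal sketch of that check, assuming the TFRecords were built from a COCO-style annotations JSON with a top-level `categories` list, and that the label map uses the usual flat `item { id: ... name: '...' }` layout. The helper names (`parse_label_map`, `coco_categories`, `find_mismatches`) are hypothetical, and the regex-based parsing is a stand-in for the Object Detection API's own label map utilities.

```python
import json
import re

# Matches flat `item { ... }` blocks; nested messages inside an item are not handled.
ITEM_RE = re.compile(r"item\s*\{(.*?)\}", re.DOTALL)
ID_RE = re.compile(r"\bid\s*:\s*(\d+)")
# \b keeps this from matching `display_name`.
NAME_RE = re.compile(r"\bname\s*:\s*['\"]([^'\"]*)['\"]")


def parse_label_map(text):
    """Return {id: name} parsed from label map text."""
    mapping = {}
    for body in ITEM_RE.findall(text):
        id_match = ID_RE.search(body)
        name_match = NAME_RE.search(body)
        if id_match and name_match:
            mapping[int(id_match.group(1))] = name_match.group(1)
    return mapping


def coco_categories(annotations):
    """Return {id: name} from a loaded COCO-style annotations dict."""
    return {cat["id"]: cat["name"] for cat in annotations["categories"]}


def find_mismatches(label_map, categories):
    """List human-readable differences between the two {id: name} mappings."""
    problems = []
    for cat_id in sorted(set(label_map) | set(categories)):
        lm_name = label_map.get(cat_id)
        cat_name = categories.get(cat_id)
        if lm_name is None:
            problems.append(f"id {cat_id}: {cat_name!r} is missing from the label map")
        elif cat_name is None:
            problems.append(f"id {cat_id}: {lm_name!r} is missing from the annotations")
        elif lm_name != cat_name:
            problems.append(
                f"id {cat_id}: label map has {lm_name!r}, annotations have {cat_name!r}"
            )
    return problems


if __name__ == "__main__":
    # Inline example data; in practice, read both files from disk.
    label_map_text = "item {\n  id: 1\n  name: 'person'\n}\n"
    annotations = json.loads('{"categories": [{"id": 1, "name": "Person"}]}')
    for line in find_mismatches(parse_label_map(label_map_text), coco_categories(annotations)):
        print(line)
```

The comparison is case-sensitive on purpose, so a pair such as 'person' and 'Person' is reported as a mismatch.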