tensorflow / models

Models and examples built with TensorFlow

poor performance with ssd mobilenet v2 QAT retrain #8982

Open wuchichung opened 4 years ago

wuchichung commented 4 years ago


Please answer the following questions for yourself before submitting an issue.

1. The entire URL of the file you are using


2. Describe the bug

Precision and recall are low after retraining. My dataset has only one class: I am trying to train a person detector for crowded scenes. The pretrained model is ssd_mobilenet_v2_quantized_coco. I ran the training on CPU, as suggested by the Coral tutorial.


Validation metrics during training: [image]

3. Steps to reproduce

4. Expected behavior

Better performance

5. Additional context

pipeline config:

    model {
      ssd {
        num_classes: 1
        image_resizer {
          fixed_shape_resizer {
            height: 300
            width: 300
          }
        }
        feature_extractor {
          type: "ssd_mobilenet_v2"
          depth_multiplier: 1.0
          min_depth: 16
          conv_hyperparams {
            regularizer {
              l2_regularizer {
                weight: 3.99999989895e-05
              }
            }
            initializer {
              random_normal_initializer {
                mean: 0.0
                stddev: 0.00999999977648
              }
            }
            activation: RELU_6
            batch_norm {
              decay: 0.97000002861
              center: true
              scale: true
              epsilon: 0.0010000000475
            }
          }
          override_base_feature_extractor_hyperparams: true
        }
        box_coder {
          faster_rcnn_box_coder {
            y_scale: 10.0
            x_scale: 10.0
            height_scale: 5.0
            width_scale: 5.0
          }
        }
        matcher {
          argmax_matcher {
            matched_threshold: 0.5
            unmatched_threshold: 0.5
            ignore_thresholds: false
            negatives_lower_than_unmatched: true
            force_match_for_each_row: true
            use_matmul_gather: true
          }
        }
        similarity_calculator {
          iou_similarity {
          }
        }
        box_predictor {
          convolutional_box_predictor {
            conv_hyperparams {
              regularizer {
                l2_regularizer {
                  weight: 3.99999989895e-05
                }
              }
              initializer {
                random_normal_initializer {
                  mean: 0.0
                  stddev: 0.00999999977648
                }
              }
              activation: RELU_6
              batch_norm {
                decay: 0.97000002861
                center: true
                scale: true
                epsilon: 0.0010000000475
              }
            }
            min_depth: 0
            max_depth: 0
            num_layers_before_predictor: 0
            use_dropout: false
            dropout_keep_probability: 0.800000011921
            kernel_size: 1
            box_code_size: 4
            apply_sigmoid_to_scores: false
            class_prediction_bias_init: -4.59999990463
          }
        }
        anchor_generator {
          ssd_anchor_generator {
            num_layers: 6
            min_scale: 0.20000000298
            max_scale: 0.949999988079
            aspect_ratios: 1.0
            aspect_ratios: 2.0
            aspect_ratios: 0.5
            aspect_ratios: 3.0
            aspect_ratios: 0.333299994469
          }
        }
        post_processing {
          batch_non_max_suppression {
            score_threshold: 9.99999993923e-09
            iou_threshold: 0.600000023842
            max_detections_per_class: 100
            max_total_detections: 100
          }
          score_converter: SIGMOID
        }
        normalize_loss_by_num_matches: true
        loss {
          localization_loss {
            weighted_smooth_l1 {
            }
          }
          classification_loss {
            weighted_sigmoid_focal {
              gamma: 2.0
              alpha: 0.75
            }
          }
          classification_weight: 1.0
          localization_weight: 1.0
        }
        encode_background_as_zeros: true
        normalize_loc_loss_by_codesize: true
        inplace_batchnorm_update: true
        freeze_batchnorm: false
      }
    }
    train_config {
      batch_size: 128
      sync_replicas: true
      optimizer {
        momentum_optimizer {
          learning_rate {
            cosine_decay_learning_rate {
              learning_rate_base: 0.20000000298
              total_steps: 1000
              warmup_learning_rate: 0.0599999986589
              warmup_steps: 100
            }
          }
          momentum_optimizer_value: 0.899999976158
        }
        use_moving_average: false
      }
      fine_tune_checkpoint: "/pretrain/ssd_mobilenet_v2_quantized_300x300_coco_2019_01_03/model.ckpt"
      from_detection_checkpoint: true
      load_all_detection_checkpoint_vars: true
      num_steps: 50000
      startup_delay_steps: 0.0
      replicas_to_aggregate: 8
      max_number_of_boxes: 100
      unpad_groundtruth_tensors: false
      freeze_variables: [
        'FeatureExtractor/MobilenetV2/Conv/',
        'FeatureExtractor/MobilenetV2/expanded_conv/',
        'FeatureExtractor/MobilenetV2/expanded_conv_1/',
        'FeatureExtractor/MobilenetV2/expanded_conv_2/',
        'FeatureExtractor/MobilenetV2/expanded_conv_3/',
        'FeatureExtractor/MobilenetV2/expanded_conv_4/',
        'FeatureExtractor/MobilenetV2/expanded_conv_5/',
        'FeatureExtractor/MobilenetV2/expanded_conv_6/',
        'FeatureExtractor/MobilenetV2/expanded_conv_7/']
    }
    train_input_reader {
      label_map_path: "/config/label_map.pbtxt"
      tf_record_input_reader {
        input_path: "/vol/tf-records/20200727_train.records"
      }
    }
    eval_config {
      num_examples: 50
      metrics_set: "coco_detection_metrics"
      use_moving_averages: false
    }
    eval_input_reader {
      label_map_path: "/config/label_map.pbtxt"
      shuffle: false
      num_readers: 1
      tf_record_input_reader {
        input_path: "/vol/tf-records/20200727_val.records"
      }
    }
    graph_rewriter {
      quantization {
        delay: 0
        weight_bits: 8
        activation_bits: 8
      }
    }
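To make the `cosine_decay_learning_rate` settings in the config above easier to read, here is a minimal plain-Python sketch of a warmup-plus-cosine schedule. It is not the Object Detection API's own code: the function name `cosine_lr` is hypothetical, the shape (linear warmup, cosine decay to zero at `total_steps`, zero afterwards) is an assumption about how the schedule behaves, and the numeric values are rounded from the config.

```python
import math


def cosine_lr(step, base, total_steps, warmup_lr, warmup_steps):
    """Learning rate at `step` for an assumed warmup + cosine-decay schedule."""
    if step > total_steps:
        # Assumption: the rate is held at zero once total_steps has passed.
        return 0.0
    if step < warmup_steps:
        # Linear ramp from warmup_lr up to base over warmup_steps.
        slope = (base - warmup_lr) / warmup_steps
        return warmup_lr + slope * step
    # Cosine decay from base down to zero between warmup_steps and total_steps.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * base * (1.0 + math.cos(math.pi * progress))


# Values rounded from the config: learning_rate_base, total_steps,
# warmup_learning_rate, warmup_steps.
for step in (0, 50, 100, 550, 1000, 5000):
    print(step, round(cosine_lr(step, 0.2, 1000, 0.06, 100), 4))
```

Under this assumption the rate reaches zero at step 1000 (`total_steps`), while `num_steps` in the same config is 50000, so it may be worth confirming that the two values are meant to differ.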

Include any logs that would be helpful to diagnose the problem.

6. System information

kylelindgren commented 4 years ago

I solved this issue by checking that the names in label_map.pbtxt match the category names in the tfrecords created with object_detection/dataset_tools/create_coco_tf_record.py, and aligning them where they differed.
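A check along these lines can be sketched in plain Python. This is only an illustration, under two assumptions: the category names written to the records come from the `categories` list of the COCO-style annotation JSON, and a simple regex is good enough to pull `name` fields out of the label map. The helper names (`label_map_names`, `coco_category_names`, `compare_names`) are hypothetical and are not part of the Object Detection API.

```python
import re


def label_map_names(pbtxt_text):
    """Return the set of `name` values found in label-map text.

    Simplified regex parse (an assumption, not a protobuf parser): matches
    name: "x" or name: 'x' and skips fields such as display_name.
    """
    pattern = r"""(?<![A-Za-z_])name\s*:\s*(["'])(.*?)\1"""
    return {m.group(2) for m in re.finditer(pattern, pbtxt_text)}


def coco_category_names(annotations):
    """Return the category names from a COCO-style annotation dict."""
    return {c["name"] for c in annotations.get("categories", [])}


def compare_names(pbtxt_text, annotations):
    """Report names that appear on only one side (comparison is case-sensitive)."""
    in_map = label_map_names(pbtxt_text)
    in_data = coco_category_names(annotations)
    return {
        "only_in_label_map": sorted(in_map - in_data),
        "only_in_annotations": sorted(in_data - in_map),
    }


# Hypothetical example data: the two names differ only in capitalisation.
label_map = """
item {
  id: 1
  name: 'person'
}
"""
annotations = {"categories": [{"id": 1, "name": "Person"}]}

print(compare_names(label_map, annotations))
```

In practice the `annotations` dict would come from `json.load` on the annotation file and the label-map text from reading label_map.pbtxt. Two empty lists mean the names line up.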