Closed guijaci closed 4 years ago
Try running from a classification checkpoint:
from_detection_checkpoint: false
So, I solved the issue:
Now I can properly train the model. Thing is, I don't really know if I'm actually doing transfer learning now. What happens if `checkpoint_dir` points to an empty folder (where I want to keep the new dataset's checkpoints) and `fine_tune_checkpoint` points to where the pre-trained variables are?
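One way to at least confirm the layout described in that question is to check the two paths on disk. This is only a sanity-check sketch: `describe_dirs` and its parameter names are hypothetical, and it says nothing about how the Object Detection API itself treats the two locations.

```python
import os
import tempfile


def describe_dirs(output_dir, fine_tune_checkpoint):
    """Report whether the output folder is empty and distinct from the pre-trained folder.

    output_dir: folder where the new checkpoints should be kept (hypothetical name).
    fine_tune_checkpoint: checkpoint prefix such as ".../model.ckpt".
    """
    pretrained_dir = os.path.dirname(os.path.abspath(fine_tune_checkpoint))
    output_dir = os.path.abspath(output_dir)
    existing = sorted(os.listdir(output_dir)) if os.path.isdir(output_dir) else []
    return {
        "same_directory": output_dir == pretrained_dir,
        "output_dir_empty": not existing,
    }


# Demo on a throwaway layout: separate "pretrained" and "output" folders.
with tempfile.TemporaryDirectory() as root:
    pretrained = os.path.join(root, "pretrained")
    output = os.path.join(root, "output")
    os.makedirs(pretrained)
    os.makedirs(output)
    report = describe_dirs(output, os.path.join(pretrained, "model.ckpt"))
    print(report)
```

For comparison, the values visible in this issue's log and pipeline.config (`'_model_dir': 'model'` and `fine_tune_checkpoint: "model/model.ckpt"`) resolve to the same directory under this check. Whether that matters for the error is not established here.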
Prerequisites
Please answer the following questions for yourself before submitting an issue.
1. The entire URL of the file you are using
https://github.com/tensorflow/models/tree/master/research/object_detection/model_main.py
2. Describe the bug
I'm trying to use transfer learning to train a MobileNet SSD model with the Object Detection API on another dataset with 8 classes. I'm following the Running Locally and the Using your own Dataset tutorials. After configuring the TFRecord, running model_main.py yields:
Those are from the last layers, as I found when inspecting the model. My take is that the script is trying to restore the detection layers, even though the number of classes in the checkpoint is different and I set this option in `train_config` in pipeline.config:
load_all_detection_checkpoint_vars: false
What reinforces this hypothesis is that the shape mentioned in the error is related to the number of classes:
- My dataset = 8 classes: `54 = (8+1)*3*2`
- COCO dataset = 90 classes: `546 = (90+1)*3*2`

Sometimes the log changes, showing layers with more dimensions, or multiples of those numbers. But the problem is generally on the axis whose size depends on the number of classes, with assignments like 54 << 546 and 27 << 273. When I change `num_classes` in pipeline.config, the numbers follow this pattern.
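The pattern in these numbers can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name is made up, and the multipliers 3 and 3*2 are taken directly from the figures above without assuming what they represent inside the model.

```python
def class_prediction_channels(num_classes, multiplier):
    """Size of the class axis: one slot per class plus one extra, repeated `multiplier` times."""
    return (num_classes + 1) * multiplier


# Compare the 8-class dataset with the 90-class COCO checkpoint
# for both multipliers that show up in the reported shapes.
for multiplier in (3, 3 * 2):
    mine = class_prediction_channels(8, multiplier)
    coco = class_prediction_channels(90, multiplier)
    print(f"multiplier={multiplier}: dataset={mine}, COCO checkpoint={coco}")
```

Both reported pairs fall out of the same formula, which is consistent with the mismatch coming from class-dependent layers rather than from the feature extractor.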
3. Steps to reproduce
4. Expected behavior
I expected the script not to raise an exception because of the number of classes when this option is set in `train_config` inside pipeline.config:
load_all_detection_checkpoint_vars: false
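To double-check which restore-related options the file actually contains, a plain-text scan is enough. This is a minimal sketch, not the API's own config parser: `read_scalar_options` is a hypothetical helper, and the sample text only copies a few fields from the pipeline.config in this issue.

```python
import re


def read_scalar_options(config_text, names):
    """Return {name: raw value} for simple `name: value` fields found in the text."""
    found = {}
    for name in names:
        # Match the exact field name, then either a quoted string or a bare token.
        pattern = r'\b' + re.escape(name) + r'\s*:\s*("[^"]*"|\S+)'
        match = re.search(pattern, config_text)
        if match:
            found[name] = match.group(1).strip('"')
    return found


sample = '''
model { ssd { num_classes: 8 } }
train_config {
  fine_tune_checkpoint: "model/model.ckpt"
  load_all_detection_checkpoint_vars: false
  from_detection_checkpoint: true
  fine_tune_checkpoint_type: "detection"
}
'''

options = read_scalar_options(sample, [
    "num_classes",
    "fine_tune_checkpoint",
    "load_all_detection_checkpoint_vars",
    "from_detection_checkpoint",
    "fine_tune_checkpoint_type",
])
for name, value in options.items():
    print(f"{name} = {value}")
```

In practice `sample` would be replaced by the contents of the real pipeline.config read from disk. The scan only finds the first occurrence of each field, so it is not suitable for repeated fields.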
5. Additional context
5.1 Colab location
https://colab.research.google.com/drive/1GpxZD3ORxIuaOUQDBjO3V_4fP6elylHM?usp=sharing
5.2 Tested models:
- ssd_mobilenet_v2_coco_2018_03_29
- ssd_mobilenet_v2_quantized_300x300_coco_2019_01_03
- ssdlite_mobilenet_v2_coco_2018_05_09
- ssd_mobilenet_v3_large_coco_2020_01_14
- ssd_mobilenet_v3_small_coco_2020_01_14
- ssd_inception_v2_coco_2018_01_28
5.3 Directory Structure:
5.4 Call to model_main.py:
output
```
WARNING:tensorflow:Forced number of epochs for all eval validations to be 1.
W0619 20:55:39.260014 140535217379200 model_lib.py:717] Forced number of epochs for all eval validations to be 1.
INFO:tensorflow:Maybe overwriting train_steps: None
I0619 20:55:39.260228 140535217379200 config_util.py:523] Maybe overwriting train_steps: None
INFO:tensorflow:Maybe overwriting use_bfloat16: False
I0619 20:55:39.260316 140535217379200 config_util.py:523] Maybe overwriting use_bfloat16: False
INFO:tensorflow:Maybe overwriting sample_1_of_n_eval_examples: 1
I0619 20:55:39.260397 140535217379200 config_util.py:523] Maybe overwriting sample_1_of_n_eval_examples: 1
INFO:tensorflow:Maybe overwriting eval_num_epochs: 1
I0619 20:55:39.260483 140535217379200 config_util.py:523] Maybe overwriting eval_num_epochs: 1
INFO:tensorflow:Maybe overwriting load_pretrained: True
I0619 20:55:39.260560 140535217379200 config_util.py:523] Maybe overwriting load_pretrained: True
INFO:tensorflow:Ignoring config override key: load_pretrained
I0619 20:55:39.260628 140535217379200 config_util.py:533] Ignoring config override key: load_pretrained
WARNING:tensorflow:Expected number of evaluation epochs is 1, but instead encountered `eval_on_train_input_config.num_epochs` = 0. Overwriting `num_epochs` to 1.
W0619 20:55:39.261385 140535217379200 model_lib.py:733] Expected number of evaluation epochs is 1, but instead encountered `eval_on_train_input_config.num_epochs` = 0. Overwriting `num_epochs` to 1.
INFO:tensorflow:create_estimator_and_inputs: use_tpu False, export_to_tpu False
I0619 20:55:39.261496 140535217379200 model_lib.py:768] create_estimator_and_inputs: use_tpu False, export_to_tpu False
INFO:tensorflow:Using config: {'_model_dir': 'model', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true graph_options { rewrite_options { meta_optimizer_iterations: ONE } } , '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec':
```
5.5 Example pipeline.config:
For ssd_mobilenet_v2_coco_2018_03_29
pipeline.config
```
model {
  ssd {
    num_classes: 8
    image_resizer {
      fixed_shape_resizer {
        height: 300
        width: 300
      }
    }
    feature_extractor {
      type: "ssd_mobilenet_v2"
      depth_multiplier: 1.0
      min_depth: 16
      conv_hyperparams {
        regularizer {
          l2_regularizer {
            weight: 3.99999989895e-05
          }
        }
        initializer {
          truncated_normal_initializer {
            mean: 0.0
            stddev: 0.0299999993294
          }
        }
        activation: RELU_6
        batch_norm {
          decay: 0.999700009823
          center: true
          scale: true
          epsilon: 0.0010000000475
          train: true
        }
      }
      use_depthwise: true
    }
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    box_predictor {
      convolutional_box_predictor {
        conv_hyperparams {
          regularizer {
            l2_regularizer {
              weight: 3.99999989895e-05
            }
          }
          initializer {
            truncated_normal_initializer {
              mean: 0.0
              stddev: 0.0299999993294
            }
          }
          activation: RELU_6
          batch_norm {
            decay: 0.999700009823
            center: true
            scale: true
            epsilon: 0.0010000000475
            train: true
          }
        }
        min_depth: 0
        max_depth: 0
        num_layers_before_predictor: 0
        use_dropout: false
        dropout_keep_probability: 0.800000011921
        kernel_size: 3
        box_code_size: 4
        apply_sigmoid_to_scores: false
      }
    }
    anchor_generator {
      ssd_anchor_generator {
        num_layers: 6
        min_scale: 0.20000000298
        max_scale: 0.949999988079
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        aspect_ratios: 3.0
        aspect_ratios: 0.333299994469
      }
    }
    post_processing {
      batch_non_max_suppression {
        score_threshold: 0.300000011921
        iou_threshold: 0.600000023842
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SIGMOID
    }
    normalize_loss_by_num_matches: true
    loss {
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      classification_loss {
        weighted_sigmoid {
        }
      }
      hard_example_miner {
        num_hard_examples: 3000
        iou_threshold: 0.990000009537
        loss_type: CLASSIFICATION
        max_negatives_per_positive: 3
        min_negatives_per_image: 3
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
  }
}
train_config {
  batch_size: 24
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    ssd_random_crop {
    }
  }
  optimizer {
    rms_prop_optimizer {
      learning_rate {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.00400000018999
          decay_steps: 800720
          decay_factor: 0.949999988079
        }
      }
      momentum_optimizer_value: 0.899999976158
      decay: 0.899999976158
      epsilon: 1.0
    }
  }
  fine_tune_checkpoint: "model/model.ckpt"
  load_all_detection_checkpoint_vars: false
  from_detection_checkpoint: true
  num_steps: 200000
  fine_tune_checkpoint_type: "detection"
}
train_input_reader: {
  tf_record_input_reader {
    input_path: "data/train.record"
  }
  label_map_path: "annotations/label_map.pbtxt"
}
eval_config: {
  num_examples: 1188
  # Note: The below line limits the evaluation process to 10 evaluations.
  # Remove the below line to evaluate indefinitely.
  max_evals: 10
  use_moving_averages: false
}
eval_input_reader: {
  tf_record_input_reader {
    input_path: "data/val.record"
  }
  label_map_path: "annotations/label_map.pbtxt"
  shuffle: false
  num_readers: 1
}
```
5.6 Before you ask...
skipping training since max_steps saved
6. System information