Mask R-CNN with Inception Resnet v2, Atrous version; ValueError("Shapes %s and %s are incompatible" % (self, other)) #5769

Closed: MaesIT closed this issue 4 years ago

MaesIT commented 5 years ago

System information

Describe the problem

When using the "Mask R-CNN with Inception Resnet v2, Atrous version" config on a custom dataset (not a pretrained model) the training runs well for 10 minutes, but at the evaluation step the process stops with the following error:

  File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 1190, in boolean_mask
    shape_tensor[axis:axis + ndims_mask].assert_is_compatible_with(shape_mask)
  File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/framework/tensor_shape.py", line 848, in assert_is_compatible_with
    raise ValueError("Shapes %s and %s are incompatible" % (self, other))
ValueError: Shapes (100, 91) and (300, 91) are incompatible
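To make the error message easier to read, here is a minimal sketch of the kind of static shape check that fails in the traceback. It is not TensorFlow's implementation; `shapes_compatible` and `leading_dims_match` are hypothetical helpers written for illustration. Going by the failing line, the first shape, (100, 91), is the leading dimensions of the tensor being masked and the second, (300, 91), is the shape of the boolean mask. Reading 91 as the 90 classes plus a background class, 100 as `max_total_detections`, 300 as `first_stage_max_proposals`, and the trailing 33 x 33 as the mask size is an assumption based on the config in this issue.

```python
def shapes_compatible(shape_a, shape_b):
    """Return True if two static shapes could describe the same tensor.

    None marks an unknown dimension, which is treated as compatible
    with any size. (Hypothetical helper, not TensorFlow code.)
    """
    if len(shape_a) != len(shape_b):
        return False
    return all(a is None or b is None or a == b
               for a, b in zip(shape_a, shape_b))


def leading_dims_match(tensor_shape, mask_shape):
    """Check that a boolean mask lines up with the tensor's leading dims."""
    leading = tuple(tensor_shape[:len(mask_shape)])
    return shapes_compatible(leading, tuple(mask_shape))


# Assumed shapes: 100 detections vs. 300 proposals, 91 = 90 classes + background.
tensor_shape = (100, 91, 33, 33)
mask_shape = (300, 91)

if not leading_dims_match(tensor_shape, mask_shape):
    print("Shapes %s and %s are incompatible"
          % (tensor_shape[:len(mask_shape)], mask_shape))
```

Under that reading, the mismatch is between the number of final detections and the number of first-stage proposals, not between image or mask resolutions.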

The complete error trace is shown in the logs below the source code.

Dataset: 1000 training images with masks provided (possibly multiple per image), and 100 evaluation images, also with masks provided.

I have used exactly the same dataset to train and eval on a "Mask R-CNN with Inception V2" config, and this works fine (already trained for 175k steps and evaluated many times). But I would like to train the same data on the inception_resnet_v2 model to see if there is a difference in accuracy.

I have also tried running the legacy train.py with the Inception Resnet v2, this works fine, but when I try the legacy eval.py on the trained data it gives me the same error.

Source code

Mask R-CNN with Inception Resnet v2, Atrous version

model {
  faster_rcnn {
    num_classes: 90
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 800
        max_dimension: 1365
      }
    }
    number_of_stages: 3
    feature_extractor {
      type: 'faster_rcnn_inception_resnet_v2'
      first_stage_features_stride: 8
    }
    first_stage_anchor_generator {
      grid_anchor_generator {
        scales: [0.25, 0.5, 1.0, 2.0]
        aspect_ratios: [0.5, 1.0, 2.0]
        height_stride: 8
        width_stride: 8
      }
    }
    first_stage_atrous_rate: 2
    first_stage_box_predictor_conv_hyperparams {
      op: CONV
      regularizer {
        l2_regularizer {
          weight: 0.0
        }
      }
      initializer {
        truncated_normal_initializer {
          stddev: 0.01
        }
      }
    }
    first_stage_nms_score_threshold: 0.0
    first_stage_nms_iou_threshold: 0.7
    first_stage_max_proposals: 300
    first_stage_localization_loss_weight: 2.0
    first_stage_objectness_loss_weight: 1.0
    initial_crop_size: 17
    maxpool_kernel_size: 1
    maxpool_stride: 1
    second_stage_box_predictor {
      mask_rcnn_box_predictor {
        use_dropout: false
        dropout_keep_probability: 1.0
        predict_instance_masks: true
        mask_height: 33
        mask_width: 33
        mask_prediction_conv_depth: 0
        mask_prediction_num_conv_layers: 4
        fc_hyperparams {
          op: FC
          regularizer {
            l2_regularizer {
              weight: 0.0
            }
          }
          initializer {
            variance_scaling_initializer {
              factor: 1.0
              uniform: true
              mode: FAN_AVG
            }
          }
        }
        conv_hyperparams {
          op: CONV
          regularizer {
            l2_regularizer {
              weight: 0.0
            }
          }
          initializer {
            truncated_normal_initializer {
              stddev: 0.01
            }
          }
        }
      }
    }
    second_stage_post_processing {
      batch_non_max_suppression {
        score_threshold: 0.0
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SOFTMAX
    }
    second_stage_localization_loss_weight: 2.0
    second_stage_classification_loss_weight: 1.0
    second_stage_mask_prediction_loss_weight: 4.0
  }
}

train_config: {
  batch_size: 1
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        manual_step_learning_rate {
          initial_learning_rate: 0.003
          schedule {
            step: 50000
            learning_rate: .0003
          }
          schedule {
            step: 100000
            learning_rate: .00003
          }
          schedule {
            step: 150000
            learning_rate: .000003
          }
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  gradient_clipping_by_norm: 10.0
  fine_tune_checkpoint: "object_detection/models/scratches/train/model.ckpt-44.index"
  from_detection_checkpoint: true
  num_steps: 1000000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
}

train_input_reader: {
  tf_record_input_reader {
    input_path: "object_detection/data/scratchestrain.record"
  }
  label_map_path: "object_detection/data/label_map.pbtxt"
  load_instance_masks: true
  mask_type: PNG_MASKS
}

eval_config: {
  num_examples: 100
  max_evals: 10
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "object_detection/data/scratcheseval.record"
  }
  label_map_path: "object_detection/data/label_map.pbtxt"
  load_instance_masks: true
  mask_type: PNG_MASKS
  shuffle: false
  num_readers: 1
}


(tf) dietermaes@PCSooi:~/Documents/Tensorflowv2/models/research$ python object_detection/model_main.py --pipeline_config_path=object_detection/models/scratches/mask_rcnn_inception_resnet_v2_atrous_coco.config --model_dir=object_detection/models/scratches/train/ --num_train_steps=1000000 --sample_1_of_n_eval_examples=100 --alsologtostderr
/home/dietermaes/Documents/Tensorflowv2/models/research/object_detection/utils/visualization_utils.py:27: UserWarning: matplotlib.pyplot as already been imported, this call will have no effect.
  import matplotlib; matplotlib.use('Agg') # pylint: disable=multiple-statements
WARNING:tensorflow:Forced number of epochs for all eval validations to be 1.
W1117 12:44:48.994960 140311471769408 tf_logging.py:125] Forced number of epochs for all eval validations to be 1.
WARNING:tensorflow:Expected number of evaluation epochs is 1, but instead encountered eval_on_train_input_config.num_epochs = 0. Overwriting num_epochs to 1.
W1117 12:44:48.995174 140311471769408 tf_logging.py:125] Expected number of evaluation epochs is 1, but instead encountered eval_on_train_input_config.num_epochs = 0. Overwriting num_epochs to 1.
WARNING:tensorflow:Estimator's model_fn (<function create_model_fn.<locals>.model_fn at 0x7f9c27b96e18>) includes params argument, but params are not passed to Estimator.
W1117 12:44:48.995556 140311471769408 tf_logging.py:125] Estimator's model_fn (<function create_model_fn.<locals>.model_fn at 0x7f9c27b96e18>) includes params argument, but params are not passed to Estimator.
WARNING:tensorflow:num_readers has been reduced to 1 to match input file shards.
W1117 12:44:49.015062 140311471769408 tf_logging.py:125] num_readers has been reduced to 1 to match input file shards.
WARNING:tensorflow:From /home/dietermaes/Documents/Tensorflowv2/models/research/object_detection/builders/dataset_builder.py:148: batch_and_drop_remainder (from tensorflow.contrib.data.python.ops.batching) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.data.Dataset.batch(..., drop_remainder=True).
W1117 12:44:49.702305 140311471769408 tf_logging.py:125] From /home/dietermaes/Documents/Tensorflowv2/models/research/object_detection/builders/dataset_builder.py:148: batch_and_drop_remainder (from tensorflow.contrib.data.python.ops.batching) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.data.Dataset.batch(..., drop_remainder=True).
WARNING:tensorflow:From /home/dietermaes/Documents/Tensorflowv2/models/research/object_detection/predictors/heads/box_head.py:93: calling reduce_mean (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
W1117 12:44:58.801386 140311471769408 tf_logging.py:125] From /home/dietermaes/Documents/Tensorflowv2/models/research/object_detection/predictors/heads/box_head.py:93: calling reduce_mean (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
WARNING:tensorflow:From /home/dietermaes/Documents/Tensorflowv2/models/research/object_detection/core/losses.py:345: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow into the labels input on backprop by default.

See tf.nn.softmax_cross_entropy_with_logits_v2.

W1117 12:44:59.133028 140311471769408 tf_logging.py:125] From /home/dietermaes/Documents/Tensorflowv2/models/research/object_detection/core/losses.py:345: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version. Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow into the labels input on backprop by default.

See tf.nn.softmax_cross_entropy_with_logits_v2.

/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py:108: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
2018-11-17 12:45:12.597581: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Traceback (most recent call last):
  File "object_detection/model_main.py", line 109, in <module>
    tf.app.run()
  File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "object_detection/model_main.py", line 105, in main
    tf.estimator.train_and_evaluate(estimator, train_spec, eval_specs[0])
  File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/estimator/training.py", line 471, in train_and_evaluate
    return executor.run()
  File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/estimator/training.py", line 610, in run
    return self.run_local()
  File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/estimator/training.py", line 711, in run_local
    saving_listeners=saving_listeners)
  File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 356, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1181, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1215, in _train_model_default
    saving_listeners)
  File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1409, in _train_with_estimator_spec
    _, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss])
  File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 671, in run
    run_metadata=run_metadata)
  File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1148, in run
    run_metadata=run_metadata)
  File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1239, in run
    raise six.reraise(*original_exc_info)
  File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/six.py", line 693, in reraise
    raise value
  File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1224, in run
    return self._sess.run(*args, **kwargs)
  File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1304, in run
    run_metadata=run_metadata))
  File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/training/basic_session_run_hooks.py", line 581, in after_run
    if self._save(run_context.session, global_step):
  File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/training/basic_session_run_hooks.py", line 606, in _save
    if l.after_save(session, step):
  File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/estimator/training.py", line 517, in after_save
    self._evaluate(global_step_value) # updates self.eval_result
  File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/estimator/training.py", line 537, in _evaluate
    self._evaluator.evaluate_and_export())
  File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/estimator/training.py", line 912, in evaluate_and_export
    hooks=self._eval_spec.hooks)
  File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 476, in evaluate
    return _evaluate()
  File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 462, in _evaluate
    self._evaluate_build_graph(input_fn, hooks, checkpoint_path))
  File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1422, in _evaluate_build_graph
    self._call_model_fn_eval(input_fn, self.config))
  File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1458, in _call_model_fn_eval
    features, labels, model_fn_lib.ModeKeys.EVAL, config)
  File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1169, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "/home/dietermaes/Documents/Tensorflowv2/models/research/object_detection/model_lib.py", line 307, in model_fn
    prediction_dict, features[fields.InputDataFields.true_image_shape])
  File "/home/dietermaes/Documents/Tensorflowv2/models/research/object_detection/meta_architectures/faster_rcnn_meta_arch.py", line 1710, in loss
    groundtruth_masks_list,
  File "/home/dietermaes/Documents/Tensorflowv2/models/research/object_detection/meta_architectures/faster_rcnn_meta_arch.py", line 1983, in _loss_box_classifier
    tf.greater(one_hot_flat_cls_targets_with_background, 0))
  File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 1190, in boolean_mask
    shape_tensor[axis:axis + ndims_mask].assert_is_compatible_with(shape_mask)
  File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/framework/tensor_shape.py", line 848, in assert_is_compatible_with
    raise ValueError("Shapes %s and %s are incompatible" % (self, other))
ValueError: Shapes (100, 91) and (300, 91) are incompatible

I'm looking for solutions but have not found them on stackoverflow/here.

I am guessing that during the training images & masks are resized, but that this doesn't happen on the eval data which results in an incompatible shape? Please let me know if you also have encountered this issue + how can it be solved.

Thanks in advance, Dieter Maes

MaesIT commented 5 years ago

Additional info: The issue above is with the tensorflow CPU version. I've tried the same on the tensorflow GPU but unfortunately after 10 minutes I get the same error.

MaesIT commented 5 years ago

I've found the solution: I had limited the max detections to 100, but they should have been 300. Changing the two settings below from 100 to 300 solved the issue:

max_detections_per_class: 100
max_total_detections: 100

This issue can be closed.
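For anyone hitting the same error, the resolution above can be turned into a quick pre-flight check on the pipeline config. This is a rough sketch, not part of the Object Detection API: `read_int_field` and `check_detection_limits` are hypothetical helpers that scan the config text with a regular expression rather than parsing the protobuf. The rule that both detection limits should equal `first_stage_max_proposals` is an assumption taken from this thread, and it may not hold for other model configurations.

```python
import re


def read_int_field(config_text, field_name):
    """Return the first integer value of `field_name` in the text, or None."""
    pattern = r"\b" + re.escape(field_name) + r"\s*:\s*(\d+)"
    match = re.search(pattern, config_text)
    return int(match.group(1)) if match else None


def check_detection_limits(config_text):
    """List detection limits that differ from first_stage_max_proposals.

    Assumption from this thread: the Mask R-CNN eval step fails when
    these limits do not match the number of first-stage proposals.
    """
    proposals = read_int_field(config_text, "first_stage_max_proposals")
    problems = []
    for name in ("max_detections_per_class", "max_total_detections"):
        value = read_int_field(config_text, name)
        if proposals is not None and value is not None and value != proposals:
            problems.append("%s: %d != first_stage_max_proposals: %d"
                            % (name, value, proposals))
    return problems


# Fragment of the config from this issue, before the fix.
broken = """
first_stage_max_proposals: 300
batch_non_max_suppression {
  max_detections_per_class: 100
  max_total_detections: 100
}
"""

for problem in check_detection_limits(broken):
    print(problem)
```

Because the check only reads the first occurrence of each field, it is suitable for a single-model config like the one in this issue.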

