tensorflow / models

Models and examples built with TensorFlow

Mask R-CNN with Inception Resnet v2, Atrous version; ValueError("Shapes %s and %s are incompatible" % (self, other)) #5769

Closed: MaesIT closed this issue 4 years ago

MaesIT commented 5 years ago



System information

Describe the problem

When using the "Mask R-CNN with Inception Resnet v2, Atrous version" config on a custom dataset (not a pretrained model), the training runs well for 10 minutes, but at the evaluation step the process stops with the following error:

  File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 1190, in boolean_mask
    shape_tensor[axis:axis + ndims_mask].assert_is_compatible_with(shape_mask)
  File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/framework/tensor_shape.py", line 848, in assert_is_compatible_with
    raise ValueError("Shapes %s and %s are incompatible" % (self, other))
ValueError: Shapes (100, 91) and (300, 91) are incompatible

The complete error trace is shown in the logs below the source code.
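For readers unfamiliar with the check that fails here, the following is a minimal pure-Python sketch of a static shape compatibility test. It is a simplified illustration and not TensorFlow's actual implementation; the helper names `shapes_compatible` and `assert_compatible` are hypothetical. The 91 in both shapes is consistent with `num_classes: 90` plus one background class (the masked tensor in the trace is named `one_hot_flat_cls_targets_with_background`), while the 100 and 300 happen to match `max_total_detections` and `first_stage_max_proposals` in the config below.

```python
from typing import Optional, Sequence

# None stands for a dimension whose size is unknown at graph-build time.
Dim = Optional[int]


def shapes_compatible(a: Sequence[Dim], b: Sequence[Dim]) -> bool:
    """Return True if two static shapes could describe the same tensor.

    Simplified rule (an assumption for illustration): ranks must match,
    and each pair of dimensions must be equal unless one is unknown.
    """
    if len(a) != len(b):
        return False
    return all(x is None or y is None or x == y for x, y in zip(a, b))


def assert_compatible(a: Sequence[Dim], b: Sequence[Dim]) -> None:
    """Raise ValueError with the same message format seen in the trace."""
    if not shapes_compatible(a, b):
        raise ValueError(
            "Shapes %s and %s are incompatible" % (tuple(a), tuple(b)))


num_classes = 90                # from the config in this issue
columns = num_classes + 1       # plus one background class

try:
    assert_compatible((100, columns), (300, columns))
except ValueError as err:
    print(err)
```

The first dimension is the only one that differs, which suggests looking at config values of 100 and 300 rather than at image or mask resizing.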

Dataset: 1000 training images with masks provided (possibly multiple per image), and 100 evaluation images, also with masks provided.

I have used exactly the same dataset to train and evaluate with a "Mask R-CNN with Inception V2" config, and this works fine (already trained for 175k steps and evaluated many times). But I would like to train on the same data with the inception_resnet_v2 model to see if there is a difference in accuracy.

I have also tried running the legacy train.py with Inception Resnet v2. This works fine, but when I try the legacy eval.py on the trained data it gives me the same error.

Source code

Mask R-CNN with Inception Resnet v2, Atrous version

model {
  faster_rcnn {
    num_classes: 90
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 800
        max_dimension: 1365
      }
    }
    number_of_stages: 3
    feature_extractor {
      type: 'faster_rcnn_inception_resnet_v2'
      first_stage_features_stride: 8
    }
    first_stage_anchor_generator {
      grid_anchor_generator {
        scales: [0.25, 0.5, 1.0, 2.0]
        aspect_ratios: [0.5, 1.0, 2.0]
        height_stride: 8
        width_stride: 8
      }
    }
    first_stage_atrous_rate: 2
    first_stage_box_predictor_conv_hyperparams {
      op: CONV
      regularizer {
        l2_regularizer {
          weight: 0.0
        }
      }
      initializer {
        truncated_normal_initializer {
          stddev: 0.01
        }
      }
    }
    first_stage_nms_score_threshold: 0.0
    first_stage_nms_iou_threshold: 0.7
    first_stage_max_proposals: 300
    first_stage_localization_loss_weight: 2.0
    first_stage_objectness_loss_weight: 1.0
    initial_crop_size: 17
    maxpool_kernel_size: 1
    maxpool_stride: 1
    second_stage_box_predictor {
      mask_rcnn_box_predictor {
        use_dropout: false
        dropout_keep_probability: 1.0
        predict_instance_masks: true
        mask_height: 33
        mask_width: 33
        mask_prediction_conv_depth: 0
        mask_prediction_num_conv_layers: 4
        fc_hyperparams {
          op: FC
          regularizer {
            l2_regularizer {
              weight: 0.0
            }
          }
          initializer {
            variance_scaling_initializer {
              factor: 1.0
              uniform: true
              mode: FAN_AVG
            }
          }
        }
        conv_hyperparams {
          op: CONV
          regularizer {
            l2_regularizer {
              weight: 0.0
            }
          }
          initializer {
            truncated_normal_initializer {
              stddev: 0.01
            }
          }
        }
      }
    }
    second_stage_post_processing {
      batch_non_max_suppression {
        score_threshold: 0.0
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SOFTMAX
    }
    second_stage_localization_loss_weight: 2.0
    second_stage_classification_loss_weight: 1.0
    second_stage_mask_prediction_loss_weight: 4.0
  }
}

train_config: {
  batch_size: 1
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        manual_step_learning_rate {
          initial_learning_rate: 0.003
          schedule {
            step: 50000
            learning_rate: .0003
          }
          schedule {
            step: 100000
            learning_rate: .00003
          }
          schedule {
            step: 150000
            learning_rate: .000003
          }
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  gradient_clipping_by_norm: 10.0
  fine_tune_checkpoint: "object_detection/models/scratches/train/model.ckpt-44.index"
  from_detection_checkpoint: true
  num_steps: 1000000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
}

train_input_reader: {
  tf_record_input_reader {
    input_path: "object_detection/data/scratchestrain.record"
  }
  label_map_path: "object_detection/data/label_map.pbtxt"
  load_instance_masks: true
  mask_type: PNG_MASKS
}

eval_config: {
  num_examples: 100
  max_evals: 10
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "object_detection/data/scratcheseval.record"
  }
  label_map_path: "object_detection/data/label_map.pbtxt"
  load_instance_masks: true
  mask_type: PNG_MASKS
  shuffle: false
  num_readers: 1
}

Logs

(tf) dietermaes@PCSooi:~/Documents/Tensorflowv2/models/research$ python object_detection/model_main.py --pipeline_config_path=object_detection/models/scratches/mask_rcnn_inception_resnet_v2_atrous_coco.config --model_dir=object_detection/models/scratches/train/ --num_train_steps=1000000 --sample_1_of_n_eval_examples=100 --alsologtostderr
/home/dietermaes/Documents/Tensorflowv2/models/research/object_detection/utils/visualization_utils.py:27: UserWarning: matplotlib.pyplot as already been imported, this call will have no effect.
  import matplotlib; matplotlib.use('Agg') # pylint: disable=multiple-statements
WARNING:tensorflow:Forced number of epochs for all eval validations to be 1.
W1117 12:44:48.994960 140311471769408 tf_logging.py:125] Forced number of epochs for all eval validations to be 1.
WARNING:tensorflow:Expected number of evaluation epochs is 1, but instead encountered eval_on_train_input_config.num_epochs = 0. Overwriting num_epochs to 1.
W1117 12:44:48.995174 140311471769408 tf_logging.py:125] Expected number of evaluation epochs is 1, but instead encountered eval_on_train_input_config.num_epochs = 0. Overwriting num_epochs to 1.
WARNING:tensorflow:Estimator's model_fn (<function create_model_fn.<locals>.model_fn at 0x7f9c27b96e18>) includes params argument, but params are not passed to Estimator.
W1117 12:44:48.995556 140311471769408 tf_logging.py:125] Estimator's model_fn (<function create_model_fn.<locals>.model_fn at 0x7f9c27b96e18>) includes params argument, but params are not passed to Estimator.
WARNING:tensorflow:num_readers has been reduced to 1 to match input file shards.
W1117 12:44:49.015062 140311471769408 tf_logging.py:125] num_readers has been reduced to 1 to match input file shards.
WARNING:tensorflow:From /home/dietermaes/Documents/Tensorflowv2/models/research/object_detection/builders/dataset_builder.py:148: batch_and_drop_remainder (from tensorflow.contrib.data.python.ops.batching) is deprecated and will be removed in a future version.
Instructions for updating: Use tf.data.Dataset.batch(..., drop_remainder=True).
W1117 12:44:49.702305 140311471769408 tf_logging.py:125] From /home/dietermaes/Documents/Tensorflowv2/models/research/object_detection/builders/dataset_builder.py:148: batch_and_drop_remainder (from tensorflow.contrib.data.python.ops.batching) is deprecated and will be removed in a future version.
Instructions for updating: Use tf.data.Dataset.batch(..., drop_remainder=True).
WARNING:tensorflow:From /home/dietermaes/Documents/Tensorflowv2/models/research/object_detection/predictors/heads/box_head.py:93: calling reduce_mean (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating: keep_dims is deprecated, use keepdims instead
W1117 12:44:58.801386 140311471769408 tf_logging.py:125] From /home/dietermaes/Documents/Tensorflowv2/models/research/object_detection/predictors/heads/box_head.py:93: calling reduce_mean (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating: keep_dims is deprecated, use keepdims instead
WARNING:tensorflow:From /home/dietermaes/Documents/Tensorflowv2/models/research/object_detection/core/losses.py:345: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow into the labels input on backprop by default.

See tf.nn.softmax_cross_entropy_with_logits_v2.

W1117 12:44:59.133028 140311471769408 tf_logging.py:125] From /home/dietermaes/Documents/Tensorflowv2/models/research/object_detection/core/losses.py:345: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version. Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow into the labels input on backprop by default.

See tf.nn.softmax_cross_entropy_with_logits_v2.

/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py:108: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
2018-11-17 12:45:12.597581: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Traceback (most recent call last):
  File "object_detection/model_main.py", line 109, in <module>
    tf.app.run()
  File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "object_detection/model_main.py", line 105, in main
    tf.estimator.train_and_evaluate(estimator, train_spec, eval_specs[0])
  File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/estimator/training.py", line 471, in train_and_evaluate
    return executor.run()
  File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/estimator/training.py", line 610, in run
    return self.run_local()
  File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/estimator/training.py", line 711, in run_local
    saving_listeners=saving_listeners)
  File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 356, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1181, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1215, in _train_model_default
    saving_listeners)
  File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1409, in _train_with_estimatorspec
    _, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss])
  File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 671, in run
    run_metadata=run_metadata)
  File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1148, in run
    run_metadata=run_metadata)
  File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1239, in run
    raise six.reraise(*original_exc_info)
  File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/six.py", line 693, in reraise
    raise value
  File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1224, in run
    return self._sess.run(*args, **kwargs)
  File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1304, in run
    run_metadata=run_metadata))
  File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/training/basic_session_run_hooks.py", line 581, in after_run
    if self._save(run_context.session, global_step):
  File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/training/basic_session_run_hooks.py", line 606, in _save
    if l.after_save(session, step):
  File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/estimator/training.py", line 517, in after_save
    self._evaluate(global_step_value) # updates self.eval_result
  File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/estimator/training.py", line 537, in _evaluate
    self._evaluator.evaluate_and_export())
  File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/estimator/training.py", line 912, in evaluate_and_export
    hooks=self._eval_spec.hooks)
  File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 476, in evaluate
    return _evaluate()
  File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 462, in _evaluate
    self._evaluate_build_graph(input_fn, hooks, checkpoint_path))
  File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1422, in _evaluate_build_graph
    self._call_model_fn_eval(input_fn, self.config))
  File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1458, in _call_model_fn_eval
    features, labels, model_fn_lib.ModeKeys.EVAL, config)
  File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1169, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "/home/dietermaes/Documents/Tensorflowv2/models/research/object_detection/model_lib.py", line 307, in model_fn
    prediction_dict, features[fields.InputDataFields.true_image_shape])
  File "/home/dietermaes/Documents/Tensorflowv2/models/research/object_detection/meta_architectures/faster_rcnn_meta_arch.py", line 1710, in loss
    groundtruth_masks_list,
  File "/home/dietermaes/Documents/Tensorflowv2/models/research/object_detection/meta_architectures/faster_rcnn_meta_arch.py", line 1983, in _loss_box_classifier
    tf.greater(one_hot_flat_cls_targets_with_background, 0))
  File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 1190, in boolean_mask
    shape_tensor[axis:axis + ndims_mask].assert_is_compatible_with(shape_mask)
  File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/framework/tensor_shape.py", line 848, in assert_is_compatible_with
    raise ValueError("Shapes %s and %s are incompatible" % (self, other))
ValueError: Shapes (100, 91) and (300, 91) are incompatible

I have looked for solutions but have not found any on Stack Overflow or here.

My guess is that the images and masks are resized during training, but that this doesn't happen for the eval data, which results in an incompatible shape. Please let me know if you have also encountered this issue and how it can be solved.

Thanks in advance, Dieter Maes

MaesIT commented 5 years ago

Additional info: The issue above is with the TensorFlow CPU version. I've tried the same with the TensorFlow GPU version, but unfortunately I get the same error after 10 minutes.

MaesIT commented 5 years ago

I've found the solution: I had limited the max detections to 100, but they should have been set to 300. Changing the two config settings below (shown with the values that caused the error) solved the issue:

max_detections_per_class: 100
max_total_detections: 100

This issue can be closed.
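A small pre-flight check along these lines could flag the mismatch before a long training run. This is only a sketch under assumptions: the helpers `read_int_field` and `check_detection_limits` are hypothetical and not part of the Object Detection API, the regex lookup is a shortcut rather than a real protobuf text parser, and the rule that both limits must equal `first_stage_max_proposals` is taken from what fixed this particular Mask R-CNN config, not from documented behavior.

```python
import re
from typing import List


def read_int_field(config_text: str, field: str) -> int:
    """Find the first 'field: <integer>' occurrence in pipeline config text."""
    match = re.search(r"\b%s\s*:\s*(\d+)" % re.escape(field), config_text)
    if match is None:
        raise KeyError(field)
    return int(match.group(1))


def check_detection_limits(config_text: str) -> List[str]:
    """Return a list of mismatches between proposals and detection limits.

    Assumption (from this issue only): both second-stage limits should
    equal first_stage_max_proposals for this config to evaluate cleanly.
    """
    proposals = read_int_field(config_text, "first_stage_max_proposals")
    problems = []
    for field in ("max_detections_per_class", "max_total_detections"):
        value = read_int_field(config_text, field)
        if value != proposals:
            problems.append(
                "%s is %d but first_stage_max_proposals is %d"
                % (field, value, proposals))
    return problems


# Values copied from the config posted in this issue.
BROKEN = """
first_stage_max_proposals: 300
max_detections_per_class: 100
max_total_detections: 100
"""
# The same fragment with both limits raised to 300.
FIXED = BROKEN.replace(": 100", ": 300")

for message in check_detection_limits(BROKEN):
    print(message)
```

Running the check on the full pipeline config file text works the same way, since only the three integer fields are read.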

tensorflowbutler commented 4 years ago

Hi there, we are checking to see if you still need help with this, as this seems to be a considerably old issue. Please update this issue with the latest information, a code snippet to reproduce your issue, and the error you are seeing. If we don't hear from you in the next 7 days, this issue will be closed automatically. If you don't need help with this issue any more, please consider closing it.