Closed MaesIT closed 4 years ago
Additional info: The issue above is with the tensorflow CPU version. I've tried the same on the tensorflow GPU but unfortunately after 10 minutes I get the same error.
I've found the solution: I had limited the max detections to 100, but they should have been 300. Changing the two settings below (shown here with the old value of 100) to 300 solved the issue:
max_detections_per_class: 100
max_total_detections: 100
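For anyone hitting the same error, here is a minimal sketch of a pre-flight check for this mismatch. It assumes, based only on the fix reported in this thread, that max_detections_per_class and max_total_detections should equal first_stage_max_proposals. The helper names (read_int_field, find_detection_limit_mismatches) and the sample text are hypothetical and not part of the Object Detection API; the check uses a plain regular expression rather than the API's own config parser.

```python
import re


def read_int_field(config_text, field_name):
    """Return the first integer value of `field_name` in the config text, or None."""
    pattern = r"\b" + re.escape(field_name) + r"\s*:\s*(\d+)"
    match = re.search(pattern, config_text)
    return int(match.group(1)) if match else None


def find_detection_limit_mismatches(config_text):
    """Map each detection limit that differs from first_stage_max_proposals
    to a (configured_value, first_stage_max_proposals) pair."""
    proposals = read_int_field(config_text, "first_stage_max_proposals")
    mismatches = {}
    for name in ("max_detections_per_class", "max_total_detections"):
        value = read_int_field(config_text, name)
        if proposals is not None and value is not None and value != proposals:
            mismatches[name] = (value, proposals)
    return mismatches


# Hypothetical excerpt mirroring the values from the failing config.
SAMPLE_CONFIG = """
first_stage_max_proposals: 300
second_stage_post_processing {
  batch_non_max_suppression {
    max_detections_per_class: 100
    max_total_detections: 100
  }
}
"""

for field, (value, expected) in find_detection_limit_mismatches(SAMPLE_CONFIG).items():
    print("%s is %d but first_stage_max_proposals is %d" % (field, value, expected))
```

To check a real file, read it with open(path).read() and pass the text in. An empty result only means these three fields agree, not that the config is otherwise valid.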
This issue can be closed.
Hi there, we are checking to see if you still need help on this, as this seems to be a considerably old issue. Please update this issue with the latest information, a code snippet to reproduce your issue, and the error you are seeing. If we don't hear from you in the next 7 days, this issue will be closed automatically. If you don't need help on this issue any more, please consider closing it.
Please go to Stack Overflow for help and support:
http://stackoverflow.com/questions/tagged/tensorflow
Also, please understand that many of the models included in this repository are experimental and research-style code. If you open a GitHub issue, here is our policy:
Here's why we have that policy: TensorFlow developers respond to issues. We want to focus on work that benefits the whole community, e.g., fixing bugs and adding features. Support only helps individuals. GitHub also notifies thousands of people when issues are filed. We want them to see you communicating an interesting problem, rather than being redirected to Stack Overflow.
System information
What is the top-level directory of the model you are using: models/research/
Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 18.04
TensorFlow installed from (source or binary): Source
TensorFlow version (use command below): 1.11.0
Bazel version (if compiling from source): /
CUDA/cuDNN version:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85
GPU model and memory: GeForce GTX 1080 (12gb)
Memory:
        total   used    free    shared  buff/cache  available
Mem:    32084   2966    24809   96      4307        28565
Swap:   2047    1850    197
Exact command to reproduce: python object_detection/model_main.py --pipeline_config_path=object_detection/models/scratches/mask_rcnn_inception_resnet_v2_atrous_coco.config --model_dir=object_detection/models/scratches/train/ --num_train_steps=1000000 --sample_1_of_n_eval_examples=100 --alsologtostderr
Describe the problem
When using the "Mask R-CNN with Inception Resnet v2, Atrous version" config on a custom dataset (not a pretrained model), training runs well for 10 minutes, but at the evaluation step the process stops with the following error:
File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 1190, in boolean_mask
shape_tensor[axis:axis + ndims_mask].assert_is_compatible_with(shape_mask)
File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/framework/tensor_shape.py", line 848, in assert_is_compatible_with
raise ValueError("Shapes %s and %s are incompatible" % (self, other))
ValueError: Shapes (100, 91) and (300, 91) are incompatible
The complete error trace is shown in the logs below the source code.
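To make the failing check concrete, here is a minimal pure-Python sketch of the kind of static shape comparison that produces this message. It is an illustration and not TensorFlow's implementation: shapes_compatible and check_mask_shape are hypothetical names, and None stands in for an unknown dimension. The ordering follows the trace, where the leading dimensions of the tensor being masked are compared against the mask's shape.

```python
def shapes_compatible(shape_a, shape_b):
    """Two static shapes are compatible if they have the same rank and every
    pair of dimensions is either equal or has an unknown (None) side."""
    if len(shape_a) != len(shape_b):
        return False
    return all(a is None or b is None or a == b for a, b in zip(shape_a, shape_b))


def check_mask_shape(tensor_shape, mask_shape):
    """Raise ValueError if the mask does not fit the tensor's leading dimensions."""
    leading = tuple(tensor_shape[:len(mask_shape)])
    mask = tuple(mask_shape)
    if not shapes_compatible(leading, mask):
        raise ValueError("Shapes %s and %s are incompatible" % (leading, mask))


# The tensor has 100 rows while the mask has 300, so the first dimension
# disagrees; the second dimension (91) matches.
try:
    check_mask_shape((100, 91), (300, 91))
except ValueError as err:
    print(err)
```

Read this way, the 91 in both shapes agrees and only the first dimension (100 versus 300) differs, which points at a count-like setting rather than at image or mask resizing.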
Dataset: 1000 training images with mask provided (possible multiple per image), 100 evaluation images also with masks provided.
I have used exactly the same dataset to train and evaluate a "Mask R-CNN with Inception V2" config, and this works fine (already trained for 175k steps and evaluated many times). But I would like to train the same data on the inception_resnet_v2 model to see if there is a difference in accuracy.
I have also tried running the legacy train.py with the Inception Resnet v2 config. This works fine, but when I try the legacy eval.py on the trained data it gives me the same error.
Source code
Mask R-CNN with Inception Resnet v2, Atrous version
model {
  faster_rcnn {
    num_classes: 90
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 800
        max_dimension: 1365
      }
    }
    number_of_stages: 3
    feature_extractor {
      type: 'faster_rcnn_inception_resnet_v2'
      first_stage_features_stride: 8
    }
    first_stage_anchor_generator {
      grid_anchor_generator {
        scales: [0.25, 0.5, 1.0, 2.0]
        aspect_ratios: [0.5, 1.0, 2.0]
        height_stride: 8
        width_stride: 8
      }
    }
    first_stage_atrous_rate: 2
    first_stage_box_predictor_conv_hyperparams {
      op: CONV
      regularizer {
        l2_regularizer {
          weight: 0.0
        }
      }
      initializer {
        truncated_normal_initializer {
          stddev: 0.01
        }
      }
    }
    first_stage_nms_score_threshold: 0.0
    first_stage_nms_iou_threshold: 0.7
    first_stage_max_proposals: 300
    first_stage_localization_loss_weight: 2.0
    first_stage_objectness_loss_weight: 1.0
    initial_crop_size: 17
    maxpool_kernel_size: 1
    maxpool_stride: 1
    second_stage_box_predictor {
      mask_rcnn_box_predictor {
        use_dropout: false
        dropout_keep_probability: 1.0
        predict_instance_masks: true
        mask_height: 33
        mask_width: 33
        mask_prediction_conv_depth: 0
        mask_prediction_num_conv_layers: 4
        fc_hyperparams {
          op: FC
          regularizer {
            l2_regularizer {
              weight: 0.0
            }
          }
          initializer {
            variance_scaling_initializer {
              factor: 1.0
              uniform: true
              mode: FAN_AVG
            }
          }
        }
        conv_hyperparams {
          op: CONV
          regularizer {
            l2_regularizer {
              weight: 0.0
            }
          }
          initializer {
            truncated_normal_initializer {
              stddev: 0.01
            }
          }
        }
      }
    }
    second_stage_post_processing {
      batch_non_max_suppression {
        score_threshold: 0.0
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SOFTMAX
    }
    second_stage_localization_loss_weight: 2.0
    second_stage_classification_loss_weight: 1.0
    second_stage_mask_prediction_loss_weight: 4.0
  }
}
train_config: {
  batch_size: 1
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        manual_step_learning_rate {
          initial_learning_rate: 0.003
          schedule {
            step: 50000
            learning_rate: .0003
          }
          schedule {
            step: 100000
            learning_rate: .00003
          }
          schedule {
            step: 150000
            learning_rate: .000003
          }
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  gradient_clipping_by_norm: 10.0
  fine_tune_checkpoint: "object_detection/models/scratches/train/model.ckpt-44.index"
  from_detection_checkpoint: true
  num_steps: 1000000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
}
train_input_reader: {
  tf_record_input_reader {
    input_path: "object_detection/data/scratchestrain.record"
  }
  label_map_path: "object_detection/data/label_map.pbtxt"
  load_instance_masks: true
  mask_type: PNG_MASKS
}
eval_config: {
  num_examples: 100
  max_evals: 10
}
eval_input_reader: {
  tf_record_input_reader {
    input_path: "object_detection/data/scratcheseval.record"
  }
  label_map_path: "object_detection/data/label_map.pbtxt"
  load_instance_masks: true
  mask_type: PNG_MASKS
  shuffle: false
  num_readers: 1
}
Logs
(tf) dietermaes@PCSooi:~/Documents/Tensorflowv2/models/research$ python object_detection/model_main.py --pipeline_config_path=object_detection/models/scratches/mask_rcnn_inception_resnet_v2_atrous_coco.config --model_dir=object_detection/models/scratches/train/ --num_train_steps=1000000 --sample_1_of_n_eval_examples=100 --alsologtostderr
/home/dietermaes/Documents/Tensorflowv2/models/research/object_detection/utils/visualization_utils.py:27: UserWarning: matplotlib.pyplot as already been imported, this call will have no effect.
import matplotlib; matplotlib.use('Agg') # pylint: disable=multiple-statements
WARNING:tensorflow:Forced number of epochs for all eval validations to be 1.
W1117 12:44:48.994960 140311471769408 tf_logging.py:125] Forced number of epochs for all eval validations to be 1.
WARNING:tensorflow:Expected number of evaluation epochs is 1, but instead encountered `eval_on_train_input_config.num_epochs` = 0. Overwriting `num_epochs` to 1.
W1117 12:44:48.995174 140311471769408 tf_logging.py:125] Expected number of evaluation epochs is 1, but instead encountered `eval_on_train_input_config.num_epochs` = 0. Overwriting `num_epochs` to 1.
WARNING:tensorflow:Estimator's model_fn (<function create_model_fn.<locals>.model_fn at 0x7f9c27b96e18>) includes params argument, but params are not passed to Estimator.
W1117 12:44:48.995556 140311471769408 tf_logging.py:125] Estimator's model_fn (<function create_model_fn.<locals>.model_fn at 0x7f9c27b96e18>) includes params argument, but params are not passed to Estimator.
WARNING:tensorflow:num_readers has been reduced to 1 to match input file shards.
W1117 12:44:49.015062 140311471769408 tf_logging.py:125] num_readers has been reduced to 1 to match input file shards.
WARNING:tensorflow:From /home/dietermaes/Documents/Tensorflowv2/models/research/object_detection/builders/dataset_builder.py:148: batch_and_drop_remainder (from tensorflow.contrib.data.python.ops.batching) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.batch(..., drop_remainder=True)`.
W1117 12:44:49.702305 140311471769408 tf_logging.py:125] From /home/dietermaes/Documents/Tensorflowv2/models/research/object_detection/builders/dataset_builder.py:148: batch_and_drop_remainder (from tensorflow.contrib.data.python.ops.batching) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.batch(..., drop_remainder=True)`.
WARNING:tensorflow:From /home/dietermaes/Documents/Tensorflowv2/models/research/object_detection/predictors/heads/box_head.py:93: calling reduce_mean (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
W1117 12:44:58.801386 140311471769408 tf_logging.py:125] From /home/dietermaes/Documents/Tensorflowv2/models/research/object_detection/predictors/heads/box_head.py:93: calling reduce_mean (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
WARNING:tensorflow:From /home/dietermaes/Documents/Tensorflowv2/models/research/object_detection/core/losses.py:345: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Future major versions of TensorFlow will allow gradients to flow into the labels input on backprop by default.
See `tf.nn.softmax_cross_entropy_with_logits_v2`.
W1117 12:44:59.133028 140311471769408 tf_logging.py:125] From /home/dietermaes/Documents/Tensorflowv2/models/research/object_detection/core/losses.py:345: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Future major versions of TensorFlow will allow gradients to flow into the labels input on backprop by default.
See `tf.nn.softmax_cross_entropy_with_logits_v2`.
/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py:108: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
2018-11-17 12:45:12.597581: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Traceback (most recent call last):
File "object_detection/model_main.py", line 109, in <module>
tf.app.run()
File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "object_detection/model_main.py", line 105, in main
tf.estimator.train_and_evaluate(estimator, train_spec, eval_specs[0])
File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/estimator/training.py", line 471, in train_and_evaluate
return executor.run()
File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/estimator/training.py", line 610, in run
return self.run_local()
File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/estimator/training.py", line 711, in run_local
saving_listeners=saving_listeners)
File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 356, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1181, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1215, in _train_model_default
saving_listeners)
File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1409, in _train_with_estimatorspec
_, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss])
File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 671, in run
run_metadata=run_metadata)
File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1148, in run
run_metadata=run_metadata)
File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1239, in run
raise six.reraise(*original_exc_info)
File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/six.py", line 693, in reraise
raise value
File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1224, in run
return self._sess.run(*args, **kwargs)
File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1304, in run
run_metadata=run_metadata))
File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/training/basic_session_run_hooks.py", line 581, in after_run
if self._save(run_context.session, global_step):
File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/training/basic_session_run_hooks.py", line 606, in _save
if l.after_save(session, step):
File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/estimator/training.py", line 517, in after_save
self._evaluate(global_step_value) # updates self.eval_result
File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/estimator/training.py", line 537, in _evaluate
self._evaluator.evaluate_and_export())
File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/estimator/training.py", line 912, in evaluate_and_export
hooks=self._eval_spec.hooks)
File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 476, in evaluate
return _evaluate()
File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 462, in _evaluate
self._evaluate_build_graph(input_fn, hooks, checkpoint_path))
File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1422, in _evaluate_build_graph
self._call_model_fn_eval(input_fn, self.config))
File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1458, in _call_model_fn_eval
features, labels, model_fn_lib.ModeKeys.EVAL, config)
File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1169, in _call_model_fn
model_fn_results = self._model_fn(features=features, **kwargs)
File "/home/dietermaes/Documents/Tensorflowv2/models/research/object_detection/model_lib.py", line 307, in model_fn
prediction_dict, features[fields.InputDataFields.true_image_shape])
File "/home/dietermaes/Documents/Tensorflowv2/models/research/object_detection/meta_architectures/faster_rcnn_meta_arch.py", line 1710, in loss
groundtruth_masks_list,
File "/home/dietermaes/Documents/Tensorflowv2/models/research/object_detection/meta_architectures/faster_rcnn_meta_arch.py", line 1983, in _loss_box_classifier
tf.greater(one_hot_flat_cls_targets_with_background, 0))
File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 1190, in boolean_mask
shape_tensor[axis:axis + ndims_mask].assert_is_compatible_with(shape_mask)
File "/home/dietermaes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/framework/tensor_shape.py", line 848, in assert_is_compatible_with
raise ValueError("Shapes %s and %s are incompatible" % (self, other))
ValueError: Shapes (100, 91) and (300, 91) are incompatible
I have looked for solutions but have not found any on Stack Overflow or here.
My guess is that images and masks are resized during training, but that this doesn't happen for the eval data, which results in an incompatible shape. Please let me know if you have also encountered this issue and how it can be solved.
Thanks in advance, Dieter Maes