tensorflow / models

Models and examples built with TensorFlow
Other
77.01k stars 45.78k forks source link

BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/Shape_2/_4377 while training FRCNN Resnet50 on Object Detection #8747

Open ravising-h opened 4 years ago

ravising-h commented 4 years ago

Prerequisites

Please answer the following questions for yourself before submitting an issue.

1. The entire URL of the file you are using

https://github.com/tensorflow/models/tree/master/research/...

2. Describe the bug

I am using Object Detection for training single class FRCNN Resnet 50 it. It throws very wierd error.

Config File

# Faster R-CNN with Resnet-50 (v1), configuration for MSCOCO Dataset.
# Users should configure the fine_tune_checkpoint field in the train config as
# well as the label_map_path and input_path fields in the train_input_reader and
# eval_input_reader. Search for "PATH_TO_BE_CONFIGURED" to find the fields that
# should be configured.

model {
  faster_rcnn {
    num_classes: 1
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 600
        max_dimension: 1024
      }
    }
    feature_extractor {
      type: 'faster_rcnn_resnet50'
      first_stage_features_stride: 16
    }
    first_stage_anchor_generator {
      grid_anchor_generator {
        scales: [0.25, 0.5, 1.0, 2.0]
        aspect_ratios: [0.5, 1.0, 2.0]
        height_stride: 16
        width_stride: 16
      }
    }
    first_stage_box_predictor_conv_hyperparams {
      op: CONV
      regularizer {
        l2_regularizer {
          weight: 0.0
        }
      }
      initializer {
        truncated_normal_initializer {
          stddev: 0.01
        }
      }
    }
    first_stage_nms_score_threshold: 0.0
    first_stage_nms_iou_threshold: 0.7
    first_stage_max_proposals: 300
    first_stage_localization_loss_weight: 2.0
    first_stage_objectness_loss_weight: 1.0
    initial_crop_size: 14
    maxpool_kernel_size: 2
    maxpool_stride: 2
    second_stage_box_predictor {
      mask_rcnn_box_predictor {
        use_dropout: false
        dropout_keep_probability: 1.0
        fc_hyperparams {
          op: FC
          regularizer {
            l2_regularizer {
              weight: 0.0
            }
          }
          initializer {
            variance_scaling_initializer {
              factor: 1.0
              uniform: true
              mode: FAN_AVG
            }
          }
        }
      }
    }
    second_stage_post_processing {
      batch_non_max_suppression {
        score_threshold: 0.0
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 300
      }
      score_converter: SOFTMAX
    }
    second_stage_localization_loss_weight: 2.0
    second_stage_classification_loss_weight: 1.0
  }
}

train_config: {
  batch_size: 8
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        manual_step_learning_rate {
          initial_learning_rate: 0.0003
          schedule {
            step: 900000
            learning_rate: .00003
          }
          schedule {
            step: 1200000
            learning_rate: .000003
          }
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  gradient_clipping_by_norm: 10.0
  fine_tune_checkpoint: "/content/models/research/pretrained_model/model.ckpt"
  from_detection_checkpoint: true
  # Note: The below line limits the training process to 200K steps, which we
  # empirically found to be sufficient enough to train the pets dataset. This
  # effectively bypasses the learning rate schedule (the learning rate will
  # never decay). Remove the below line to train indefinitely.
  num_steps: 5000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
}

train_input_reader: {
  tf_record_input_reader {
    input_path: "/content/models/research/train.record"
  }
  label_map_path: "/content/models/research/object-detection.pbtxt"
}

eval_config: {
  num_examples: 8000
  # Note: The below line limits the evaluation process to 10 evaluations.
  # Remove the below line to evaluate indefinitely.
  max_evals: 10
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "/content/models/research/eval.record"
  }
  label_map_path: "/content/models/research/object-detection.pbtxt"
  shuffle: false
  num_readers: 1
}

Error

WARNING:tensorflow:Forced number of epochs for all eval validations to be 1.
W0629 06:37:30.500078 140055571580800 model_lib.py:717] Forced number of epochs for all eval validations to be 1.
INFO:tensorflow:Maybe overwriting train_steps: 5000
I0629 06:37:30.500358 140055571580800 config_util.py:552] Maybe overwriting train_steps: 5000
INFO:tensorflow:Maybe overwriting use_bfloat16: False
I0629 06:37:30.500550 140055571580800 config_util.py:552] Maybe overwriting use_bfloat16: False
INFO:tensorflow:Maybe overwriting sample_1_of_n_eval_examples: 1
I0629 06:37:30.500717 140055571580800 config_util.py:552] Maybe overwriting sample_1_of_n_eval_examples: 1
INFO:tensorflow:Maybe overwriting eval_num_epochs: 1
I0629 06:37:30.500879 140055571580800 config_util.py:552] Maybe overwriting eval_num_epochs: 1
INFO:tensorflow:Maybe overwriting load_pretrained: True
I0629 06:37:30.501038 140055571580800 config_util.py:552] Maybe overwriting load_pretrained: True
INFO:tensorflow:Ignoring config override key: load_pretrained
I0629 06:37:30.501193 140055571580800 config_util.py:562] Ignoring config override key: load_pretrained
WARNING:tensorflow:Expected number of evaluation epochs is 1, but instead encountered `eval_on_train_input_config.num_epochs` = 0. Overwriting `num_epochs` to 1.
W0629 06:37:30.502254 140055571580800 model_lib.py:733] Expected number of evaluation epochs is 1, but instead encountered `eval_on_train_input_config.num_epochs` = 0. Overwriting `num_epochs` to 1.
INFO:tensorflow:create_estimator_and_inputs: use_tpu False, export_to_tpu False
I0629 06:37:30.502461 140055571580800 model_lib.py:768] create_estimator_and_inputs: use_tpu False, export_to_tpu False
INFO:tensorflow:Using config: {'_model_dir': 'training/', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f60db5ed710>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
I0629 06:37:30.503009 140055571580800 estimator.py:212] Using config: {'_model_dir': 'training/', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f60db5ed710>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
WARNING:tensorflow:Estimator's model_fn (<function create_model_fn.<locals>.model_fn at 0x7f60c1b0c950>) includes params argument, but params are not passed to Estimator.
W0629 06:37:30.503304 140055571580800 model_fn.py:630] Estimator's model_fn (<function create_model_fn.<locals>.model_fn at 0x7f60c1b0c950>) includes params argument, but params are not passed to Estimator.
INFO:tensorflow:Not using Distribute Coordinator.
I0629 06:37:30.504147 140055571580800 estimator_training.py:186] Not using Distribute Coordinator.
INFO:tensorflow:Running training and evaluation locally (non-distributed).
I0629 06:37:30.504416 140055571580800 training.py:612] Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps None or save_checkpoints_secs 600.
I0629 06:37:30.504807 140055571580800 training.py:700] Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps None or save_checkpoints_secs 600.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
W0629 06:37:30.511681 140055571580800 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
WARNING:tensorflow:num_readers has been reduced to 1 to match input file shards.
W0629 06:37:30.555287 140055571580800 dataset_builder.py:83] num_readers has been reduced to 1 to match input file shards.
WARNING:tensorflow:From /content/models/research/object_detection/builders/dataset_builder.py:100: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_determinstic`.
W0629 06:37:30.561928 140055571580800 deprecation.py:323] From /content/models/research/object_detection/builders/dataset_builder.py:100: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_determinstic`.
WARNING:tensorflow:From /content/models/research/object_detection/builders/dataset_builder.py:175: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map()
W0629 06:37:30.586159 140055571580800 deprecation.py:323] From /content/models/research/object_detection/builders/dataset_builder.py:175: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map()
WARNING:tensorflow:From /content/models/research/object_detection/inputs.py:76: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
W0629 06:37:44.126361 140055571580800 deprecation.py:323] From /content/models/research/object_detection/inputs.py:76: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
WARNING:tensorflow:From /content/models/research/object_detection/utils/ops.py:493: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
W0629 06:37:44.254202 140055571580800 deprecation.py:323] From /content/models/research/object_detection/utils/ops.py:493: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
WARNING:tensorflow:From /content/models/research/object_detection/inputs.py:258: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
W0629 06:37:50.130385 140055571580800 deprecation.py:323] From /content/models/research/object_detection/inputs.py:258: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
INFO:tensorflow:Calling model_fn.
I0629 06:37:55.138738 140055571580800 estimator.py:1148] Calling model_fn.
INFO:tensorflow:Scale of 0 disables regularizer.
I0629 06:37:55.323598 140055571580800 regularizers.py:99] Scale of 0 disables regularizer.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tf_slim/layers/layers.py:1089: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.__call__` method instead.
W0629 06:37:55.326330 140055571580800 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tf_slim/layers/layers.py:1089: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.__call__` method instead.
INFO:tensorflow:Scale of 0 disables regularizer.
I0629 06:37:57.659181 140055571580800 regularizers.py:99] Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
I0629 06:37:57.677415 140055571580800 regularizers.py:99] Scale of 0 disables regularizer.
INFO:tensorflow:depth of additional conv before box predictor: 0
I0629 06:37:57.677852 140055571580800 convolutional_box_predictor.py:156] depth of additional conv before box predictor: 0
WARNING:tensorflow:From /content/models/research/object_detection/utils/spatial_transform_ops.py:428: calling crop_and_resize_v1 (from tensorflow.python.ops.image_ops_impl) with box_ind is deprecated and will be removed in a future version.
Instructions for updating:
box_ind is deprecated, use box_indices instead
W0629 06:38:03.614139 140055571580800 deprecation.py:506] From /content/models/research/object_detection/utils/spatial_transform_ops.py:428: calling crop_and_resize_v1 (from tensorflow.python.ops.image_ops_impl) with box_ind is deprecated and will be removed in a future version.
Instructions for updating:
box_ind is deprecated, use box_indices instead
INFO:tensorflow:Scale of 0 disables regularizer.
I0629 06:38:03.645871 140055571580800 regularizers.py:99] Scale of 0 disables regularizer.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tf_slim/layers/layers.py:1666: flatten (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.flatten instead.
W0629 06:38:04.029824 140055571580800 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tf_slim/layers/layers.py:1666: flatten (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.flatten instead.
INFO:tensorflow:Scale of 0 disables regularizer.
I0629 06:38:04.032702 140055571580800 regularizers.py:99] Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
I0629 06:38:04.054920 140055571580800 regularizers.py:99] Scale of 0 disables regularizer.
WARNING:tensorflow:From /content/models/research/object_detection/core/losses.py:347: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See `tf.nn.softmax_cross_entropy_with_logits_v2`.

W0629 06:38:07.732958 140055571580800 deprecation.py:323] From /content/models/research/object_detection/core/losses.py:347: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See `tf.nn.softmax_cross_entropy_with_logits_v2`.

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/indexed_slices.py:424: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/indexed_slices.py:424: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
INFO:tensorflow:Done calling model_fn.
I0629 06:38:15.065322 140055571580800 estimator.py:1150] Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
I0629 06:38:15.067013 140055571580800 basic_session_run_hooks.py:541] Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
I0629 06:38:18.693901 140055571580800 monitored_session.py:240] Graph was finalized.
2020-06-29 06:38:18.694362: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-06-29 06:38:18.699435: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2300000000 Hz
2020-06-29 06:38:18.699726: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1864e8c0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-06-29 06:38:18.699768: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-06-29 06:38:18.701944: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-06-29 06:38:18.772193: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-29 06:38:18.773121: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1864e700 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-06-29 06:38:18.773160: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Tesla K80, Compute Capability 3.7
2020-06-29 06:38:18.773410: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-29 06:38:18.774209: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:00:04.0
2020-06-29 06:38:18.774630: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-06-29 06:38:18.775996: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-06-29 06:38:18.777187: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-06-29 06:38:18.777670: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-06-29 06:38:18.779223: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-06-29 06:38:18.780525: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-06-29 06:38:18.784183: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-06-29 06:38:18.784351: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-29 06:38:18.785195: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-29 06:38:18.785934: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-06-29 06:38:18.786026: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-06-29 06:38:18.787699: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-06-29 06:38:18.787739: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0 
2020-06-29 06:38:18.787760: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N 
2020-06-29 06:38:18.787952: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-29 06:38:18.788793: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-29 06:38:18.789557: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:39] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2020-06-29 06:38:18.789625: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10805 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:04.0, compute capability: 3.7)
INFO:tensorflow:Running local_init_op.
I0629 06:38:22.410002 140055571580800 session_manager.py:500] Running local_init_op.
INFO:tensorflow:Done running local_init_op.
I0629 06:38:22.785913 140055571580800 session_manager.py:502] Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into training/model.ckpt.
I0629 06:38:33.962032 140055571580800 basic_session_run_hooks.py:606] Saving checkpoints for 0 into training/model.ckpt.
2020-06-29 06:38:43.237604: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: {{function_node Dataset_map_transform_and_pad_input_data_fn_422}} assertion failed: [[1]] [[0]]
     [[{{node Assert/AssertGuard/Assert}}]]
     [[IteratorGetNext]]
     [[BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/Shape_2/_4377]]
  (1) Invalid argument: {{function_node Dataset_map_transform_and_pad_input_data_fn_422}} assertion failed: [[1]] [[0]]
     [[{{node Assert/AssertGuard/Assert}}]]
     [[IteratorGetNext]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/content/models/research/object_detection/model_main.py", line 114, in <module>
    tf.app.run()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/content/models/research/object_detection/model_main.py", line 110, in main
    tf.estimator.train_and_evaluate(estimator, train_spec, eval_specs[0])
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/training.py", line 473, in train_and_evaluate
    return executor.run()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/training.py", line 613, in run
    return self.run_local()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/training.py", line 714, in run_local
    saving_listeners=saving_listeners)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 370, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1161, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1195, in _train_model_default
    saving_listeners)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1494, in _train_with_estimator_spec
    _, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss])
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 754, in run
    run_metadata=run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1259, in run
    run_metadata=run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1360, in run
    raise six.reraise(*original_exc_info)
  File "/usr/local/lib/python3.6/dist-packages/six.py", line 693, in reraise
    raise value
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1345, in run
    return self._sess.run(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1418, in run
    run_metadata=run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1176, in run
    return self._sess.run(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument:  assertion failed: [[1]] [[0]]
     [[{{node Assert/AssertGuard/Assert}}]]
     [[IteratorGetNext]]
     [[BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/Shape_2/_4377]]
  (1) Invalid argument:  assertion failed: [[1]] [[0]]
     [[{{node Assert/AssertGuard/Assert}}]]
     [[IteratorGetNext]]
0 successful operations.
0 derived errors ignored.

3. Steps to reproduce

https://colab.research.google.com/drive/1o7JB0pWanEMn6qnRnEXphu0T4YbuKLL2#scrollTo=PFR1cVakmG-4 Use this colab add 'frcnn_resnet50':{ 'pipeline_file':'faster_rcnn_resnet50_coco.config', 'model_name':'faster_rcnn_resnet50_coco_2018_01_28', 'batch_size': 8 } in 1st cell in Model_config

4. Expected behavior

It should start training

5. Additional context

It works fine for SSD problem arises while using FRCNN - Resnet

6. System information

colab with K80 Tesla

ravising-h commented 4 years ago

Any updates?

justinkay commented 4 years ago

I am also seeing this error with tf 1.15, run using model_main.py

Pipeline file:

model {
  faster_rcnn {
    num_classes: 1
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 600
        max_dimension: 1024
      }
    }
    feature_extractor {
      type: 'faster_rcnn_resnet101'
      first_stage_features_stride: 16
    }
    first_stage_anchor_generator {
      grid_anchor_generator {
        scales: [0.25, 0.5, 1.0, 2.0]
        aspect_ratios: [0.5, 1.0, 2.0]
        height_stride: 16
        width_stride: 16
      }
    }
    first_stage_box_predictor_conv_hyperparams {
      op: CONV
      regularizer {
        l2_regularizer {
          weight: 0.0
        }
      }
      initializer {
        truncated_normal_initializer {
          stddev: 0.01
        }
      }
    }
    first_stage_nms_score_threshold: 0.0
    first_stage_nms_iou_threshold: 0.7
    first_stage_max_proposals: 300
    first_stage_localization_loss_weight: 2.0
    first_stage_objectness_loss_weight: 1.0
    initial_crop_size: 14
    maxpool_kernel_size: 2
    maxpool_stride: 2
    second_stage_box_predictor {
      mask_rcnn_box_predictor {
        use_dropout: false
        dropout_keep_probability: 1.0
        fc_hyperparams {
          op: FC
          regularizer {
            l2_regularizer {
              weight: 0.0
            }
          }
          initializer {
            variance_scaling_initializer {
              factor: 1.0
              uniform: true
              mode: FAN_AVG
            }
          }
        }
      }
    }
    second_stage_post_processing {
      batch_non_max_suppression {
        score_threshold: 0.0
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 300
      }
      score_converter: SOFTMAX
    }
    second_stage_localization_loss_weight: 2.0
    second_stage_classification_loss_weight: 1.0
  }
}

train_config: {
  batch_size: 1
  num_steps: 472050
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        manual_step_learning_rate {
          initial_learning_rate: 0.0003
          schedule {
            step: 314700
            learning_rate: .00003
          }
          schedule {
            step: 419600
            learning_rate: .000003
          }
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  gradient_clipping_by_norm: 10.0
  fine_tune_checkpoint: "models/tf_pretrained/faster_rcnn_resnet101_coco_2018_01_28/model.ckpt"
  from_detection_checkpoint: true
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
}

train_input_reader: {
  tf_record_input_reader {
    input_path: "data/v012/tf/train-?????-of-00009"
  }
  label_map_path: "data/v012/labels/tf/labelmap.pbtxt"
}

eval_config: {
  num_examples: 8762
  # Note: The below line limits the evaluation process to 10 evaluations.
  # Remove the below line to evaluate indefinitely.
  # max_evals: 10
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "data/v012/tf/val-?????-of-00003"
  }
  label_map_path: "data/v012/labels/tf/labelmap.pbtxt"
  shuffle: false
  num_readers: 1
}

Note my model_main.py has been modified to accept --save_checkpoints_steps for this call:

def main(unused_argv):
    ...
    config = tf.estimator.RunConfig(model_dir=FLAGS.model_dir, save_checkpoints_steps=FLAGS.save_checkpoints_steps)
    ...

Log:

$ python lib/models/research/object_detection/model_main.py --pipeline_config_path=${PIPELINE_CONFIG_PATH} --model_dir=${MODEL_DIR}     --num_train_steps=${NUM_TRAIN_STEPS} --save_checkpoints_steps=26225 --alsologtostderr
WARNING:tensorflow:Forced number of epochs for all eval validations to be 1.
W0804 20:22:15.608931 140157722805632 model_lib.py:758] Forced number of epochs for all eval validations to be 1.
INFO:tensorflow:Maybe overwriting train_steps: 472050
I0804 20:22:15.609210 140157722805632 config_util.py:552] Maybe overwriting train_steps: 472050
INFO:tensorflow:Maybe overwriting use_bfloat16: False
I0804 20:22:15.609313 140157722805632 config_util.py:552] Maybe overwriting use_bfloat16: False
INFO:tensorflow:Maybe overwriting sample_1_of_n_eval_examples: 1
I0804 20:22:15.609401 140157722805632 config_util.py:552] Maybe overwriting sample_1_of_n_eval_examples: 1
INFO:tensorflow:Maybe overwriting eval_num_epochs: 1
I0804 20:22:15.609490 140157722805632 config_util.py:552] Maybe overwriting eval_num_epochs: 1
WARNING:tensorflow:Expected number of evaluation epochs is 1, but instead encountered `eval_on_train_input_config.num_epochs` = 0. Overwriting `num_epochs` to 1.
W0804 20:22:15.609631 140157722805632 model_lib.py:774] Expected number of evaluation epochs is 1, but instead encountered `eval_on_train_input_config.num_epochs` = 0. Overwriting `num_epochs` to 1.
INFO:tensorflow:create_estimator_and_inputs: use_tpu False, export_to_tpu None
I0804 20:22:15.609750 140157722805632 model_lib.py:809] create_estimator_and_inputs: use_tpu False, export_to_tpu None
INFO:tensorflow:Using config: {'_model_dir': 'tf-results', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 26225, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f7894815790>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
I0804 20:22:15.610160 140157722805632 estimator.py:212] Using config: {'_model_dir': 'tf-results', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 26225, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f7894815790>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
WARNING:tensorflow:Estimator's model_fn (<function create_model_fn.<locals>.model_fn at 0x7f78948145f0>) includes params argument, but params are not passed to Estimator.
W0804 20:22:15.610489 140157722805632 model_fn.py:630] Estimator's model_fn (<function create_model_fn.<locals>.model_fn at 0x7f78948145f0>) includes params argument, but params are not passed to Estimator.
INFO:tensorflow:Not using Distribute Coordinator.
I0804 20:22:15.610882 140157722805632 estimator_training.py:186] Not using Distribute Coordinator.
INFO:tensorflow:Running training and evaluation locally (non-distributed).
I0804 20:22:15.611066 140157722805632 training.py:612] Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps 26225 or save_checkpoints_secs None.
I0804 20:22:15.611297 140157722805632 training.py:700] Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps 26225 or save_checkpoints_secs None.
WARNING:tensorflow:From /opt/conda/envs/tf/lib/python3.7/site-packages/tensorflow_core/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
W0804 20:22:15.616826 140157722805632 deprecation.py:323] From /opt/conda/envs/tf/lib/python3.7/site-packages/tensorflow_core/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
WARNING:tensorflow:num_readers has been reduced to 9 to match input file shards.
W0804 20:22:15.645005 140157722805632 dataset_builder.py:83] num_readers has been reduced to 9 to match input file shards.
WARNING:tensorflow:From /opt/conda/envs/tf/lib/python3.7/site-packages/object_detection/builders/dataset_builder.py:100: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_determinstic`.
W0804 20:22:15.650861 140157722805632 deprecation.py:323] From /opt/conda/envs/tf/lib/python3.7/site-packages/object_detection/builders/dataset_builder.py:100: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_determinstic`.
WARNING:tensorflow:From /opt/conda/envs/tf/lib/python3.7/site-packages/object_detection/builders/dataset_builder.py:175: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map()
W0804 20:22:15.673810 140157722805632 deprecation.py:323] From /opt/conda/envs/tf/lib/python3.7/site-packages/object_detection/builders/dataset_builder.py:175: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map()
WARNING:tensorflow:From /opt/conda/envs/tf/lib/python3.7/site-packages/object_detection/inputs.py:77: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
W0804 20:22:28.440482 140157722805632 deprecation.py:323] From /opt/conda/envs/tf/lib/python3.7/site-packages/object_detection/inputs.py:77: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
WARNING:tensorflow:From /opt/conda/envs/tf/lib/python3.7/site-packages/object_detection/utils/ops.py:493: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
W0804 20:22:28.560776 140157722805632 deprecation.py:323] From /opt/conda/envs/tf/lib/python3.7/site-packages/object_detection/utils/ops.py:493: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
WARNING:tensorflow:From /opt/conda/envs/tf/lib/python3.7/site-packages/object_detection/inputs.py:259: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
W0804 20:22:34.365995 140157722805632 deprecation.py:323] From /opt/conda/envs/tf/lib/python3.7/site-packages/object_detection/inputs.py:259: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
INFO:tensorflow:Calling model_fn.
I0804 20:22:39.074822 140157722805632 estimator.py:1148] Calling model_fn.
INFO:tensorflow:Scale of 0 disables regularizer.
I0804 20:22:39.111900 140157722805632 regularizers.py:99] Scale of 0 disables regularizer.
WARNING:tensorflow:From /opt/conda/envs/tf/lib/python3.7/site-packages/tf_slim/layers/layers.py:1089: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.__call__` method instead.
W0804 20:22:39.114423 140157722805632 deprecation.py:323] From /opt/conda/envs/tf/lib/python3.7/site-packages/tf_slim/layers/layers.py:1089: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.__call__` method instead.
INFO:tensorflow:Scale of 0 disables regularizer.
I0804 20:22:42.895321 140157722805632 regularizers.py:99] Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
I0804 20:22:42.913479 140157722805632 regularizers.py:99] Scale of 0 disables regularizer.
INFO:tensorflow:depth of additional conv before box predictor: 0
I0804 20:22:42.913854 140157722805632 convolutional_box_predictor.py:156] depth of additional conv before box predictor: 0
WARNING:tensorflow:From /opt/conda/envs/tf/lib/python3.7/site-packages/object_detection/utils/spatial_transform_ops.py:478: calling crop_and_resize_v1 (from tensorflow.python.ops.image_ops_impl) with box_ind is deprecated and will be removed in a future version.
Instructions for updating:
box_ind is deprecated, use box_indices instead
W0804 20:22:43.784451 140157722805632 deprecation.py:506] From /opt/conda/envs/tf/lib/python3.7/site-packages/object_detection/utils/spatial_transform_ops.py:478: calling crop_and_resize_v1 (from tensorflow.python.ops.image_ops_impl) with box_ind is deprecated and will be removed in a future version.
Instructions for updating:
box_ind is deprecated, use box_indices instead
INFO:tensorflow:Scale of 0 disables regularizer.
I0804 20:22:43.799534 140157722805632 regularizers.py:99] Scale of 0 disables regularizer.
WARNING:tensorflow:From /opt/conda/envs/tf/lib/python3.7/site-packages/tf_slim/layers/layers.py:1666: flatten (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.flatten instead.
W0804 20:22:44.139258 140157722805632 deprecation.py:323] From /opt/conda/envs/tf/lib/python3.7/site-packages/tf_slim/layers/layers.py:1666: flatten (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.flatten instead.
INFO:tensorflow:Scale of 0 disables regularizer.
I0804 20:22:44.142145 140157722805632 regularizers.py:99] Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
I0804 20:22:44.162720 140157722805632 regularizers.py:99] Scale of 0 disables regularizer.
WARNING:tensorflow:From /opt/conda/envs/tf/lib/python3.7/site-packages/object_detection/core/losses.py:347: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See `tf.nn.softmax_cross_entropy_with_logits_v2`.

W0804 20:22:44.523350 140157722805632 deprecation.py:323] From /opt/conda/envs/tf/lib/python3.7/site-packages/object_detection/core/losses.py:347: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See `tf.nn.softmax_cross_entropy_with_logits_v2`.

/opt/conda/envs/tf/lib/python3.7/site-packages/tensorflow_core/python/framework/indexed_slices.py:424: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
/opt/conda/envs/tf/lib/python3.7/site-packages/tensorflow_core/python/framework/indexed_slices.py:424: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
INFO:tensorflow:Done calling model_fn.
I0804 20:22:51.249653 140157722805632 estimator.py:1150] Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
I0804 20:22:51.251239 140157722805632 basic_session_run_hooks.py:541] Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
I0804 20:22:54.445739 140157722805632 monitored_session.py:240] Graph was finalized.
2020-08-04 20:22:54.446140: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2020-08-04 20:22:54.453311: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2200000000 Hz
2020-08-04 20:22:54.453836: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x558834574de0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-08-04 20:22:54.453870: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-08-04 20:22:54.455126: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-08-04 20:22:56.446505: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-08-04 20:22:56.447287: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:00:04.0
2020-08-04 20:22:56.447611: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-08-04 20:22:56.448852: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-08-04 20:22:56.450213: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-08-04 20:22:56.450535: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-08-04 20:22:56.452081: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-08-04 20:22:56.453235: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-08-04 20:22:56.456886: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-08-04 20:22:56.457027: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-08-04 20:22:56.457926: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-08-04 20:22:56.458622: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-08-04 20:22:56.458680: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-08-04 20:22:56.505033: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-08-04 20:22:56.505072: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0 
2020-08-04 20:22:56.505096: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N 
2020-08-04 20:22:56.505329: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-08-04 20:22:56.506176: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-08-04 20:22:56.507020: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-08-04 20:22:56.507852: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10805 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:04.0, compute capability: 3.7)
2020-08-04 20:22:56.510149: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55883531af30 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-08-04 20:22:56.510177: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Tesla K80, Compute Capability 3.7
INFO:tensorflow:Running local_init_op.
I0804 20:23:01.691994 140157722805632 session_manager.py:500] Running local_init_op.
INFO:tensorflow:Done running local_init_op.
I0804 20:23:01.994730 140157722805632 session_manager.py:502] Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into tf-results/model.ckpt.
I0804 20:23:10.123725 140157722805632 basic_session_run_hooks.py:606] Saving checkpoints for 0 into tf-results/model.ckpt.
2020-08-04 20:23:18.682539: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
Traceback (most recent call last):
  File "/opt/conda/envs/tf/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/opt/conda/envs/tf/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/opt/conda/envs/tf/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: {{function_node Dataset_map_transform_and_pad_input_data_fn_422}} Indices are not valid (out of bounds).  Shape: [1]
     [[{{node cond/SparseToDense}}]]
     [[IteratorGetNext]]
     [[BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/non_max_suppression_with_scores/NonMaxSuppressionV5/_5611]]
  (1) Invalid argument: {{function_node Dataset_map_transform_and_pad_input_data_fn_422}} Indices are not valid (out of bounds).  Shape: [1]
     [[{{node cond/SparseToDense}}]]
     [[IteratorGetNext]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "lib/models/research/object_detection/model_main.py", line 111, in <module>
    tf.app.run()
  File "/opt/conda/envs/tf/lib/python3.7/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/opt/conda/envs/tf/lib/python3.7/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/opt/conda/envs/tf/lib/python3.7/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "lib/models/research/object_detection/model_main.py", line 107, in main
    tf.estimator.train_and_evaluate(estimator, train_spec, eval_specs[0])
  File "/opt/conda/envs/tf/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/training.py", line 473, in train_and_evaluate
    return executor.run()
  File "/opt/conda/envs/tf/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/training.py", line 613, in run
    return self.run_local()
  File "/opt/conda/envs/tf/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/training.py", line 714, in run_local
    saving_listeners=saving_listeners)
  File "/opt/conda/envs/tf/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 370, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/opt/conda/envs/tf/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1161, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/opt/conda/envs/tf/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1195, in _train_model_default
    saving_listeners)
  File "/opt/conda/envs/tf/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1494, in _train_with_estimator_spec
    _, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss])
  File "/opt/conda/envs/tf/lib/python3.7/site-packages/tensorflow_core/python/training/monitored_session.py", line 754, in run
    run_metadata=run_metadata)
  File "/opt/conda/envs/tf/lib/python3.7/site-packages/tensorflow_core/python/training/monitored_session.py", line 1259, in run
    run_metadata=run_metadata)
  File "/opt/conda/envs/tf/lib/python3.7/site-packages/tensorflow_core/python/training/monitored_session.py", line 1360, in run
    raise six.reraise(*original_exc_info)
  File "/opt/conda/envs/tf/lib/python3.7/site-packages/six.py", line 703, in reraise
    raise value
  File "/opt/conda/envs/tf/lib/python3.7/site-packages/tensorflow_core/python/training/monitored_session.py", line 1345, in run
    return self._sess.run(*args, **kwargs)
  File "/opt/conda/envs/tf/lib/python3.7/site-packages/tensorflow_core/python/training/monitored_session.py", line 1418, in run
    run_metadata=run_metadata)
  File "/opt/conda/envs/tf/lib/python3.7/site-packages/tensorflow_core/python/training/monitored_session.py", line 1176, in run
    return self._sess.run(*args, **kwargs)
  File "/opt/conda/envs/tf/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/opt/conda/envs/tf/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "/opt/conda/envs/tf/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/opt/conda/envs/tf/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument:  Indices are not valid (out of bounds).  Shape: [1]
     [[{{node cond/SparseToDense}}]]
     [[IteratorGetNext]]
     [[BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/non_max_suppression_with_scores/NonMaxSuppressionV5/_5611]]
  (1) Invalid argument:  Indices are not valid (out of bounds).  Shape: [1]
     [[{{node cond/SparseToDense}}]]
     [[IteratorGetNext]]
0 successful operations.
0 derived errors ignored.