Open ravising-h opened 4 years ago
Any updates?
I am also seeing this error with tf 1.15, run using model_main.py
Pipeline file:
model {
faster_rcnn {
num_classes: 1
image_resizer {
keep_aspect_ratio_resizer {
min_dimension: 600
max_dimension: 1024
}
}
feature_extractor {
type: 'faster_rcnn_resnet101'
first_stage_features_stride: 16
}
first_stage_anchor_generator {
grid_anchor_generator {
scales: [0.25, 0.5, 1.0, 2.0]
aspect_ratios: [0.5, 1.0, 2.0]
height_stride: 16
width_stride: 16
}
}
first_stage_box_predictor_conv_hyperparams {
op: CONV
regularizer {
l2_regularizer {
weight: 0.0
}
}
initializer {
truncated_normal_initializer {
stddev: 0.01
}
}
}
first_stage_nms_score_threshold: 0.0
first_stage_nms_iou_threshold: 0.7
first_stage_max_proposals: 300
first_stage_localization_loss_weight: 2.0
first_stage_objectness_loss_weight: 1.0
initial_crop_size: 14
maxpool_kernel_size: 2
maxpool_stride: 2
second_stage_box_predictor {
mask_rcnn_box_predictor {
use_dropout: false
dropout_keep_probability: 1.0
fc_hyperparams {
op: FC
regularizer {
l2_regularizer {
weight: 0.0
}
}
initializer {
variance_scaling_initializer {
factor: 1.0
uniform: true
mode: FAN_AVG
}
}
}
}
}
second_stage_post_processing {
batch_non_max_suppression {
score_threshold: 0.0
iou_threshold: 0.6
max_detections_per_class: 100
max_total_detections: 300
}
score_converter: SOFTMAX
}
second_stage_localization_loss_weight: 2.0
second_stage_classification_loss_weight: 1.0
}
}
train_config: {
batch_size: 1
num_steps: 472050
optimizer {
momentum_optimizer: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0003
schedule {
step: 314700
learning_rate: .00003
}
schedule {
step: 419600
learning_rate: .000003
}
}
}
momentum_optimizer_value: 0.9
}
use_moving_average: false
}
gradient_clipping_by_norm: 10.0
fine_tune_checkpoint: "models/tf_pretrained/faster_rcnn_resnet101_coco_2018_01_28/model.ckpt"
from_detection_checkpoint: true
data_augmentation_options {
random_horizontal_flip {
}
}
}
train_input_reader: {
tf_record_input_reader {
input_path: "data/v012/tf/train-?????-of-00009"
}
label_map_path: "data/v012/labels/tf/labelmap.pbtxt"
}
eval_config: {
num_examples: 8762
# Note: The below line limits the evaluation process to 10 evaluations.
# Remove the below line to evaluate indefinitely.
# max_evals: 10
}
eval_input_reader: {
tf_record_input_reader {
input_path: "data/v012/tf/val-?????-of-00003"
}
label_map_path: "data/v012/labels/tf/labelmap.pbtxt"
shuffle: false
num_readers: 1
}
Note my model_main.py has been modified to accept --save_checkpoints_steps for this call:
def main(unused_argv):
...
config = tf.estimator.RunConfig(model_dir=FLAGS.model_dir, save_checkpoints_steps=FLAGS.save_checkpoints_steps)
...
Log:
$ python lib/models/research/object_detection/model_main.py --pipeline_config_path=${PIPELINE_CONFIG_PATH} --model_dir=${MODEL_DIR} --num_train_steps=${NUM_TRAIN_STEPS} --save_checkpoints_steps=26225 --alsologtostderr
WARNING:tensorflow:Forced number of epochs for all eval validations to be 1.
W0804 20:22:15.608931 140157722805632 model_lib.py:758] Forced number of epochs for all eval validations to be 1.
INFO:tensorflow:Maybe overwriting train_steps: 472050
I0804 20:22:15.609210 140157722805632 config_util.py:552] Maybe overwriting train_steps: 472050
INFO:tensorflow:Maybe overwriting use_bfloat16: False
I0804 20:22:15.609313 140157722805632 config_util.py:552] Maybe overwriting use_bfloat16: False
INFO:tensorflow:Maybe overwriting sample_1_of_n_eval_examples: 1
I0804 20:22:15.609401 140157722805632 config_util.py:552] Maybe overwriting sample_1_of_n_eval_examples: 1
INFO:tensorflow:Maybe overwriting eval_num_epochs: 1
I0804 20:22:15.609490 140157722805632 config_util.py:552] Maybe overwriting eval_num_epochs: 1
WARNING:tensorflow:Expected number of evaluation epochs is 1, but instead encountered `eval_on_train_input_config.num_epochs` = 0. Overwriting `num_epochs` to 1.
W0804 20:22:15.609631 140157722805632 model_lib.py:774] Expected number of evaluation epochs is 1, but instead encountered `eval_on_train_input_config.num_epochs` = 0. Overwriting `num_epochs` to 1.
INFO:tensorflow:create_estimator_and_inputs: use_tpu False, export_to_tpu None
I0804 20:22:15.609750 140157722805632 model_lib.py:809] create_estimator_and_inputs: use_tpu False, export_to_tpu None
INFO:tensorflow:Using config: {'_model_dir': 'tf-results', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 26225, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
graph_options {
rewrite_options {
meta_optimizer_iterations: ONE
}
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f7894815790>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
I0804 20:22:15.610160 140157722805632 estimator.py:212] Using config: {'_model_dir': 'tf-results', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 26225, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
graph_options {
rewrite_options {
meta_optimizer_iterations: ONE
}
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f7894815790>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
WARNING:tensorflow:Estimator's model_fn (<function create_model_fn.<locals>.model_fn at 0x7f78948145f0>) includes params argument, but params are not passed to Estimator.
W0804 20:22:15.610489 140157722805632 model_fn.py:630] Estimator's model_fn (<function create_model_fn.<locals>.model_fn at 0x7f78948145f0>) includes params argument, but params are not passed to Estimator.
INFO:tensorflow:Not using Distribute Coordinator.
I0804 20:22:15.610882 140157722805632 estimator_training.py:186] Not using Distribute Coordinator.
INFO:tensorflow:Running training and evaluation locally (non-distributed).
I0804 20:22:15.611066 140157722805632 training.py:612] Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps 26225 or save_checkpoints_secs None.
I0804 20:22:15.611297 140157722805632 training.py:700] Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps 26225 or save_checkpoints_secs None.
WARNING:tensorflow:From /opt/conda/envs/tf/lib/python3.7/site-packages/tensorflow_core/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
W0804 20:22:15.616826 140157722805632 deprecation.py:323] From /opt/conda/envs/tf/lib/python3.7/site-packages/tensorflow_core/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
WARNING:tensorflow:num_readers has been reduced to 9 to match input file shards.
W0804 20:22:15.645005 140157722805632 dataset_builder.py:83] num_readers has been reduced to 9 to match input file shards.
WARNING:tensorflow:From /opt/conda/envs/tf/lib/python3.7/site-packages/object_detection/builders/dataset_builder.py:100: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_determinstic`.
W0804 20:22:15.650861 140157722805632 deprecation.py:323] From /opt/conda/envs/tf/lib/python3.7/site-packages/object_detection/builders/dataset_builder.py:100: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_determinstic`.
WARNING:tensorflow:From /opt/conda/envs/tf/lib/python3.7/site-packages/object_detection/builders/dataset_builder.py:175: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map()
W0804 20:22:15.673810 140157722805632 deprecation.py:323] From /opt/conda/envs/tf/lib/python3.7/site-packages/object_detection/builders/dataset_builder.py:175: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map()
WARNING:tensorflow:From /opt/conda/envs/tf/lib/python3.7/site-packages/object_detection/inputs.py:77: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
W0804 20:22:28.440482 140157722805632 deprecation.py:323] From /opt/conda/envs/tf/lib/python3.7/site-packages/object_detection/inputs.py:77: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
WARNING:tensorflow:From /opt/conda/envs/tf/lib/python3.7/site-packages/object_detection/utils/ops.py:493: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
W0804 20:22:28.560776 140157722805632 deprecation.py:323] From /opt/conda/envs/tf/lib/python3.7/site-packages/object_detection/utils/ops.py:493: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
WARNING:tensorflow:From /opt/conda/envs/tf/lib/python3.7/site-packages/object_detection/inputs.py:259: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
W0804 20:22:34.365995 140157722805632 deprecation.py:323] From /opt/conda/envs/tf/lib/python3.7/site-packages/object_detection/inputs.py:259: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
INFO:tensorflow:Calling model_fn.
I0804 20:22:39.074822 140157722805632 estimator.py:1148] Calling model_fn.
INFO:tensorflow:Scale of 0 disables regularizer.
I0804 20:22:39.111900 140157722805632 regularizers.py:99] Scale of 0 disables regularizer.
WARNING:tensorflow:From /opt/conda/envs/tf/lib/python3.7/site-packages/tf_slim/layers/layers.py:1089: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.__call__` method instead.
W0804 20:22:39.114423 140157722805632 deprecation.py:323] From /opt/conda/envs/tf/lib/python3.7/site-packages/tf_slim/layers/layers.py:1089: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.__call__` method instead.
INFO:tensorflow:Scale of 0 disables regularizer.
I0804 20:22:42.895321 140157722805632 regularizers.py:99] Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
I0804 20:22:42.913479 140157722805632 regularizers.py:99] Scale of 0 disables regularizer.
INFO:tensorflow:depth of additional conv before box predictor: 0
I0804 20:22:42.913854 140157722805632 convolutional_box_predictor.py:156] depth of additional conv before box predictor: 0
WARNING:tensorflow:From /opt/conda/envs/tf/lib/python3.7/site-packages/object_detection/utils/spatial_transform_ops.py:478: calling crop_and_resize_v1 (from tensorflow.python.ops.image_ops_impl) with box_ind is deprecated and will be removed in a future version.
Instructions for updating:
box_ind is deprecated, use box_indices instead
W0804 20:22:43.784451 140157722805632 deprecation.py:506] From /opt/conda/envs/tf/lib/python3.7/site-packages/object_detection/utils/spatial_transform_ops.py:478: calling crop_and_resize_v1 (from tensorflow.python.ops.image_ops_impl) with box_ind is deprecated and will be removed in a future version.
Instructions for updating:
box_ind is deprecated, use box_indices instead
INFO:tensorflow:Scale of 0 disables regularizer.
I0804 20:22:43.799534 140157722805632 regularizers.py:99] Scale of 0 disables regularizer.
WARNING:tensorflow:From /opt/conda/envs/tf/lib/python3.7/site-packages/tf_slim/layers/layers.py:1666: flatten (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.flatten instead.
W0804 20:22:44.139258 140157722805632 deprecation.py:323] From /opt/conda/envs/tf/lib/python3.7/site-packages/tf_slim/layers/layers.py:1666: flatten (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.flatten instead.
INFO:tensorflow:Scale of 0 disables regularizer.
I0804 20:22:44.142145 140157722805632 regularizers.py:99] Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
I0804 20:22:44.162720 140157722805632 regularizers.py:99] Scale of 0 disables regularizer.
WARNING:tensorflow:From /opt/conda/envs/tf/lib/python3.7/site-packages/object_detection/core/losses.py:347: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.
See `tf.nn.softmax_cross_entropy_with_logits_v2`.
W0804 20:22:44.523350 140157722805632 deprecation.py:323] From /opt/conda/envs/tf/lib/python3.7/site-packages/object_detection/core/losses.py:347: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.
See `tf.nn.softmax_cross_entropy_with_logits_v2`.
/opt/conda/envs/tf/lib/python3.7/site-packages/tensorflow_core/python/framework/indexed_slices.py:424: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
/opt/conda/envs/tf/lib/python3.7/site-packages/tensorflow_core/python/framework/indexed_slices.py:424: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
INFO:tensorflow:Done calling model_fn.
I0804 20:22:51.249653 140157722805632 estimator.py:1150] Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
I0804 20:22:51.251239 140157722805632 basic_session_run_hooks.py:541] Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
I0804 20:22:54.445739 140157722805632 monitored_session.py:240] Graph was finalized.
2020-08-04 20:22:54.446140: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2020-08-04 20:22:54.453311: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2200000000 Hz
2020-08-04 20:22:54.453836: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x558834574de0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-08-04 20:22:54.453870: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-08-04 20:22:54.455126: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-08-04 20:22:56.446505: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-08-04 20:22:56.447287: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:00:04.0
2020-08-04 20:22:56.447611: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-08-04 20:22:56.448852: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-08-04 20:22:56.450213: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-08-04 20:22:56.450535: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-08-04 20:22:56.452081: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-08-04 20:22:56.453235: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-08-04 20:22:56.456886: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-08-04 20:22:56.457027: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-08-04 20:22:56.457926: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-08-04 20:22:56.458622: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-08-04 20:22:56.458680: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-08-04 20:22:56.505033: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-08-04 20:22:56.505072: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2020-08-04 20:22:56.505096: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2020-08-04 20:22:56.505329: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-08-04 20:22:56.506176: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-08-04 20:22:56.507020: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-08-04 20:22:56.507852: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10805 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:04.0, compute capability: 3.7)
2020-08-04 20:22:56.510149: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55883531af30 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-08-04 20:22:56.510177: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Tesla K80, Compute Capability 3.7
INFO:tensorflow:Running local_init_op.
I0804 20:23:01.691994 140157722805632 session_manager.py:500] Running local_init_op.
INFO:tensorflow:Done running local_init_op.
I0804 20:23:01.994730 140157722805632 session_manager.py:502] Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into tf-results/model.ckpt.
I0804 20:23:10.123725 140157722805632 basic_session_run_hooks.py:606] Saving checkpoints for 0 into tf-results/model.ckpt.
2020-08-04 20:23:18.682539: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
Traceback (most recent call last):
File "/opt/conda/envs/tf/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
return fn(*args)
File "/opt/conda/envs/tf/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
target_list, run_metadata)
File "/opt/conda/envs/tf/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: {{function_node Dataset_map_transform_and_pad_input_data_fn_422}} Indices are not valid (out of bounds). Shape: [1]
[[{{node cond/SparseToDense}}]]
[[IteratorGetNext]]
[[BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/non_max_suppression_with_scores/NonMaxSuppressionV5/_5611]]
(1) Invalid argument: {{function_node Dataset_map_transform_and_pad_input_data_fn_422}} Indices are not valid (out of bounds). Shape: [1]
[[{{node cond/SparseToDense}}]]
[[IteratorGetNext]]
0 successful operations.
0 derived errors ignored.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "lib/models/research/object_detection/model_main.py", line 111, in <module>
tf.app.run()
File "/opt/conda/envs/tf/lib/python3.7/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/opt/conda/envs/tf/lib/python3.7/site-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/opt/conda/envs/tf/lib/python3.7/site-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "lib/models/research/object_detection/model_main.py", line 107, in main
tf.estimator.train_and_evaluate(estimator, train_spec, eval_specs[0])
File "/opt/conda/envs/tf/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/training.py", line 473, in train_and_evaluate
return executor.run()
File "/opt/conda/envs/tf/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/training.py", line 613, in run
return self.run_local()
File "/opt/conda/envs/tf/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/training.py", line 714, in run_local
saving_listeners=saving_listeners)
File "/opt/conda/envs/tf/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 370, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/opt/conda/envs/tf/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1161, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/opt/conda/envs/tf/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1195, in _train_model_default
saving_listeners)
File "/opt/conda/envs/tf/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1494, in _train_with_estimator_spec
_, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss])
File "/opt/conda/envs/tf/lib/python3.7/site-packages/tensorflow_core/python/training/monitored_session.py", line 754, in run
run_metadata=run_metadata)
File "/opt/conda/envs/tf/lib/python3.7/site-packages/tensorflow_core/python/training/monitored_session.py", line 1259, in run
run_metadata=run_metadata)
File "/opt/conda/envs/tf/lib/python3.7/site-packages/tensorflow_core/python/training/monitored_session.py", line 1360, in run
raise six.reraise(*original_exc_info)
File "/opt/conda/envs/tf/lib/python3.7/site-packages/six.py", line 703, in reraise
raise value
File "/opt/conda/envs/tf/lib/python3.7/site-packages/tensorflow_core/python/training/monitored_session.py", line 1345, in run
return self._sess.run(*args, **kwargs)
File "/opt/conda/envs/tf/lib/python3.7/site-packages/tensorflow_core/python/training/monitored_session.py", line 1418, in run
run_metadata=run_metadata)
File "/opt/conda/envs/tf/lib/python3.7/site-packages/tensorflow_core/python/training/monitored_session.py", line 1176, in run
return self._sess.run(*args, **kwargs)
File "/opt/conda/envs/tf/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 956, in run
run_metadata_ptr)
File "/opt/conda/envs/tf/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
feed_dict_tensor, options, run_metadata)
File "/opt/conda/envs/tf/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
run_metadata)
File "/opt/conda/envs/tf/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Indices are not valid (out of bounds). Shape: [1]
[[{{node cond/SparseToDense}}]]
[[IteratorGetNext]]
[[BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/non_max_suppression_with_scores/NonMaxSuppressionV5/_5611]]
(1) Invalid argument: Indices are not valid (out of bounds). Shape: [1]
[[{{node cond/SparseToDense}}]]
[[IteratorGetNext]]
0 successful operations.
0 derived errors ignored.
Prerequisites
Please answer the following questions for yourself before submitting an issue.
1. The entire URL of the file you are using
https://github.com/tensorflow/models/tree/master/research/...
2. Describe the bug
I am using Object Detection for training single class FRCNN Resnet 50 it. It throws very wierd error.
Config File
Error
3. Steps to reproduce
https://colab.research.google.com/drive/1o7JB0pWanEMn6qnRnEXphu0T4YbuKLL2#scrollTo=PFR1cVakmG-4 Use this colab add
'frcnn_resnet50':{
'pipeline_file':'faster_rcnn_resnet50_coco.config',
'model_name':'faster_rcnn_resnet50_coco_2018_01_28',
'batch_size': 8
}
in 1st cell in Model_config4. Expected behavior
It should start training
5. Additional context
It works fine for SSD problem arises while using FRCNN - Resnet
6. System information
colab with K80 Tesla