tensorflow / models

Models and examples built with TensorFlow

Error using SSD-mobilenet-v1-coco: Incompatible shapes: [3,1917] vs. [16,1] #5391

Closed ZhenyF closed 4 years ago

ZhenyF commented 5 years ago

System information

Describe the problem

I tried to train a facial detector using the Object Detection API. The dataset is the WIDER Face dataset and the conversion code (to tfrecord) is from here. The pretrained model is ssd_mobilenet_v1_coco and the config file is ssd_mobilenet_v1_focal_loss_pets. I made no modifications except setting the number of classes to 1, increasing the batch size from 25 to 32, and changing the paths of the label map, checkpoint, and tfrecords. The initialisation process looks fine, but I got an error after that:

`I0927 11:59:49.965208 7440 tf_logging.py:115] Restoring parameters from C:/Users/mhb14150/Documents/TF_object_api/My_model/check_point/ssd_mobilenet_v1_coco/model.ckpt INFO:tensorflow:Running local_init_op. I0927 11:59:50.130070 7440 tf_logging.py:115] Running local_init_op. INFO:tensorflow:Done running local_init_op. I0927 11:59:50.387650 7440 tf_logging.py:115] Done running local_init_op. INFO:tensorflow:Starting Session. I0927 11:59:57.981596 7440 tf_logging.py:115] Starting Session. INFO:tensorflow:Saving checkpoint to path C:/Users/mhb14150/Documents/TF_object_api/My_model/mode_trained\model.ckpt I0927 11:59:58.153400 12356 tf_logging.py:115] Saving checkpoint to path C:/Users/mhb14150/Documents/TF_object_api/My_model/mode_trained\model.ckpt INFO:tensorflow:Starting Queues. I0927 11:59:58.169021 7440 tf_logging.py:115] Starting Queues. INFO:tensorflow:global_step/sec: 0 I0927 12:00:09.622732 8096 tf_logging.py:159] global_step/sec: 0 INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, Incompatible shapes: [3,1917] vs. [16,1] [[Node: Loss/Match_25/cond/mul_4 = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Loss/Match_25/cond/one_hot/_4475, Loss/Match_25/cond/Cast_2)]] [[Node: FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_2_depthwise/BatchNorm/AssignMovingAvg_1/mul/_3975 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge5635...gAvg_1/mul", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Caused by op 'Loss/Match_25/cond/mul_4', defined at: File "train.py", line 184, in tf.app.run() File "C:\Users\mhb14150\Desktop\winpython\WinPython-64bit-3.5.3.1Qt5\python-3.5.3.amd64\lib\site-packages\tensorflow\python\platform\app.py", line 125, in run _sys.exit(main(argv)) File "C:\Users\mhb14150\Desktop\winpython\WinPython-64bit-3.5.3.1Qt5\python-3.5.3.amd64\lib\site-packages\tensorflow\python\util\deprecation.py", line 250, in new_func return func(*args, kwargs) File "train.py", line 180, in main graph_hook_fn=graph_rewriter_fn) File "C:\Users\mhb14150\Documents\TF_object_api\models\research\object_detection\legacy\trainer.py", line 290, in train clones = model_deploy.create_clones(deploy_config, model_fn, [input_queue]) File "C:\Users\mhb14150\Documents\TF_object_api\models\research\slim\deployment\model_deploy.py", line 193, in create_clones outputs = model_fn(*args, *kwargs) File "C:\Users\mhb14150\Documents\TF_object_api\models\research\object_detection\legacy\trainer.py", line 205, in _create_losses losses_dict = detection_model.loss(prediction_dict, true_image_shapes) File "C:\Users\mhb14150\Documents\TF_object_api\models\research\object_detection\meta_architectures\ssd_meta_arch.py", line 680, in loss keypoints, weights) File "C:\Users\mhb14150\Documents\TF_object_api\models\research\object_detection\meta_architectures\ssd_meta_arch.py", line 853, in _assign_targets groundtruth_weights_list) File "C:\Users\mhb14150\Documents\TF_object_api\models\research\object_detection\core\target_assigner.py", line 483, in batch_assign_targets anchors, gt_boxes, gt_class_targets, unmatched_class_label, gt_weights) File "C:\Users\mhb14150\Documents\TF_object_api\models\research\object_detection\core\target_assigner.py", line 182, in assign valid_rows=tf.greater(groundtruth_weights, 0)) File "C:\Users\mhb14150\Documents\TF_object_api\models\research\object_detection\core\matcher.py", line 241, in match return Match(self._match(similarity_matrix, valid_rows), File 
"C:\Users\mhb14150\Documents\TF_object_api\models\research\object_detection\matchers\argmax_matcher.py", line 194, in _match _match_when_rows_are_non_empty, _match_when_rows_are_empty) File "C:\Users\mhb14150\Desktop\winpython\WinPython-64bit-3.5.3.1Qt5\python-3.5.3.amd64\lib\site-packages\tensorflow\python\util\deprecation.py", line 432, in new_func return func(args, kwargs) File "C:\Users\mhb14150\Desktop\winpython\WinPython-64bit-3.5.3.1Qt5\python-3.5.3.amd64\lib\site-packages\tensorflow\python\ops\control_flow_ops.py", line 2040, in cond orig_res_t, res_t = context_t.BuildCondBranch(true_fn) File "C:\Users\mhb14150\Desktop\winpython\WinPython-64bit-3.5.3.1Qt5\python-3.5.3.amd64\lib\site-packages\tensorflow\python\ops\control_flow_ops.py", line 1890, in BuildCondBranch original_result = fn() File "C:\Users\mhb14150\Documents\TF_object_api\models\research\object_detection\matchers\argmax_matcher.py", line 175, in _match_when_rows_are_non_empty tf.cast(tf.expand_dims(valid_rows, axis=-1), dtype=tf.float32)) File "C:\Users\mhb14150\Desktop\winpython\WinPython-64bit-3.5.3.1Qt5\python-3.5.3.amd64\lib\site-packages\tensorflow\python\ops\math_ops.py", line 847, in binary_op_wrapper return func(x, y, name=name) File "C:\Users\mhb14150\Desktop\winpython\WinPython-64bit-3.5.3.1Qt5\python-3.5.3.amd64\lib\site-packages\tensorflow\python\ops\math_ops.py", line 1091, in _mul_dispatch return gen_math_ops.mul(x, y, name=name) File "C:\Users\mhb14150\Desktop\winpython\WinPython-64bit-3.5.3.1Qt5\python-3.5.3.amd64\lib\site-packages\tensorflow\python\ops\gen_math_ops.py", line 5066, in mul "Mul", x=x, y=y, name=name) File "C:\Users\mhb14150\Desktop\winpython\WinPython-64bit-3.5.3.1Qt5\python-3.5.3.amd64\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper op_def=op_def) File "C:\Users\mhb14150\Desktop\winpython\WinPython-64bit-3.5.3.1Qt5\python-3.5.3.amd64\lib\site-packages\tensorflow\python\framework\ops.py", line 3414, in create_op 
op_def=op_def) File "C:\Users\mhb14150\Desktop\winpython\WinPython-64bit-3.5.3.1Qt5\python-3.5.3.amd64\lib\site-packages\tensorflow\python\framework\ops.py", line 1740, in init self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Incompatible shapes: [3,1917] vs. [16,1] [[Node: Loss/Match_25/cond/mul_4 = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Loss/Match_25/cond/one_hot/_4475, Loss/Match_25/cond/Cast_2)]] [[Node: FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_2_depthwise/BatchNorm/AssignMovingAvg_1/mul/_3975 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge5635...gAvg_1/mul", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

I0927 12:00:12.403529 7440 tf_logging.py:115] Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, Incompatible shapes: [3,1917] vs. [16,1] [[Node: Loss/Match_25/cond/mul_4 = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Loss/Match_25/cond/one_hot/_4475, Loss/Match_25/cond/Cast_2)]] [[Node: FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_2_depthwise/BatchNorm/AssignMovingAvg_1/mul/_3975 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge5635...gAvg_1/mul", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Traceback (most recent call last): File "C:\Users\mhb14150\Desktop\winpython\WinPython-64bit-3.5.3.1Qt5\python-3.5.3.amd64\lib\site-packages\tensorflow\python\client\session.py", line 1322, in _do_call return fn(*args) File "C:\Users\mhb14150\Desktop\winpython\WinPython-64bit-3.5.3.1Qt5\python-3.5.3.amd64\lib\site-packages\tensorflow\python\client\session.py", line 1307, in _run_fn options, feed_dict, fetch_list, target_list, run_metadata) File "C:\Users\mhb14150\Desktop\winpython\WinPython-64bit-3.5.3.1Qt5\python-3.5.3.amd64\lib\site-packages\tensorflow\python\client\session.py", line 1409, in _call_tf_sessionrun run_metadata) tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [3,1917] vs. [16,1] [[Node: Loss/Match_25/cond/mul_4 = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Loss/Match_25/cond/one_hot/_4475, Loss/Match_25/cond/Cast_2)]] [[Node: FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_2_depthwise/BatchNorm/AssignMovingAvg_1/mul/_3975 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge5635...gAvg_1/mul", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "train.py", line 184, in tf.app.run() File "C:\Users\mhb14150\Desktop\winpython\WinPython-64bit-3.5.3.1Qt5\python-3.5.3.amd64\lib\site-packages\tensorflow\python\platform\app.py", line 125, in run _sys.exit(main(argv)) File "C:\Users\mhb14150\Desktop\winpython\WinPython-64bit-3.5.3.1Qt5\python-3.5.3.amd64\lib\site-packages\tensorflow\python\util\deprecation.py", line 250, in new_func return func(*args, **kwargs) File "train.py", line 180, in main graph_hook_fn=graph_rewriter_fn) File "C:\Users\mhb14150\Documents\TF_object_api\models\research\object_detection\legacy\trainer.py", line 415, in train saver=saver) File "C:\Users\mhb14150\Desktop\winpython\WinPython-64bit-3.5.3.1Qt5\python-3.5.3.amd64\lib\site-packages\tensorflow\contrib\slim\python\slim\learning.py", line 770, in train sess, train_op, global_step, train_step_kwargs) File "C:\Users\mhb14150\Desktop\winpython\WinPython-64bit-3.5.3.1Qt5\python-3.5.3.amd64\lib\site-packages\tensorflow\contrib\slim\python\slim\learning.py", line 487, in train_step run_metadata=run_metadata) File "C:\Users\mhb14150\Desktop\winpython\WinPython-64bit-3.5.3.1Qt5\python-3.5.3.amd64\lib\site-packages\tensorflow\python\client\session.py", line 900, in run run_metadata_ptr) File "C:\Users\mhb14150\Desktop\winpython\WinPython-64bit-3.5.3.1Qt5\python-3.5.3.amd64\lib\site-packages\tensorflow\python\client\session.py", line 1135, in _run feed_dict_tensor, options, run_metadata) File "C:\Users\mhb14150\Desktop\winpython\WinPython-64bit-3.5.3.1Qt5\python-3.5.3.amd64\lib\site-packages\tensorflow\python\client\session.py", line 1316, in _do_run run_metadata) File "C:\Users\mhb14150\Desktop\winpython\WinPython-64bit-3.5.3.1Qt5\python-3.5.3.amd64\lib\site-packages\tensorflow\python\client\session.py", line 1335, in _do_call raise type(e)(node_def, op, message) tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [3,1917] vs. 
[16,1] [[Node: Loss/Match_25/cond/mul_4 = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Loss/Match_25/cond/one_hot/_4475, Loss/Match_25/cond/Cast_2)]] [[Node: FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_2_depthwise/BatchNorm/AssignMovingAvg_1/mul/_3975 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge5635...gAvg_1/mul", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Caused by op 'Loss/Match_25/cond/mul_4', defined at: File "train.py", line 184, in tf.app.run() File "C:\Users\mhb14150\Desktop\winpython\WinPython-64bit-3.5.3.1Qt5\python-3.5.3.amd64\lib\site-packages\tensorflow\python\platform\app.py", line 125, in run _sys.exit(main(argv)) File "C:\Users\mhb14150\Desktop\winpython\WinPython-64bit-3.5.3.1Qt5\python-3.5.3.amd64\lib\site-packages\tensorflow\python\util\deprecation.py", line 250, in new_func return func(*args, kwargs) File "train.py", line 180, in main graph_hook_fn=graph_rewriter_fn) File "C:\Users\mhb14150\Documents\TF_object_api\models\research\object_detection\legacy\trainer.py", line 290, in train clones = model_deploy.create_clones(deploy_config, model_fn, [input_queue]) File "C:\Users\mhb14150\Documents\TF_object_api\models\research\slim\deployment\model_deploy.py", line 193, in create_clones outputs = model_fn(*args, *kwargs) File "C:\Users\mhb14150\Documents\TF_object_api\models\research\object_detection\legacy\trainer.py", line 205, in _create_losses losses_dict = detection_model.loss(prediction_dict, true_image_shapes) File "C:\Users\mhb14150\Documents\TF_object_api\models\research\object_detection\meta_architectures\ssd_meta_arch.py", line 680, in loss keypoints, weights) File "C:\Users\mhb14150\Documents\TF_object_api\models\research\object_detection\meta_architectures\ssd_meta_arch.py", line 853, in _assign_targets groundtruth_weights_list) File "C:\Users\mhb14150\Documents\TF_object_api\models\research\object_detection\core\target_assigner.py", line 483, in batch_assign_targets anchors, gt_boxes, gt_class_targets, unmatched_class_label, gt_weights) File "C:\Users\mhb14150\Documents\TF_object_api\models\research\object_detection\core\target_assigner.py", line 182, in assign valid_rows=tf.greater(groundtruth_weights, 0)) File "C:\Users\mhb14150\Documents\TF_object_api\models\research\object_detection\core\matcher.py", line 241, in match return Match(self._match(similarity_matrix, valid_rows), File 
"C:\Users\mhb14150\Documents\TF_object_api\models\research\object_detection\matchers\argmax_matcher.py", line 194, in _match _match_when_rows_are_non_empty, _match_when_rows_are_empty) File "C:\Users\mhb14150\Desktop\winpython\WinPython-64bit-3.5.3.1Qt5\python-3.5.3.amd64\lib\site-packages\tensorflow\python\util\deprecation.py", line 432, in new_func return func(args, kwargs) File "C:\Users\mhb14150\Desktop\winpython\WinPython-64bit-3.5.3.1Qt5\python-3.5.3.amd64\lib\site-packages\tensorflow\python\ops\control_flow_ops.py", line 2040, in cond orig_res_t, res_t = context_t.BuildCondBranch(true_fn) File "C:\Users\mhb14150\Desktop\winpython\WinPython-64bit-3.5.3.1Qt5\python-3.5.3.amd64\lib\site-packages\tensorflow\python\ops\control_flow_ops.py", line 1890, in BuildCondBranch original_result = fn() File "C:\Users\mhb14150\Documents\TF_object_api\models\research\object_detection\matchers\argmax_matcher.py", line 175, in _match_when_rows_are_non_empty tf.cast(tf.expand_dims(valid_rows, axis=-1), dtype=tf.float32)) File "C:\Users\mhb14150\Desktop\winpython\WinPython-64bit-3.5.3.1Qt5\python-3.5.3.amd64\lib\site-packages\tensorflow\python\ops\math_ops.py", line 847, in binary_op_wrapper return func(x, y, name=name) File "C:\Users\mhb14150\Desktop\winpython\WinPython-64bit-3.5.3.1Qt5\python-3.5.3.amd64\lib\site-packages\tensorflow\python\ops\math_ops.py", line 1091, in _mul_dispatch return gen_math_ops.mul(x, y, name=name) File "C:\Users\mhb14150\Desktop\winpython\WinPython-64bit-3.5.3.1Qt5\python-3.5.3.amd64\lib\site-packages\tensorflow\python\ops\gen_math_ops.py", line 5066, in mul "Mul", x=x, y=y, name=name) File "C:\Users\mhb14150\Desktop\winpython\WinPython-64bit-3.5.3.1Qt5\python-3.5.3.amd64\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper op_def=op_def) File "C:\Users\mhb14150\Desktop\winpython\WinPython-64bit-3.5.3.1Qt5\python-3.5.3.amd64\lib\site-packages\tensorflow\python\framework\ops.py", line 3414, in create_op 
op_def=op_def) File "C:\Users\mhb14150\Desktop\winpython\WinPython-64bit-3.5.3.1Qt5\python-3.5.3.amd64\lib\site-packages\tensorflow\python\framework\ops.py", line 1740, in init self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Incompatible shapes: [3,1917] vs. [16,1] [[Node: Loss/Match_25/cond/mul_4 = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Loss/Match_25/cond/one_hot/_4475, Loss/Match_25/cond/Cast_2)]] [[Node: FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_2_depthwise/BatchNorm/AssignMovingAvg_1/mul/_3975 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge5635...gAvg_1/mul", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]`

Can someone help me with this? Thanks!

Aitul commented 5 years ago

I am having the same issue today.

javad314 commented 5 years ago

Unfortunately, I am having the same error and, despite all my efforts, I have not found the reason. Like you, I have one class, and I am using the ssd_mobilenet_v1_pets model. The interesting part is that with every run I get this error with different shapes, like [2,1917] vs. [4,1], [3,1917] vs. [5,1], and so on!
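Reading the traceback in the first post, the failing op is a multiply in argmax_matcher.py between a one-hot tensor and tf.expand_dims(valid_rows, axis=-1), where valid_rows comes from tf.greater(groundtruth_weights, 0). One possible reading (an interpretation, not something confirmed in this thread) is that the first shape counts the groundtruth boxes of an image and the second counts its groundtruth weights, which would explain why the numbers change from run to run. The sketch below only illustrates the broadcasting rule that produces this kind of error; broadcast_shape is a hypothetical helper written for this note, not part of TensorFlow.

```python
from itertools import zip_longest


def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or raise ValueError if incompatible.

    Follows the usual NumPy/TensorFlow rule: align the shapes from the right;
    each pair of dimensions must be equal, or one of them must be 1.
    """
    result = []
    pairs = zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1)
    for dim_a, dim_b in pairs:
        if dim_a == dim_b or dim_b == 1:
            result.append(dim_a)
        elif dim_a == 1:
            result.append(dim_b)
        else:
            raise ValueError(
                "Incompatible shapes: {} vs. {}".format(list(shape_a), list(shape_b)))
    return tuple(reversed(result))


NUM_ANCHORS = 1917  # second dimension reported in the errors in this thread

# Consistent case: 3 rows on both sides, so the [3, 1] mask broadcasts.
print(broadcast_shape((3, NUM_ANCHORS), (3, 1)))  # prints (3, 1917)

# Case from the first post: 3 rows on one side, 16 on the other.
try:
    broadcast_shape((3, NUM_ANCHORS), (16, 1))
except ValueError as err:
    print(err)
```

If that reading is right, the thing to look for is an input record where two per-object lists have different lengths.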

hi-ilkin commented 5 years ago

@javad314 I had the same problem; changing the bounding box values from float to int solved it for me. I'm using the Open Images dataset.

UPDATE: For us this only worked for faster_rcnn_resnet50. @ZhenyF The bounding box info that we downloaded from OID was not normalized.
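For readers whose annotations are in pixels, a minimal sketch of normalizing a box before writing it to a tfrecord might look like the following. The dataset conversion scripts shipped with the Object Detection API store box coordinates as fractions of the image width and height; normalize_box and its argument names are hypothetical and only illustrate the arithmetic.

```python
def normalize_box(xmin, ymin, xmax, ymax, image_width, image_height):
    """Convert pixel box corners to fractions of the image size, clipped to [0, 1]."""
    if image_width <= 0 or image_height <= 0:
        raise ValueError("image dimensions must be positive")

    def clip(value):
        # Keep slightly out-of-frame annotations inside the valid range.
        return min(max(value, 0.0), 1.0)

    return (
        clip(xmin / image_width),
        clip(ymin / image_height),
        clip(xmax / image_width),
        clip(ymax / image_height),
    )


# A 150x300 pixel box inside a 400x800 image.
print(normalize_box(50, 100, 200, 400, image_width=400, image_height=800))
# prints (0.125, 0.125, 0.5, 0.5)
```

Whether unnormalized boxes are the cause of the shape error for SSD is not established in this thread; the update above says the change only helped with faster_rcnn_resnet50.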

ZhenyF commented 5 years ago

@huseynlilkin How do you convert them to int? The anchor coordinates are normalised to [0,1].

linlin860320 commented 5 years ago

I have the same problem as @javad314 when using ssd_mobilenet_v1_pets.config. The problem occurs after running 29 steps: InvalidArgumentError: Incompatible shapes: [2,1917] vs. [4,1]

When I use the ssd_mobilenet_v2_coco config I get a similar problem: Incompatible shapes: [6,1917] vs. [7,1]

Training with SSD models gives me this shape problem, but I don't know how to fix it.
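Since the two mismatched numbers differ per image, one thing worth checking before training is whether every record carries the same number of entries in each per-object list. The sketch below assumes the annotations have already been loaded into plain Python dicts; the key names in PER_OBJECT_KEYS are hypothetical placeholders, so substitute whatever your own conversion script writes. It is a diagnostic idea, not a confirmed fix for this issue.

```python
# Hypothetical per-object field names for this sketch; replace them with the
# keys your own conversion script actually produces.
PER_OBJECT_KEYS = ("xmin", "xmax", "ymin", "ymax", "class_label", "class_text")


def find_length_mismatches(records):
    """Return {record_index: {key: length}} for records whose per-object lists differ in length."""
    problems = {}
    for index, record in enumerate(records):
        lengths = {key: len(record[key]) for key in PER_OBJECT_KEYS if key in record}
        if len(set(lengths.values())) > 1:
            problems[index] = lengths
    return problems


records = [
    # Two boxes, two labels: consistent.
    {"xmin": [0.1, 0.2], "xmax": [0.3, 0.4], "ymin": [0.1, 0.2],
     "ymax": [0.3, 0.4], "class_label": [1, 1], "class_text": ["face", "face"]},
    # Three boxes but only one label: this record would be flagged.
    {"xmin": [0.1, 0.2, 0.5], "xmax": [0.3, 0.4, 0.6], "ymin": [0.1, 0.2, 0.5],
     "ymax": [0.3, 0.4, 0.6], "class_label": [1], "class_text": ["face"]},
]

for index, lengths in find_length_mismatches(records).items():
    print("record", index, "has inconsistent list lengths:", lengths)
```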

faizan1041 commented 5 years ago

Has anyone found a solution? Here's my question: https://stackoverflow.com/questions/52623733/tensorflow-incompatible-shapes-error-while-training?noredirect=1#comment92189808_52623733 I was able to do this successfully with 1 class. Now I have added 2 classes and a few more images to the dataset. I have regenerated the xml-to-csv files and the tf records after deleting the old ones, the pbtxt file is correct, and I also deleted the old checkpoints inside the training directory. But when I run the training job like this:

python legacy/train.py --logtostderr --train_dir=training/ --pipeline_config_path=training/ssd_mobilenet_v1_pets.config

following this tutorial, I get the following error:

InvalidArgumentError (see above for traceback): Incompatible shapes: [2,1917] vs. [4,1] [[Node: Loss/Match/cond/mul_4 = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Loss/Match/cond/one_hot, Loss/Match/cond/Cast_2)]]

michaelcamba commented 5 years ago

I am encountering the same issue. After several steps I get this error:

InvalidArgumentError (see above for traceback): Incompatible shapes: [2,1917] vs. [3,1] [[Node: Loss/Match/cond/mul_4 = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Loss/Match/cond/one_hot, Loss/Match/cond/Cast_2)]]

Can someone help me with this? Thanks in advance

faizan1041 commented 5 years ago

I solved this by changing the model to faster_rcnn_inception_v2. However, with ssd_mobile_net_v1_coco and ssd_mobile_net_v2_coco I'm not able to do that.

w5688414 commented 5 years ago

It works fine with Faster R-CNN, but doesn't work with SSD MobileNet? Can anyone solve this problem?

grofattila commented 5 years ago

Same here:

SSD with Mobilenet v1

INFO:tensorflow:Error reported to Coordinator: Incompatible shapes: [2,1917] vs. [3,1] [[Node: Loss/Match_14/cond/mul_4 = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Loss/Match_14/cond/one_hot, Loss/Match_14/cond/Cast_2)]]

hi-ilkin commented 5 years ago

@ZhenyF were you able to solve this problem for SSD?

hi-ilkin commented 5 years ago

I used model_main.py instead of legacy/train.py and the problem was solved. I can now train SSD models successfully. Running model_main.py:

python3 model_main.py --pipeline_config_path=PRETRAINED_CONFIG --model_dir=TRAINING_OUTPUT --num_train_steps=TRAINING_STEP_COUNT --alsologtostderr --sample_1_of_n_eval_examples=1

If nothing is printed to the console, add tf.logging.set_verbosity(tf.logging.INFO) after the imports in model_main.py.
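For anyone launching training from a script rather than a shell, the same invocation can be assembled as an argument list. This is only a sketch: build_model_main_command is a hypothetical helper, the flags are the ones quoted in the comment above, and the example paths and step count are borrowed from other comments in this thread.

```python
import shlex


def build_model_main_command(pipeline_config_path, model_dir, num_train_steps):
    """Assemble the model_main.py invocation from this thread as an argv list."""
    return [
        "python3", "model_main.py",
        "--pipeline_config_path={}".format(pipeline_config_path),
        "--model_dir={}".format(model_dir),
        "--num_train_steps={}".format(int(num_train_steps)),
        "--alsologtostderr",
        "--sample_1_of_n_eval_examples=1",
    ]


command = build_model_main_command(
    "training/ssd_mobilenet_v1_pets.config", "training/", 100000)

# Print a copy-pasteable shell line; quoting protects paths that contain spaces.
print(" ".join(shlex.quote(part) for part in command))
```

The resulting list can be passed to subprocess.run(command, check=True) from the object_detection directory.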

w5688414 commented 5 years ago

@huseynlilkin my command is python3 model_main.py --pipeline_config_path=ssd_mobilenet_v1_coco.config --model_dir=training/ssd_mobilenet_v1_coco --num_train_steps=100000 --alsologtostderr --sample_1_of_n_eval_examples=1 but got another error `084] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9841 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:03:00.0, compute capability: 6.1) Traceback (most recent call last): File "/home/eric/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1322, in _do_call return fn(*args) File "/home/eric/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1307, in _run_fn options, feed_dict, fetch_list, target_list, run_metadata) File "/home/eric/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun run_metadata) tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 1 values, but the requested shape requires a multiple of 13 [[Node: Reshape_11 = Reshape[T=DT_BOOL, Tshape=DT_INT32, _device="/device:CPU:0"](Cast, Reshape_13/shape)]] [[Node: IteratorGetNext = IteratorGetNextoutput_shapes=[[8], [8,300,300,3], [8,2], [8,3], [8,100], [8,100,4], [8,100,90], [8,100,90], [8,100], [8,100], [8,100], [8]], output_types=[DT_INT32, DT_FLOAT, DT_INT32, DT_INT32, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT32, DT_BOOL, DT_FLOAT, DT_INT32], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "model_main.py", line 109, in tf.app.run() File "/home/eric/anaconda3/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run _sys.exit(main(argv)) File "model_main.py", line 105, in main tf.estimator.train_and_evaluate(estimator, train_spec, eval_specs[0]) File "/home/eric/anaconda3/lib/python3.6/site-packages/tensorflow/python/estimator/training.py", line 447, in train_and_evaluate return executor.run() File "/home/eric/anaconda3/lib/python3.6/site-packages/tensorflow/python/estimator/training.py", line 531, in run return self.run_local() File "/home/eric/anaconda3/lib/python3.6/site-packages/tensorflow/python/estimator/training.py", line 669, in run_local hooks=train_hooks) File "/home/eric/anaconda3/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 366, in train loss = self._train_model(input_fn, hooks, saving_listeners) File "/home/eric/anaconda3/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1119, in _train_model return self._train_model_default(input_fn, hooks, saving_listeners) File "/home/eric/anaconda3/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1135, in _train_model_default saving_listeners) File "/home/eric/anaconda3/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1336, in _train_with_estimatorspec , loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss]) File "/home/eric/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 577, in run run_metadata=run_metadata) File "/home/eric/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1053, in run run_metadata=run_metadata) File "/home/eric/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1144, in run raise six.reraise(original_exc_info) File 
"/home/eric/anaconda3/lib/python3.6/site-packages/six.py", line 693, in reraise raise value File "/home/eric/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1129, in run return self._sess.run(args, *kwargs) File "/home/eric/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1201, in run run_metadata=run_metadata) File "/home/eric/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 981, in run return self._sess.run(args, **kwargs) File "/home/eric/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 900, in run run_metadata_ptr) File "/home/eric/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1135, in _run feed_dict_tensor, options, run_metadata) File "/home/eric/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1316, in _do_run run_metadata) File "/home/eric/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1335, in _do_call raise type(e)(node_def, op, message) tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 1 values, but the requested shape requires a multiple of 13 [[Node: Reshape_11 = Reshape[T=DT_BOOL, Tshape=DT_INT32, _device="/device:CPU:0"](Cast, Reshape_13/shape)]] [[Node: IteratorGetNext = IteratorGetNextoutput_shapes=[[8], [8,300,300,3], [8,2], [8,3], [8,100], [8,100,4], [8,100,90], [8,100,90], [8,100], [8,100], [8,100], [8]], output_types=[DT_INT32, DT_FLOAT, DT_INT32, DT_INT32, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT32, DT_BOOL, DT_FLOAT, DT_INT32], _device="/job:localhost/replica:0/task:0/device:CPU:0"]] ` so how about your tensorflow versions and other configurations

hi-ilkin commented 5 years ago

@w5688414 here they are:

Also, I pulled the newest versions of the tensorflow and models repos. I'm not using Anaconda.

tharhtetsan commented 5 years ago

I am facing the same issue today using ssd_mobilenet_v1_coco_11_06_2017 on a MacBook Air (i5, 8 GB RAM).

InvalidArgumentError (see above for traceback): Incompatible shapes: [4,1917] vs. [15,1] [[{{node Loss/Match_13/cond/mul_4}} = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Loss/Match_13/cond/one_hot, Loss/Match_13/cond/Cast_2)]]

How can I solve that?

smitshilu commented 5 years ago

TensorFlow 1.1 has model_main.py for training; use that.

Shadowkm commented 5 years ago

I used model_main.py instead of legacy/train.py and the problem was solved. I can now train SSD models successfully. Running model_main.py:

python3 model_main.py --pipeline_config_path=PRETRAINED_CONFIG --model_dir=TRAINING_OUTPUT --num_train_steps=TRAINING_STEP_COUNT --alsologtostderr --sample_1_of_n_eval_examples=1

If nothing printed on console add _tf.logging.setverbosity(tf.logging.INFO) after imports in model_main.py .

Still got a error :

OutOfRangeError (see above for traceback): End of sequence [[Node: IteratorGetNext = IteratorGetNextoutput_shapes=[[1], [1,300,300,3], [1,300,300,3], [1,2], [1,3], [1,100], [1,100,4], [1,100,90], [1,100,90], [1,100], [1,100], [1,100], [1]], output_types=[DT_INT32, DT_FLOAT, DT_UINT8, DT_INT32, DT_INT32, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT32, DT_BOOL, DT_FLOAT, DT_INT32], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1322, in _do_call
    return fn(*args)
  File "D:\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1307, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "D:\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1409, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: TypeError: can't pickle dict_values objects
Traceback (most recent call last):

  File "D:\Anaconda3\lib\site-packages\tensorflow\python\ops\script_ops.py", line 158, in __call__
    ret = func(*args)

  File "C:\tensorflow4\models\research\object_detection\metrics\coco_evaluation.py", line 346, in first_value_func
    self._metrics = self.evaluate()

  File "C:\tensorflow4\models\research\object_detection\metrics\coco_evaluation.py", line 207, in evaluate
    self._detection_boxes_list)

  File "C:\tensorflow4\models\research\object_detection\metrics\coco_tools.py", line 118, in LoadAnnotations
    results.dataset['categories'] = copy.deepcopy(self.dataset['categories'])

  File "D:\Anaconda3\lib\copy.py", line 169, in deepcopy
    rv = reductor(4)

TypeError: can't pickle dict_values objects

     [[Node: PyFunc_3 = PyFunc[Tin=[], Tout=[DT_FLOAT], token="pyfunc_5", _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "model_main.py", line 109, in <module>
    tf.app.run()
  File "D:\Anaconda3\lib\site-packages\tensorflow\python\platform\app.py", line 125, in run
    _sys.exit(main(argv))
  File "model_main.py", line 105, in main
    tf.estimator.train_and_evaluate(estimator, train_spec, eval_specs[0])
  File "D:\Anaconda3\lib\site-packages\tensorflow\python\estimator\training.py", line 447, in train_and_evaluate
    return executor.run()
  File "D:\Anaconda3\lib\site-packages\tensorflow\python\estimator\training.py", line 531, in run
    return self.run_local()
  File "D:\Anaconda3\lib\site-packages\tensorflow\python\estimator\training.py", line 681, in run_local
    eval_result, export_results = evaluator.evaluate_and_export()
  File "D:\Anaconda3\lib\site-packages\tensorflow\python\estimator\training.py", line 886, in evaluate_and_export
    hooks=self._eval_spec.hooks)
  File "D:\Anaconda3\lib\site-packages\tensorflow\python\estimator\estimator.py", line 460, in evaluate
    output_dir=self.eval_dir(name))
  File "D:\Anaconda3\lib\site-packages\tensorflow\python\estimator\estimator.py", line 1386, in _evaluate_run
    config=self._session_config)
  File "D:\Anaconda3\lib\site-packages\tensorflow\python\training\evaluation.py", line 212, in _evaluate_once
    session.run(eval_ops, feed_dict)
  File "D:\Anaconda3\lib\site-packages\tensorflow\python\training\monitored_session.py", line 689, in __exit__
    self._close_internal(exception_type)
  File "D:\Anaconda3\lib\site-packages\tensorflow\python\training\monitored_session.py", line 721, in _close_internal
    h.end(self._coordinated_creator.tf_sess)
  File "D:\Anaconda3\lib\site-packages\tensorflow\python\training\basic_session_run_hooks.py", line 824, in end
    self._final_ops, feed_dict=self._final_ops_feed_dict)
  File "D:\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 900, in run
    run_metadata_ptr)
  File "D:\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1135, in _run
    feed_dict_tensor, options, run_metadata)
  File "D:\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1316, in _do_run
    run_metadata)
  File "D:\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1335, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: TypeError: can't pickle dict_values objects
Traceback (most recent call last):

  File "D:\Anaconda3\lib\site-packages\tensorflow\python\ops\script_ops.py", line 158, in __call__
    ret = func(*args)

  File "C:\tensorflow4\models\research\object_detection\metrics\coco_evaluation.py", line 346, in first_value_func
    self._metrics = self.evaluate()

  File "C:\tensorflow4\models\research\object_detection\metrics\coco_evaluation.py", line 207, in evaluate
    self._detection_boxes_list)

  File "C:\tensorflow4\models\research\object_detection\metrics\coco_tools.py", line 118, in LoadAnnotations
    results.dataset['categories'] = copy.deepcopy(self.dataset['categories'])

  File "D:\Anaconda3\lib\copy.py", line 169, in deepcopy
    rv = reductor(4)

TypeError: can't pickle dict_values objects

     [[Node: PyFunc_3 = PyFunc[Tin=[], Tout=[DT_FLOAT], token="pyfunc_5", _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Caused by op 'PyFunc_3', defined at:
  File "model_main.py", line 109, in <module>
    tf.app.run()
  File "D:\Anaconda3\lib\site-packages\tensorflow\python\platform\app.py", line 125, in run
    _sys.exit(main(argv))
  File "model_main.py", line 105, in main
    tf.estimator.train_and_evaluate(estimator, train_spec, eval_specs[0])
  File "D:\Anaconda3\lib\site-packages\tensorflow\python\estimator\training.py", line 447, in train_and_evaluate
    return executor.run()
  File "D:\Anaconda3\lib\site-packages\tensorflow\python\estimator\training.py", line 531, in run
    return self.run_local()
  File "D:\Anaconda3\lib\site-packages\tensorflow\python\estimator\training.py", line 681, in run_local
    eval_result, export_results = evaluator.evaluate_and_export()
  File "D:\Anaconda3\lib\site-packages\tensorflow\python\estimator\training.py", line 886, in evaluate_and_export
    hooks=self._eval_spec.hooks)
  File "D:\Anaconda3\lib\site-packages\tensorflow\python\estimator\estimator.py", line 453, in evaluate
    input_fn, hooks, checkpoint_path)
  File "D:\Anaconda3\lib\site-packages\tensorflow\python\estimator\estimator.py", line 1348, in _evaluate_build_graph
    features, labels, model_fn_lib.ModeKeys.EVAL, self.config)
  File "D:\Anaconda3\lib\site-packages\tensorflow\python\estimator\estimator.py", line 1107, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "C:\tensorflow4\models\research\object_detection\model_lib.py", line 414, in model_fn
    eval_config, category_index.values(), eval_dict)
  File "C:\tensorflow4\models\research\object_detection\eval_util.py", line 681, in get_eval_metric_ops_for_evaluators
    eval_dict))
  File "C:\tensorflow4\models\research\object_detection\metrics\coco_evaluation.py", line 356, in get_estimator_eval_metric_ops
    first_value_op = tf.py_func(first_value_func, [], tf.float32)
  File "D:\Anaconda3\lib\site-packages\tensorflow\python\ops\script_ops.py", line 384, in py_func
    func=func, inp=inp, Tout=Tout, stateful=stateful, eager=False, name=name)
  File "D:\Anaconda3\lib\site-packages\tensorflow\python\ops\script_ops.py", line 227, in _internal_py_func
    input=inp, token=token, Tout=Tout, name=name)
  File "D:\Anaconda3\lib\site-packages\tensorflow\python\ops\gen_script_ops.py", line 130, in py_func
    "PyFunc", input=input, token=token, Tout=Tout, name=name)
  File "D:\Anaconda3\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "D:\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 3414, in create_op
    op_def=op_def)
  File "D:\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1740, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): TypeError: can't pickle dict_values objects
Traceback (most recent call last):

  File "D:\Anaconda3\lib\site-packages\tensorflow\python\ops\script_ops.py", line 158, in __call__
    ret = func(*args)

  File "C:\tensorflow4\models\research\object_detection\metrics\coco_evaluation.py", line 346, in first_value_func
    self._metrics = self.evaluate()

  File "C:\tensorflow4\models\research\object_detection\metrics\coco_evaluation.py", line 207, in evaluate
    self._detection_boxes_list)

  File "C:\tensorflow4\models\research\object_detection\metrics\coco_tools.py", line 118, in LoadAnnotations
    results.dataset['categories'] = copy.deepcopy(self.dataset['categories'])

  File "D:\Anaconda3\lib\copy.py", line 169, in deepcopy
    rv = reductor(4)

TypeError: can't pickle dict_values objects

     [[Node: PyFunc_3 = PyFunc[Tin=[], Tout=[DT_FLOAT], token="pyfunc_5", _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

I still hit this issue. Do you know why this is happening?
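The traceback above ends in `copy.deepcopy(self.dataset['categories'])`, and the "Caused by op" trace shows `category_index.values()` being passed in from model_lib.py. In Python 3, `dict.values()` returns a view object, and `copy.deepcopy` cannot copy it. The sketch below reproduces that failure in isolation and shows one possible workaround, which is to materialize the view with `list(...)` before it is copied. The `category_index` contents and the `try_deepcopy` helper are hypothetical, and whether patching the call in model_lib.py is the right fix for a given checkout is an assumption that this thread does not confirm.

```python
import copy

# Hypothetical stand-in for the label-map index built by the Object Detection API.
category_index = {1: {"id": 1, "name": "face"}}


def try_deepcopy(obj):
    """Return a deep copy of obj, or the TypeError raised while copying it."""
    try:
        return copy.deepcopy(obj)
    except TypeError as err:
        return err


# A dict_values view cannot be deep-copied, which is what the evaluator trips over.
view_result = try_deepcopy(category_index.values())

# Converting the view to a list first gives deepcopy an ordinary container.
list_result = try_deepcopy(list(category_index.values()))

print(type(view_result).__name__)
print(list_result)
```

The exact wording of the `TypeError` differs between Python versions, so the sketch checks the exception type instead of the message.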

roshankumarbhuyan commented 5 years ago

Got this error on SSD inception:

incompatible shapes: [20,1917] vs. [27,1] [[{{node loss/match_5/cond/mul_4}} = mul[t=dt_float, _device="/job:localhost/replica:0/task:0/device:gpu:0"](loss/match_5/cond/one_hot, loss/match_5/cond/cast_2)]] [[{{node loss/toint32_2/_2137}} = _recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:cpu:0", send_device="/job:localhost/replica:0/task:0/device:gpu:0", send_device_incarnation=1, tensor_name="edge_9800_loss/toint32_2", tensor_type=dt_int32, _device="/job:localhost/replica:0/task:0/device:cpu:0"]()]]

GalMoore commented 5 years ago

If you are training MobileNet SSD following the Object Detection API tutorial, you could try commenting out the "ssd_random_crop" augmentation in your config file. This worked for me. It was initially pointed out by rky0930 here.

SentimentSongs commented 5 years ago

I ran into this as well.

InvalidArgumentError (see above for traceback): Incompatible shapes: [2,1917] vs. [3,1]
         [[Node: Loss/Match/cond/mul_4 = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Loss/Match/cond/one_hot/_1721, Loss/Match/cond/Cast_2)]]
         [[Node: FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_6_depthwise/Relu6/_2269 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_4148_FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_6_depthwise/Relu6", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

I tried commenting out "ssd_random_crop" and training again.

python object_detection/export_tflite_ssd_graph.py \
--pipeline_config_path=$CONFIG_FILE \
--trained_checkpoint_prefix=$CHECKPOINT_PATH \
--output_directory=$OUTPUT_DIR \
--add_postprocessing_op=true
bazel run --config=opt tensorflow/contrib/lite/toco:toco -- \
--input_file=$OUTPUT_DIR/tflite_graph.pb \
--output_file=$OUTPUT_DIR/detect.tflite \
--input_shapes=1,300,300,3 \
--input_arrays=normalized_input_image_tensor \
--output_arrays='TFLite_Detection_PostProcess','TFLite_Detection_PostProcess:1','TFLite_Detection_PostProcess:2','TFLite_Detection_PostProcess:3' \
--inference_type=QUANTIZED_UINT8 \
--mean_values=128 \
--std_values=128 \
--change_concat_input_ranges=false \
--allow_custom_ops

Generating the .pb and .tflite files works without problems.

However, the .tflite model is unable to run in the TensorFlow Android demo.

xjtuljy commented 5 years ago

I also hit the same problem when using legacy/train.py to train SSD_mobilenet_v2 after checking out the latest version. With the previous version (in Sept), legacy/train.py should work with "ssd_random_crop" enabled. With the latest version, I can confirm that the "ssd_random_crop" option works with model_main.py, and that "ssd_random_crop" should be turned off when using train.py.

However, since multi-GPU training can only be enabled with train.py (#4774), in a lot of cases I prefer to use train.py. Hopefully model_main.py can be updated to support multi-GPU training.

anonym24 commented 5 years ago

@huseynlilkin but model_main.py performs quite poorly (it eats a lot of CPU): https://github.com/tensorflow/models/issues/5719

anonym24 commented 5 years ago

@huseynlilkin

If nothing printed on console add tf.logging.set_verbosity(tf.logging.INFO) after imports in model_main.py .

It shows only steps 0, 100, ...

train.py was showing every step: 0, 1, 2, ...

anonym24 commented 5 years ago

Also, guys, it's better to set a lower value for batch_size when using the SSD MobileNet model and the pets config. I set it to 1 (the default was 24). When it was 24, training was very slow and it was eating a lot of CPU (90%), and I have a GTX 1060 6GB. More about this issue: https://github.com/tensorflow/models/issues/5719

MonteChristo46 commented 5 years ago

I was able to avoid the error by deleting the data augmentation option for the SSD random crop. I deleted the following lines in pipeline.config:

data_augmentation_options {
  ssd_random_crop {
  }
}

The error occurred when using both SSD_mobilenet_v1 and SSD_mobilenet_V2.
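For anyone who wants to apply the same change to several config files, here is a rough sketch that strips the block from the config text with a regular expression. It assumes the block is written with an empty `ssd_random_crop { }` body, as in the comment above. The function name `drop_ssd_random_crop` and the sample config are made up for illustration, and this is plain text editing, not an Object Detection API call.

```python
import re

# Matches an empty ssd_random_crop augmentation block, including its indentation
# and the newline that follows it.
SSD_RANDOM_CROP_BLOCK = re.compile(
    r"[ \t]*data_augmentation_options\s*\{\s*ssd_random_crop\s*\{\s*\}\s*\}[ \t]*\n?"
)


def drop_ssd_random_crop(config_text):
    """Return config_text with empty ssd_random_crop augmentation blocks removed."""
    return SSD_RANDOM_CROP_BLOCK.sub("", config_text)


# Hypothetical excerpt of a pipeline config, used only to exercise the helper.
sample = """train_config {
  batch_size: 24
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    ssd_random_crop {
    }
  }
}
"""

cleaned = drop_ssd_random_crop(sample)
print(cleaned)
```

Other augmentation blocks are left untouched. A `ssd_random_crop` block that carries parameters would not match this pattern and would need to be removed by hand.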

tensorflowbutler commented 4 years ago

Hi There, We are checking to see if you still need help on this, as this seems to be considerably old issue. Please update this issue with the latest information, code snippet to reproduce your issue and error you are seeing. If we don't hear from you in the next 7 days, this issue will be closed automatically. If you don't need help on this issue any more, please consider closing this.

Chromer163 commented 4 years ago

@huseynlilkin my command is python3 model_main.py --pipeline_config_path=ssd_mobilenet_v1_coco.config --model_dir=training/ssd_mobilenet_v1_coco --num_train_steps=100000 --alsologtostderr --sample_1_of_n_eval_examples=1 but got another error `084] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9841 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:03:00.0, compute capability: 6.1) Traceback (most recent call last): File "/home/eric/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1322, in _do_call return fn(*args) File "/home/eric/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1307, in _run_fn options, feed_dict, fetch_list, target_list, run_metadata) File "/home/eric/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun run_metadata) tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 1 values, but the requested shape requires a multiple of 13 [[Node: Reshape_11 = Reshape[T=DT_BOOL, Tshape=DT_INT32, _device="/device:CPU:0"](Cast, Reshape_13/shape)]] [[Node: IteratorGetNext = IteratorGetNextoutput_shapes=[[8], [8,300,300,3], [8,2], [8,3], [8,100], [8,100,4], [8,100,90], [8,100,90], [8,100], [8,100], [8,100], [8]], output_types=[DT_INT32, DT_FLOAT, DT_INT32, DT_INT32, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT32, DT_BOOL, DT_FLOAT, DT_INT32], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "model_main.py", line 109, in <module> tf.app.run() File "/home/eric/anaconda3/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run _sys.exit(main(argv)) File "model_main.py", line 105, in main tf.estimator.train_and_evaluate(estimator, train_spec, eval_specs[0]) File "/home/eric/anaconda3/lib/python3.6/site-packages/tensorflow/python/estimator/training.py", line 447, in train_and_evaluate return executor.run() File "/home/eric/anaconda3/lib/python3.6/site-packages/tensorflow/python/estimator/training.py", line 531, in run return self.run_local() File "/home/eric/anaconda3/lib/python3.6/site-packages/tensorflow/python/estimator/training.py", line 669, in run_local hooks=train_hooks) File "/home/eric/anaconda3/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 366, in train loss = self._train_model(input_fn, hooks, saving_listeners) File "/home/eric/anaconda3/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1119, in _train_model return self._train_model_default(input_fn, hooks, saving_listeners) File "/home/eric/anaconda3/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1135, in _train_model_default saving_listeners) File "/home/eric/anaconda3/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1336, in _train_with_estimator_spec _, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss]) File "/home/eric/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 577, in run run_metadata=run_metadata) File "/home/eric/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1053, in run run_metadata=run_metadata) File "/home/eric/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1144, in run raise six.reraise(*original_exc_info) File "/home/eric/anaconda3/lib/python3.6/site-packages/six.py", line 693, in reraise raise value File "/home/eric/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1129, in run return self._sess.run(*args, **kwargs) File "/home/eric/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1201, in run run_metadata=run_metadata) File "/home/eric/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 981, in run return self._sess.run(*args, **kwargs) File "/home/eric/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 900, in run run_metadata_ptr) File "/home/eric/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1135, in _run feed_dict_tensor, options, run_metadata) File "/home/eric/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1316, in _do_run run_metadata) File "/home/eric/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1335, in _do_call raise type(e)(node_def, op, message) tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 1 values, but the requested shape requires a multiple of 13 [[Node: Reshape_11 = Reshape[T=DT_BOOL, Tshape=DT_INT32, _device="/device:CPU:0"](Cast, Reshape_13/shape)]] [[Node: IteratorGetNext = IteratorGetNextoutput_shapes=[[8], [8,300,300,3], [8,2], [8,3], [8,100], [8,100,4], [8,100,90], [8,100,90], [8,100], [8,100], [8,100], [8]], output_types=[DT_INT32, DT_FLOAT, DT_INT32, DT_INT32, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT32, DT_BOOL, DT_FLOAT, DT_INT32], _device="/job:localhost/replica:0/task:0/device:CPU:0"]] `

So what are your TensorFlow version and other configuration details?

Have you solved this problem? I met the same issue.

Chromer163 commented 4 years ago

I used model_main.py instead of legacy/train.py and the problem was solved. I can successfully train SSD models. Running model_main.py:

python3 model_main.py --pipeline_config_path=PRETRAINED_CONFIG --model_dir=TRAINING_OUTPUT --num_train_steps=TRAINING_STEP_COUNT --alsologtostderr --sample_1_of_n_eval_examples=1

If nothing is printed on the console, add tf.logging.set_verbosity(tf.logging.INFO) after the imports in model_main.py.

Still got an error:

OutOfRangeError (see above for traceback): End of sequence [[Node: IteratorGetNext = IteratorGetNextoutput_shapes=[[1], [1,300,300,3], [1,300,300,3], [1,2], [1,3], [1,100], [1,100,4], [1,100,90], [1,100,90], [1,100], [1,100], [1,100], [1]], output_types=[DT_INT32, DT_FLOAT, DT_UINT8, DT_INT32, DT_INT32, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT32, DT_BOOL, DT_FLOAT, DT_INT32], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1322, in _do_call
    return fn(*args)
  File "D:\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1307, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "D:\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1409, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: TypeError: can't pickle dict_values objects
Traceback (most recent call last):

  File "D:\Anaconda3\lib\site-packages\tensorflow\python\ops\script_ops.py", line 158, in __call__
    ret = func(*args)

  File "C:\tensorflow4\models\research\object_detection\metrics\coco_evaluation.py", line 346, in first_value_func
    self._metrics = self.evaluate()

  File "C:\tensorflow4\models\research\object_detection\metrics\coco_evaluation.py", line 207, in evaluate
    self._detection_boxes_list)

  File "C:\tensorflow4\models\research\object_detection\metrics\coco_tools.py", line 118, in LoadAnnotations
    results.dataset['categories'] = copy.deepcopy(self.dataset['categories'])

  File "D:\Anaconda3\lib\copy.py", line 169, in deepcopy
    rv = reductor(4)

TypeError: can't pickle dict_values objects

     [[Node: PyFunc_3 = PyFunc[Tin=[], Tout=[DT_FLOAT], token="pyfunc_5", _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

I still hit this issue. Do you know why this is happening?

Have you solved this problem? I met the same issue.