Problem when image with size [1920,1080] is used for training but size [416,416] is all right

LyuYifan commented 4 years ago

The terminal output is as follows:

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 143, in <module>
    feed_dict={is_training: True})
  File "/home/lyuyifan/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 900, in run
    run_metadata_ptr)
  File "/home/lyuyifan/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1135, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/lyuyifan/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1316, in _do_run
    run_metadata)
  File "/home/lyuyifan/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1335, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [1,60,34,3,1] vs. [1,60,33,3,1]
  [[Node: logistic_loss/mul = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](split:2, strided_slice_20)]]
  [[Node: gradients/yolov3/yolov3_head/Conv_5/LeakyRelu/Maximum_grad/Shape_1/_931 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_46781_gradients/yolov3/yolov3_head/Conv_5/LeakyRelu/Maximum_grad/Shape_1", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Caused by op 'logistic_loss/mul', defined at:
  File "train.py", line 75, in <module>
    loss = yolo_model.compute_loss(pred_feature_maps, y_true)
  File "/home/lyuyifan/YOLOV3_code_online/YOLOv3_TensorFlow-master/model.py", line 359, in compute_loss
    result = self.loss_layer(y_pred[i], y_true[i], anchor_group[i])
  File "/home/lyuyifan/YOLOV3_code_online/YOLOv3_TensorFlow-master/model.py", line 282, in loss_layer
    conf_loss_pos = conf_pos_mask * tf.nn.sigmoid_cross_entropy_with_logits(labels=object_mask, logits=pred_conf_logits)
  File "/home/lyuyifan/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/nn_impl.py", line 181, in sigmoid_cross_entropy_with_logits
    relu_logits - logits * labels,
  File "/home/lyuyifan/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 979, in binary_op_wrapper
    return func(x, y, name=name)
  File "/home/lyuyifan/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 1211, in _mul_dispatch
    return gen_math_ops.mul(x, y, name=name)
  File "/home/lyuyifan/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 4759, in mul
    "Mul", x=x, y=y, name=name)
  File "/home/lyuyifan/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/lyuyifan/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3392, in create_op
    op_def=op_def)
  File "/home/lyuyifan/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1718, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

Do you have any idea what causes this problem?

@wizyoung @matthew-jack

AhmedAnwar97 commented 4 years ago

Did you find a solution for this? The same thing happens for me when I set img_size = [720, 390].

LyuYifan commented 4 years ago

I have partly solved this problem. I think it occurs because of multi-scale training. As a result, you should set the image size [width, height] to [32m, 32n], where m and n are integers. For your image size [720, 390], you could set the image size to [736, 384] in the file args.py. With this setting the program can start training. However, this change affects the size of the bounding boxes, at least for my image size.
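The [32m, 32n] rule can be sketched as a small helper. This is only an illustration: `snap_to_stride` is a hypothetical name, not a function from the repository, and the factor 32 is taken from the rule above. Ties round up, which is what turns 720 into 736 rather than 704.

```python
STRIDE = 32  # taken from the [32m, 32n] rule above


def snap_to_stride(size, stride=STRIDE):
    """Round each dimension to the nearest multiple of `stride` (ties round up).

    Hypothetical helper for illustration; it is not part of the repository.
    """
    # Adding half a stride before the floor division rounds to the nearest
    # multiple; max() keeps very small inputs from collapsing to 0.
    return [max(stride, (dim + stride // 2) // stride * stride) for dim in size]


print(snap_to_stride([720, 390]))    # [736, 384]
print(snap_to_stride([1920, 1080]))  # [1920, 1088]
print(snap_to_stride([416, 416]))    # [416, 416]
```

Changing the size this way also changes the aspect ratio slightly, so anchor boxes chosen for the original size may need to be revisited.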

I am still working on this issue. If you find a complete solution, please share it with me.

Thanks & Regards

AhmedAnwar97 commented 4 years ago

@LyuYifan Thank you for the explanation; that was very helpful.

LyuYifan commented 4 years ago

You're welcome. I have since solved the bounding-box problem I mentioned. The code is correct; I had supplied the wrong anchor boxes. So just use an image size of the form [32m, 32n], which works.

lander1003 commented 4 years ago

So if I want to use images of size [1920, 1080], should I set the size to [1920, 1088]?

LyuYifan commented 4 years ago

I think so. Please try it.
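A quick way to check a candidate size before training is to compare two ways of computing the grid size. The sketch below rests on assumptions not confirmed in this thread: that the network halves the input with stride-2 steps that round up, that the labels are built with floor division, and that the output strides are 8, 16 and 32 as in standard YOLOv3. The function names are hypothetical. Under these assumptions, 1080 gives 34 on one side and 33 on the other, which matches the shapes in the error message.

```python
import math


def halve_ceil(n, times):
    """Apply ceil(n / 2) `times` times (assumed network-side downsampling)."""
    for _ in range(times):
        n = math.ceil(n / 2)
    return n


def grid_sizes(size, strides=(8, 16, 32)):
    """Map each stride to a list of (rounded-up, floor-divided) grid sizes."""
    result = {}
    for stride in strides:
        steps = stride.bit_length() - 1  # log2 of a power-of-two stride
        result[stride] = [(halve_ceil(dim, steps), dim // stride) for dim in size]
    return result


def is_consistent(size, strides=(8, 16, 32)):
    """True when both computations agree for every dimension and stride."""
    return all(a == b for pairs in grid_sizes(size, strides).values() for a, b in pairs)


print(grid_sizes([1920, 1080])[32])  # [(60, 60), (34, 33)]
print(is_consistent([1920, 1080]))   # False
print(is_consistent([1920, 1088]))   # True
```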

wizyoung / YOLOv3_TensorFlow, issue #247