qqwweee / keras-yolo3

A Keras implementation of YOLOv3 (Tensorflow backend)
MIT License

Error When Training YOLOv3 with Image Size 1280x720 #204

Open tristochief opened 6 years ago

tristochief commented 6 years ago

When I train on a dataset with image size 1280x720, I get the following error regardless of whether I set the batch size to 1, 10, or 16:

InvalidArgumentError (see above for traceback): ConcatOp : Dimensions of inputs should match: shape[0] = [1,256,80,44] vs. shape[1] = [1,512,80,45]

Here is the code where this error is happening (the failing line is marked with a comment):

def yolo_body(inputs, num_anchors, num_classes):
    """Create YOLO_V3 model CNN body in Keras."""
    darknet = Model(inputs, darknet_body(inputs))
    x, y1 = make_last_layers(darknet.output, 512, num_anchors*(num_classes+5))

    x = compose(
            DarknetConv2D_BN_Leaky(256, (1,1)),
            UpSampling2D(2))(x)
    print('got here')
    x = Concatenate()([x,darknet.layers[152].output])  # <-- error raised here
    x, y2 = make_last_layers(x, 256, num_anchors*(num_classes+5))

    x = compose(
            DarknetConv2D_BN_Leaky(128, (1,1)),
            UpSampling2D(2))(x)
    x = Concatenate()([x,darknet.layers[92].output])
    x, y3 = make_last_layers(x, 128, num_anchors*(num_classes+5))

    return Model(inputs, [y1,y2,y3])

I am certain it is related to the image size, because with 740 by 416 it ran for several epochs before failing with a completely different error.
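
The arithmetic seems to bear this out (assuming I am reading yolo3/model.py right): each resblock_body downsamples via ZeroPadding2D(((1,0),(1,0))) followed by a 3x3 stride-2 'valid' convolution, so a spatial dimension n maps to (n - 2)//2 + 1. A minimal sketch of that arithmetic for the 720 side:

def downsample(n):
    """Spatial size after ZeroPadding2D(((1,0),(1,0))) + 3x3 stride-2 'valid' conv."""
    return (n - 2) // 2 + 1

sizes = [720]
for _ in range(5):              # darknet_body halves the resolution 5 times (32x overall)
    sizes.append(downsample(sizes[-1]))
print(sizes)                    # [720, 360, 180, 90, 45, 22]

UpSampling2D(2) then doubles the 22 back to 44, but the skip connection taken from darknet.layers[152] is still 45 wide, so Concatenate() sees 44 vs 45 - exactly the [1,256,80,44] vs [1,512,80,45] mismatch above. The 1280 side divides cleanly (1280 -> 640 -> 320 -> 160 -> 80 -> 40, and 40*2 == 80), which is why only one dimension mismatches. With 736 instead of 720 the chain would be [736, 368, 184, 92, 46, 23], and 23*2 == 46 matches the skip.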

Here is the full output from the terminal:


/usr/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
Using TensorFlow backend.
/usr/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/usr/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/usr/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
2018-08-03 06:39:11.817243: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:65:00.0
totalMemory: 10.91GiB freeMemory: 10.46GiB
2018-08-03 06:39:11.817284: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
2018-08-03 06:39:12.008818: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-08-03 06:39:12.008854: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917]      0 
2018-08-03 06:39:12.008859: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0:   N 
2018-08-03 06:39:12.009089: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10129 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:65:00.0, compute capability: 6.1)
got here
Create YOLOv3 model with 9 anchors and 80 classes.
Load weights model_data/yolo.h5.
Freeze the first 249 layers of total 252 layers.
/usr/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/usr/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
Train on 2296 samples, val on 255 samples, with batch size 1.
Epoch 1/50
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1327, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1312, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1420, in _call_tf_sessionrun
    status, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: ConcatOp : Dimensions of inputs should match: shape[0] = [1,256,80,44] vs. shape[1] = [1,512,80,45]
     [[Node: concatenate_1/concat = ConcatV2[N=2, T=DT_FLOAT, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](concatenate_1/concat-0-TransposeNHWCToNCHW-LayoutOptimizer, add_19/add, concatenate_1/concat-2-LayoutOptimizer)]]
     [[Node: yolo_loss/while_2/strided_slice_1/stack_1/_2925 = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_3886_yolo_loss/while_2/strided_slice_1/stack_1", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](^_cloopyolo_loss/while_2/strided_slice_1/stack_2/_2819)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 190, in <module>
    _main()
  File "train.py", line 65, in _main
    callbacks=[logging, checkpoint])
  File "/home/tris/.local/lib/python3.5/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/home/tris/.local/lib/python3.5/site-packages/keras/engine/training.py", line 1415, in fit_generator
    initial_epoch=initial_epoch)
  File "/home/tris/.local/lib/python3.5/site-packages/keras/engine/training_generator.py", line 213, in fit_generator
    class_weight=class_weight)
  File "/home/tris/.local/lib/python3.5/site-packages/keras/engine/training.py", line 1215, in train_on_batch
    outputs = self.train_function(ins)
  File "/home/tris/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 2672, in __call__
    return self._legacy_call(inputs)
  File "/home/tris/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 2654, in _legacy_call
    **self.session_kwargs)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 905, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1140, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1321, in _do_run
    run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1340, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: ConcatOp : Dimensions of inputs should match: shape[0] = [1,256,80,44] vs. shape[1] = [1,512,80,45]
     [[Node: concatenate_1/concat = ConcatV2[N=2, T=DT_FLOAT, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](concatenate_1/concat-0-TransposeNHWCToNCHW-LayoutOptimizer, add_19/add, concatenate_1/concat-2-LayoutOptimizer)]]
     [[Node: yolo_loss/while_2/strided_slice_1/stack_1/_2925 = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_3886_yolo_loss/while_2/strided_slice_1/stack_1", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](^_cloopyolo_loss/while_2/strided_slice_1/stack_2/_2819)]]

Caused by op 'concatenate_1/concat', defined at:
  File "train.py", line 190, in <module>
    _main()
  File "train.py", line 33, in _main
    freeze_body=2, weights_path='model_data/yolo.h5') # make sure you know what you freeze
  File "train.py", line 116, in create_model
    model_body = yolo_body(image_input, num_anchors//3, num_classes)
  File "/home/tris/Documents/beacohealth/HHCM/OD/Software/yolo/keras-yolo3/yolo3/model.py", line 79, in yolo_body
    x = Concatenate()([x,darknet.layers[152].output])
  File "/home/tris/.local/lib/python3.5/site-packages/keras/engine/base_layer.py", line 457, in __call__
    output = self.call(inputs, **kwargs)
  File "/home/tris/.local/lib/python3.5/site-packages/keras/layers/merge.py", line 155, in call
    return self._merge_function(inputs)
  File "/home/tris/.local/lib/python3.5/site-packages/keras/layers/merge.py", line 357, in _merge_function
    return K.concatenate(inputs, axis=self.axis)
  File "/home/tris/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 1934, in concatenate
    return tf.concat([to_dense(x) for x in tensors], axis)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/array_ops.py", line 1181, in concat
    return gen_array_ops.concat_v2(values=values, axis=axis, name=name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 949, in concat_v2
    "ConcatV2", values=values, axis=axis, name=name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 3290, in create_op
    op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1654, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): ConcatOp : Dimensions of inputs should match: shape[0] = [1,256,80,44] vs. shape[1] = [1,512,80,45]
     [[Node: concatenate_1/concat = ConcatV2[N=2, T=DT_FLOAT, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](concatenate_1/concat-0-TransposeNHWCToNCHW-LayoutOptimizer, add_19/add, concatenate_1/concat-2-LayoutOptimizer)]]
     [[Node: yolo_loss/while_2/strided_slice_1/stack_1/_2925 = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_3886_yolo_loss/while_2/strided_slice_1/stack_1", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](^_cloopyolo_loss/while_2/strided_slice_1/stack_2/_2819)]]
MiVaVo commented 6 years ago

Did you manage to solve this problem ?

l33tl4bs commented 6 years ago

@tristochief a 720x1280 image size is not a multiple of 32 (720 isn't; see the code) - you could add two 8px black bars to get to 736x1280 and it'll work! ;)
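
Something like this (a rough sketch with OpenCV; cv2 and the file name are just placeholders, any image library works):

import cv2

def pad_to_multiple_of_32(img):
    """Pad an image with black bars so both sides become multiples of 32."""
    h, w = img.shape[:2]
    new_h = ((h + 31) // 32) * 32          # 720 -> 736
    new_w = ((w + 31) // 32) * 32          # 1280 -> 1280 (already a multiple)
    top = (new_h - h) // 2                 # e.g. 8px on top and bottom
    bottom = new_h - h - top
    left = (new_w - w) // 2
    right = new_w - w - left
    return cv2.copyMakeBorder(img, top, bottom, left, right,
                              cv2.BORDER_CONSTANT, value=(0, 0, 0))

img = cv2.imread('example.jpg')            # hypothetical 1280x720 frame
padded = pad_to_multiple_of_32(img)        # now 1280x736

If you pad this way, remember to shift your bounding-box annotations by the same top/left offsets.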

tristochief commented 5 years ago

> @tristochief a 720x1280 image size is not a multiple of 32 (720 isn't; see the code) - you could add two 8px black bars to get to 736x1280 and it'll work! ;)

thanks!

Pari-singh commented 5 years ago

Hey, my image size is the same; I solved that error by cropping. But now I am encountering a separate problem:

Here is the complete error I got:

2019-06-27 20:42:22.389956: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at tensor_array_ops.cc:661 : Invalid argument: TensorArray replica_0/model_3/yolo_loss/TensorArray_3: Could not read from TensorArray index 0. Furthermore, the element shape is not fully defined: [?,?,3]. It is possible you are working with a resizeable TensorArray and stop_gradients is not allowing the gradients to be written. If you set the full element_shape property on the forward TensorArray, the proper all-zeros tensor will be returned instead of incurring this error.
2019-06-27 20:42:22.393907: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at tensor_array_ops.cc:661 : Invalid argument: TensorArray replica_0/model_3/yolo_loss/TensorArray_1_4: Could not read from TensorArray index 0. Furthermore, the element shape is not fully defined: [?,?,3]. It is possible you are working with a resizeable TensorArray and stop_gradients is not allowing the gradients to be written. If you set the full element_shape property on the forward TensorArray, the proper all-zeros tensor will be returned instead of incurring this error.
2019-06-27 20:42:22.393964: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at tensor_array_ops.cc:661 : Invalid argument: TensorArray replica_0/model_3/yolo_loss/TensorArray_2_5: Could not read from TensorArray index 0. Furthermore, the element shape is not fully defined: [?,?,3]. It is possible you are working with a resizeable TensorArray and stop_gradients is not allowing the gradients to be written. If you set the full element_shape property on the forward TensorArray, the proper all-zeros tensor will be returned instead of incurring this error.
2019-06-27 20:42:23.419646: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
Traceback (most recent call last):
  File "train.py", line 521, in <module>
    _main()
  File "train.py", line 178, in _main
    initial_epoch=0
  File "/opt/conda/lib/python3.7/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/keras/engine/training.py", line 1418, in fit_generator
    initial_epoch=initial_epoch)
  File "/opt/conda/lib/python3.7/site-packages/keras/engine/training_generator.py", line 217, in fit_generator
    class_weight=class_weight)
  File "/opt/conda/lib/python3.7/site-packages/keras/engine/training.py", line 1217, in train_on_batch
    outputs = self.train_function(ins)
  File "/opt/conda/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py", line 2715, in __call__
    return self._call(inputs)
  File "/opt/conda/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py", line 2675, in _call
    fetched = self._callable_fn(*array_vals)
  File "/root/.local/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1439, in __call__
    run_metadata_ptr)
  File "/root/.local/lib/python3.7/site-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: TensorArray replica_0/model_3/yolo_loss/TensorArray_3: Could not read from TensorArray index 0. Furthermore, the element shape is not fully defined: [?,?,3]. It is possible you are working with a resizeable TensorArray and stop_gradients is not allowing the gradients to be written.
If you set the full element_shape property on the forward TensorArray, the proper all-zeros tensor will be returned instead of incurring this error.
     [[{{node replica_0/model_3/yolo_loss/TensorArrayStack/TensorArrayGatherV3}}]]
     [[{{node replica_1/model_3/yolo_loss/add_17}}]]

Any lead or help toward a solution is welcome!