JingyunLiang opened this issue 6 years ago
I got the same problem with 4 GPUs and have not found a solution yet.
I don't have this problem. Can you tell me specifically how you set up multi-GPU training?
@liangbo-1 I have a machine with 4 GPUs. I just changed the config parameter GPU_COUNT to 4.
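For readers following along: the multi-GPU setup referred to above boils down to overriding the config. A minimal sketch, assuming the standard `mrcnn.config.Config` attributes (`GPU_COUNT`, `IMAGES_PER_GPU`) and placeholder values for everything else, not the exact code from this thread:

```
from mrcnn.config import Config
from mrcnn import model as modellib

class MultiGPUConfig(Config):
    """Hypothetical config for a 4-GPU machine (name and values are placeholders)."""
    NAME = "multi_gpu_example"
    GPU_COUNT = 4          # one model replica per GPU
    IMAGES_PER_GPU = 1     # effective batch size = GPU_COUNT * IMAGES_PER_GPU
    NUM_CLASSES = 1 + 1    # background + 1 object class (placeholder)

config = MultiGPUConfig()
config.display()

# When GPU_COUNT > 1, MaskRCNN wraps the Keras model in ParallelModel,
# which is where the ConcatOp error discussed below is raised.
model = modellib.MaskRCNN(mode="training", config=config, model_dir="./logs")
```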
I am having the same problem on Ubuntu 16.04, Python 3.6, TensorFlow 1.4, and Keras 2.0.8.
I have the same problem on Ubuntu 16.04, Python 3.5, TF 1.4, and Keras 2.1.2. Exactly the same error.
@MichaelLiang12 @taijizhao @Nicolai-Haeni Has anyone solved it? Maybe it's because of the TF version?
@MichaelLiang12 I upgraded to TF 1.8 and Keras 2.1.6 and the problem disappeared; at least multi-GPU training on the shapes sample works fine now.
I also have this question. TF 1.3, Keras 2.0.8, CUDA 8.0, Python 3.4.0.
How do I solve it?
@zhjpqq I upgraded my TF and Keras to the latest versions and everything works fine now.
I had the same problem. I fixed it by replacing line 97 of parallel_model.py:
```
if K.int_shape(outputs[0]) == ():
```
with
```
if K.int_shape(outputs[0]) == () or not K.int_shape(outputs[0]):
```
I use Python 3.6.8, tensorflow-gpu 1.3.0, and Keras 2.0.8.
I could not update TensorFlow because I am using a cluster with NVIDIA driver version 375.26, and TensorFlow > 1.4 is not compatible with it. I also do not have root access to change the driver.
I hope this is useful.
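For context on why this one-line change helps: the patched check sits in `ParallelModel`, where the outputs of the per-GPU towers are merged back together. Scalar losses and metrics are supposed to be averaged, while tensors with a batch dimension are concatenated along axis 0. When `K.int_shape()` reports an empty/unknown shape instead of `()`, the scalar loss falls into the concatenate branch and TensorFlow raises the "Expected concatenating dimensions in the range [0, 0)" error seen below. A rough, paraphrased sketch of that merge step with the patched condition (not the verbatim repository code):

```
import tensorflow as tf
import keras.backend as K
import keras.layers as KL

def merge_tower_outputs(outputs, name):
    """Sketch of the output-merge step in ParallelModel (paraphrased).

    `outputs` is the list of tensors produced by the same layer on each
    GPU tower. Scalars are averaged; batched tensors are concatenated.
    """
    shape = K.int_shape(outputs[0])
    if shape == () or not shape:
        # Scalar (or unknown static shape): average across towers.
        # The extra `or not shape` guard is the patch from the comment above.
        return KL.Lambda(lambda o: tf.add_n(o) / len(o), name=name)(outputs)
    # Tensor with a batch dimension: stitch the per-GPU batches back together.
    return KL.Concatenate(axis=0, name=name)(outputs)
```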
With Ubuntu 14.04, Python 3.6.0, TensorFlow 1.4.0, and Keras 2.0.8, multi-GPU training fails with the following error.
A more detailed log follows:
```
mrcnn_mask_deconv (TimeDistributed)
mrcnn_class_logits (TimeDistributed)
mrcnn_mask (TimeDistributed)
/home/ljy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py:96: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
/home/ljy/anaconda3/envs/python36/lib/python3.6/site-packages/keras/engine/training.py:1987: UserWarning: Using a generator with `use_multiprocessing=True` and multiple workers may duplicate your data. Please consider using the `keras.utils.Sequence` class.
  UserWarning('Using a generator with `use_multiprocessing=True`'
Epoch 1/30
2018-05-16 09:22:38.466424: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: ConcatOp : Expected concatenating dimensions in the range [0, 0), but got 0
	 [[Node: mrcnn_bbox_loss_1/concat = ConcatV2[N=2, T=DT_FLOAT, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](tower_0/mask_rcnn/mrcnn_bbox_loss/Mean/_4921, tower_1/mask_rcnn/mrcnn_bbox_loss/Mean/_4923, split_2/split_dim)]]
2018-05-16 09:22:38.466601: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: ConcatOp : Expected concatenating dimensions in the range [0, 0), but got 0
	 [[Node: mrcnn_bbox_loss_1/concat = ConcatV2[N=2, T=DT_FLOAT, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](tower_0/mask_rcnn/mrcnn_bbox_loss/Mean/_4921, tower_1/mask_rcnn/mrcnn_bbox_loss/Mean/_4923, split_2/split_dim)]]
2018-05-16 09:22:38.466769: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: ConcatOp : Expected concatenating dimensions in the range [0, 0), but got 0
	 [[Node: mrcnn_bbox_loss_1/concat = ConcatV2[N=2, T=DT_FLOAT, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](tower_0/mask_rcnn/mrcnn_bbox_loss/Mean/_4921, tower_1/mask_rcnn/mrcnn_bbox_loss/Mean/_4923, split_2/split_dim)]]
2018-05-16 09:22:38.466866: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: ConcatOp : Expected concatenating dimensions in the range [0, 0), but got 0
	 [[Node: mrcnn_bbox_loss_1/concat = ConcatV2[N=2, T=DT_FLOAT, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](tower_0/mask_rcnn/mrcnn_bbox_loss/Mean/_4921, tower_1/mask_rcnn/mrcnn_bbox_loss/Mean/_4923, split_2/split_dim)]]
2018-05-16 09:22:38.477960: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: ConcatOp : Expected concatenating dimensions in the range [0, 0), but got 0
	 [[Node: mrcnn_bbox_loss_1/concat = ConcatV2[N=2, T=DT_FLOAT, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](tower_0/mask_rcnn/mrcnn_bbox_loss/Mean/_4921, tower_1/mask_rcnn/mrcnn_bbox_loss/Mean/_4923, split_2/split_dim)]]
2018-05-16 09:22:38.554091: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: ConcatOp : Expected concatenating dimensions in the range [0, 0), but got 0
	 [[Node: mrcnn_bbox_loss_1/concat = ConcatV2[N=2, T=DT_FLOAT, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](tower_0/mask_rcnn/mrcnn_bbox_loss/Mean/_4921, tower_1/mask_rcnn/mrcnn_bbox_loss/Mean/_4923, split_2/split_dim)]]
2018-05-16 09:22:38.594089: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: ConcatOp : Expected concatenating dimensions in the range [0, 0), but got 0
	 [[Node: mrcnn_bbox_loss_1/concat = ConcatV2[N=2, T=DT_FLOAT, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](tower_0/mask_rcnn/mrcnn_bbox_loss/Mean/_4921, tower_1/mask_rcnn/mrcnn_bbox_loss/Mean/_4923, split_2/split_dim)]]
2018-05-16 09:22:38.596734: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: ConcatOp : Expected concatenating dimensions in the range [0, 0), but got 0
	 [[Node: mrcnn_bbox_loss_1/concat = ConcatV2[N=2, T=DT_FLOAT, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](tower_0/mask_rcnn/mrcnn_bbox_loss/Mean/_4921, tower_1/mask_rcnn/mrcnn_bbox_loss/Mean/_4923, split_2/split_dim)]]
2018-05-16 09:22:38.596861: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: ConcatOp : Expected concatenating dimensions in the range [0, 0), but got 0
	 [[Node: mrcnn_bbox_loss_1/concat = ConcatV2[N=2, T=DT_FLOAT, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](tower_0/mask_rcnn/mrcnn_bbox_loss/Mean/_4921, tower_1/mask_rcnn/mrcnn_bbox_loss/Mean/_4923, split_2/split_dim)]]
2018-05-16 09:22:38.596941: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: ConcatOp : Expected concatenating dimensions in the range [0, 0), but got 0
	 [[Node: mrcnn_bbox_loss_1/concat = ConcatV2[N=2, T=DT_FLOAT, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](tower_0/mask_rcnn/mrcnn_bbox_loss/Mean/_4921, tower_1/mask_rcnn/mrcnn_bbox_loss/Mean/_4923, split_2/split_dim)]]
Traceback (most recent call last):
  File "/home/ljy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1323, in _do_call
    return fn(*args)
  File "/home/ljy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1302, in _run_fn
    status, run_metadata)
  File "/home/ljy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: ConcatOp : Expected concatenating dimensions in the range [0, 0), but got 0
	 [[Node: mrcnn_bbox_loss_1/concat = ConcatV2[N=2, T=DT_FLOAT, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](tower_0/mask_rcnn/mrcnn_bbox_loss/Mean/_4921, tower_1/mask_rcnn/mrcnn_bbox_loss/Mean/_4923, split_2/split_dim)]]
	 [[Node: training/SGD/gradients/tower_1/mask_rcnn/fpn_c4p4/BiasAdd_grad/BiasAddGrad/_5189 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:1", send_device_incarnation=1, tensor_name="edge_16970_training/SGD/gradients/tower_1/mask_rcnn/fpn_c4p4/BiasAdd_grad/BiasAddGrad", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ljy/Mask_RCNN/samples/balloon/balloon_nm.py", line 381, in