Input shape axis 0 must equal 4, got shape [3]

yonghuixu commented 5 years ago

Error:

Epoch: [1003/10] step: [1003/1] G_init time: 0.30186986923217773s, mse: 0.029376816004514694
2019-07-27 20:02:30.701958: W tensorflow/core/framework/op_kernel.cc:1431] OP_REQUIRES failed at iterator_ops.cc:988 : Invalid argument: Input shape axis 0 must equal 4, got shape [3]
     [[{{node crop_to_bounding_box/unstack}}]]

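With a very small dataset the code runs fine, but with the somewhat larger dataset I need, the error above appears. The results described below were obtained with a dataset of 9999 images.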

Today, after working through it step by step, I found that the problem is not batch_size but tf.data.Dataset.from_generator(generator_train, output_types=(tf.float32,tf.float32)). My reasoning is as follows:

1. Below is my generator_train(). Working backwards, I print imglr.shape inside generator_train(), which gives the output further down:

    def generator_train():
        for imglr,imghr in zip(train_lr_imgs, train_hr_imgs):
            print(imglr.shape)
            yield imglr,imghr

Epoch: [0/20] step: [202/2] time: 0.293165922164917s, mse: 0.015779396519064903
(218, 178, 3) (218, 178, 3) (218, 178, 3) (218, 178, 3) (218, 178, 3) (218, 178, 3) (218, 178, 3) (218, 178, 3)
Epoch: [0/20] step: [203/2] time: 0.3003373146057129s, mse: 0.01940624974668026
(218, 178, 3) (218, 178, 3) (218, 178, 3) (218, 178, 3) (218, 178, 3) (218, 178, 3) (218, 178) (218, 178, 3)
Epoch: [0/20] step: [204/2] time: 0.31006765365600586s, mse: 0.012748796492815018
(218, 178, 3) (218, 178, 3) (218, 178, 3) (218, 178, 3) (218, 178, 3) (218, 178, 3) (218, 178, 3) (218, 178, 3)
Epoch: [0/20] step: [205/2] time: 0.2964756488800049s, mse: 0.01310880295932293
(218, 178, 3) (218, 178, 3) (218, 178, 3) (218, 178, 3) (218, 178, 3)
2019-07-29 20:02:18.539915: W tensorflow/core/framework/op_kernel.cc:1431] OP_REQUIRES failed at iterator_ops.cc:988 : Invalid argument: Input shape axis 0 must equal 4, got shape [3]
     [[{{node crop_to_bounding_box/unstack}}]]

As you can see, at step==204 (the shapes are printed first, then the line Epoch: [0/20] step: [204/2]), one of the images has shape (218, 178) rather than (218, 178, 3). (The message "Input shape axis 0 must equal 4, got shape [3]" appears because I use tf.image.crop_to_bounding_box() in _map_fn_train(imglr,imghr). After switching to tf.image.random_crop, as in the original source, the error becomes: Incompatible shapes: [2] vs. [3]. So the error must be caused by this shape.)

I therefore checked every image in the dataset by printing its third dimension, and all of them were 3, so my dataset is not at fault either. Then I wondered whether my zip was changing the image shapes. Since I could not verify that directly, I ran the original train.py (which does not use zip; I only changed the lr and hr sizes). It fails as follows:

Epoch: [0/20] step: [185/2] time: 0.293s, mse: 0.003
(218, 178, 3) (218, 178, 3) (218, 178, 3) (218, 178, 3) (218, 178, 3) (218, 178, 3) (218, 178, 3) (218, 178, 3)
Epoch: [0/20] step: [186/2] time: 0.301s, mse: 0.003
(218, 178, 3) (218, 178, 3) (218, 178, 3) (218, 178, 3) (218, 178, 3) (218, 178, 3) (218, 178, 3) (218, 178, 3)
Epoch: [0/20] step: [187/2] time: 0.309s, mse: 0.003
(218, 178) (218, 178, 3) (218, 178, 3) (218, 178, 3) (218, 178, 3) (218, 178, 3) (218, 178, 3) (218, 178, 3)
Epoch: [0/20] step: [188/2] time: 0.306s, mse: 0.005
(218, 178, 3)
2019-07-29 20:23:08.229313: W tensorflow/core/framework/op_kernel.cc:1431] OP_REQUIRES failed at iterator_ops.cc:988 : Invalid argument: Incompatible shapes: [2] vs. [3]
     [[{{node random_crop/GreaterEqual}}]]
Traceback (most recent call last):
  File "new_train.py", line 204, in
(218, 178, 3)
    train()
(218, 178, 3)
  File "new_train.py", line 94, in train
    for step, (lr_patchs, hr_patchs) in enumerate(train_ds):
(218, 178, 3)
  File "/home/xyh/anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 556, in next
    return self.next()
  File "/home/xyh/anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 585, in next
    return self._next_internal()
  File "/home/xyh/anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 577, in _next_internal
(218, 178, 3)
    output_shapes=self._flat_output_shapes)
  File "/home/xyh/anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/ops/gen_dataset_ops.py", line 1954, in iterator_get_next_sync
    _six.raise_from(_core._status_to_exception(e.code, message), None)
  File "", line 3, in raise_from
(218, 178, 3)

As you can see, at step==188 one of the images has shape (218, 178) rather than (218, 178, 3). The error also reads Incompatible shapes: [2] vs. [3], the same as above. So zip is not the cause either. In summary, the error can only be caused by tf.data.Dataset.from_generator(). (My guess is that once there are somewhat more images, the shapes of some of them get squeezed.) So I would like to ask everyone: is there an alternative to tf.data.Dataset.from_generator()?

zsdonghao commented 5 years ago

Epoch: [0/20] step: [187/2] time: 0.309s, mse: 0.003
(218, 178)  <== bug here, please check your data
(218, 178, 3)

Solution 1: fix your data
Solution 2: reshape data inside the data augmentation function
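
Solution 2 could look roughly like the following NumPy sketch, applied inside the generator before the images reach tf.data. The names to_three_channels and generator_train_fixed are hypothetical, and repeating the single gray channel three times is an assumption about how grayscale images should be handled. It is not code from the SRGAN repository.

```python
import numpy as np


def to_three_channels(img):
    """Return an (H, W, 3) array for grayscale, RGB, or RGBA input."""
    img = np.asarray(img)
    if img.ndim == 2:
        # Grayscale (H, W): repeat the single channel three times.
        return np.stack([img] * 3, axis=-1)
    if img.ndim == 3 and img.shape[-1] == 1:
        # Grayscale with an explicit channel axis (H, W, 1).
        return np.repeat(img, 3, axis=-1)
    if img.ndim == 3 and img.shape[-1] == 4:
        # RGBA: drop the alpha channel.
        return img[..., :3]
    if img.ndim == 3 and img.shape[-1] == 3:
        # Already (H, W, 3): nothing to do.
        return img
    raise ValueError(f"unsupported image shape: {img.shape}")


def generator_train_fixed(lr_imgs, hr_imgs):
    """Yield (lr, hr) pairs in which every image has three channels."""
    for imglr, imghr in zip(lr_imgs, hr_imgs):
        yield to_three_channels(imglr), to_three_channels(imghr)
```

Since tf.data.Dataset.from_generator expects a callable that takes no arguments, the generator would be wrapped, for example as lambda: generator_train_fixed(train_lr_imgs, train_hr_imgs), and passed with the same output_types as before.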

yonghuixu commented 5 years ago

I wrote a separate script to test the dataset and found that every image has shape (218, 178, 3); there are no (218, 178) images.
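
One possible reason such a check passes while training still fails is that the separate script loads the files differently from the training script: some image readers always return three channels, while others return a 2-D array for grayscale files. This is a guess and is not confirmed anywhere in this thread. A minimal sketch that instead checks the exact in-memory lists consumed by the generator; find_bad_shapes is a hypothetical helper name:

```python
import numpy as np


def find_bad_shapes(images, expected_channels=3):
    """Return (index, shape) for every image that is not (H, W, expected_channels)."""
    bad = []
    for i, img in enumerate(images):
        shape = np.shape(img)
        if len(shape) != 3 or shape[-1] != expected_channels:
            bad.append((i, shape))
    return bad
```

Calling find_bad_shapes(train_lr_imgs) and find_bad_shapes(train_hr_imgs) inside the training script, right before the dataset is built, would list the index of any 2-D image so that the corresponding file can be inspected.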


zsdonghao commented 5 years ago

If your images are 3D, the APIs would not return 2D images ...
If it happens, I can't help ...

tensorlayer / SRGAN

Input shape axis 0 must equal 4, got shape [3] #167