Open xdtl opened 7 years ago
up6 = merge([UpSampling2D(size=(2, 2))(conv5), conv4], mode='concat', concat_axis=1)
You can see the `mode='concat'` parameter in the `merge` layer, which concatenates the feature maps (so it's only important to match the "width" and "height" dimensions). You can also try `mode='sum'`, but for that one you would need the same number of feature maps going into the `merge` layer (which leads to rethinking the architecture).
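To make the difference between the two merge modes concrete, here is a minimal pure-Python sketch of the shape rules only. It does not call Keras; the helper names `concat_shape` and `sum_shape` and the 24x20 spatial size are hypothetical, chosen just for illustration, and the shapes are channels-first as in the snippet above.

```python
def concat_shape(shapes, axis):
    """Shape produced by concatenating tensors of the given shapes along `axis`.

    Every dimension except `axis` has to match; the sizes on `axis` add up.
    """
    first = shapes[0]
    for shape in shapes[1:]:
        if len(shape) != len(first):
            raise ValueError("inputs must have the same rank")
        for i, (a, b) in enumerate(zip(first, shape)):
            if i != axis and a != b:
                raise ValueError(
                    f"shapes {first} and {shape} differ on non-concat axis {i}"
                )
    out = list(first)
    out[axis] = sum(shape[axis] for shape in shapes)
    return tuple(out)


def sum_shape(shapes):
    """Shape produced by an element-wise sum: all shapes must be identical."""
    first = shapes[0]
    for shape in shapes[1:]:
        if shape != first:
            raise ValueError(f"cannot sum shapes {first} and {shape}")
    return tuple(first)


# Hypothetical channels-first shapes: (batch, channels, height, width).
upsampled = (None, 512, 24, 20)  # upsampled bottleneck features
skip = (None, 256, 24, 20)       # encoder features from the skip connection

print(concat_shape([upsampled, skip], axis=1))  # channel counts add: 512 + 256

try:
    sum_shape([upsampled, skip])
except ValueError as err:
    print("sum fails:", err)
```

Concatenation only cares that height and width agree, so 512 and 256 feature maps can be stacked. A sum would need both inputs to have the same number of feature maps.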
Also, this architecture is only inspired by U-Net; it does not attempt to reproduce the one from the paper.
PS. This repo has now switched to the Keras 2 API, which removed the `merge` layer and added the more obvious `concatenate` layer.
Thanks for your reply! It's good to know the repo is now switched to Keras 2 API.
Yes, I agree that a different number of feature maps won't generate any errors when they are merged together, as long as "width" and "height" match. I am just curious whether that difference from the original paper was motivated by performance considerations. I am trying to re-implement U-net using Caffe, but I still can't get performance comparable to your version... Anyway, thanks very much for the nice work!
Hello,
I have another related issue (Keras 2, TF):
---> 23 up6 = concatenate([Conv2DTranspose(256, (2, 2), strides=(2, 2), padding='same')(conv5), conv4], axis=3)
ValueError: `Concatenate` layer requires inputs with matching shapes except for the concat axis. Got inputs shapes: [(None, 36, 36, 256), (None, 37, 37, 256)]
Any thoughts? Thanks.
Please take a look at https://github.com/jocicmarko/ultrasound-nerve-segmentation/issues/37#issuecomment-302683399, it points to the same problem you have.
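A 36 vs 37 mismatch like the one above typically appears when the input height or width is not divisible by 2 for every pooling stage. The sketch below traces the spatial size through the encoder and back up. It is only an illustration under stated assumptions: four 2x2 max-pooling stages with Keras' default `'valid'` padding (which floors odd sizes), an upsampling step that exactly doubles the size, and a hypothetical input size of 300, since the thread does not say what the input size was. The helper names are made up for this example.

```python
def encoder_sizes(size, levels=4):
    """Spatial size after each 2x2 'valid' max-pooling stage (floor division)."""
    sizes = [size]
    for _ in range(levels):
        sizes.append(sizes[-1] // 2)
    return sizes


def first_mismatch(size, levels=4):
    """Return (upsampled, skip) for the first decoder level where they differ.

    Returns None when every upsampled size matches its skip connection.
    """
    sizes = encoder_sizes(size, levels)
    # The decoder walks back up from the bottleneck, doubling the size each time.
    for level in range(levels, 0, -1):
        upsampled = sizes[level] * 2
        skip = sizes[level - 1]
        if upsampled != skip:
            return (upsampled, skip)
    return None


# Hypothetical input size; 300 is not divisible by 2**4 = 16.
print(encoder_sizes(300))   # sizes per level: 300, 150, 75, 37, 18
print(first_mismatch(300))  # the bottleneck doubles to 36, but the skip is 37
print(first_mismatch(288))  # 288 is divisible by 16, so nothing mismatches
```

Under these assumptions, resizing or padding the input so that each side is a multiple of 16 avoids the mismatch; cropping the larger feature map before concatenating is another option.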
I am a bit confused about the network structure when comparing it with what was originally proposed in the paper (Fig. 1). At the very bottom of the U-net, the definition is as follows:
up6 = merge([UpSampling2D(size=(2, 2))(conv5), conv4], mode='concat', concat_axis=1)
So conv5 has dimension (N, 512, H, W), and "UpSampling2D(size=(2, 2))(conv5)" has dimension (N, 512, 2H, 2W). conv4 has dimension (N, 256, 2H, 2W). The "merge" therefore combines (N, 512, 2H, 2W) with (N, 256, 2H, 2W), which results in dimension (N, 768, 2H, 2W)...
As I understand it, the two merged sets of features are supposed to have the same dimension, rather than one having 512 features and the other 256. I wonder if I misunderstood something. Thanks!