jocicmarko / ultrasound-nerve-segmentation

Deep Learning Tutorial for Kaggle Ultrasound Nerve Segmentation competition, using Keras
MIT License
939 stars 328 forks

network structure #28

Open xdtl opened 7 years ago

xdtl commented 7 years ago

I am a bit confused about the network structure when comparing it with what was originally proposed in the paper (Fig. 1). At the very bottom of the unet, the definition is as follows:

conv4 = Convolution2D(256, 3, 3, activation='relu', border_mode='same')(pool3)
conv4 = Convolution2D(256, 3, 3, activation='relu', border_mode='same')(conv4)
pool4 = MaxPooling2D(pool_size=(2, 2))(conv4)
conv5 = Convolution2D(512, 3, 3, activation='relu', border_mode='same')(pool4)
conv5 = Convolution2D(512, 3, 3, activation='relu', border_mode='same')(conv5)
up6 = merge([UpSampling2D(size=(2, 2))(conv5), conv4], mode='concat', concat_axis=1)

So conv5 has shape (N, 512, H, W), and UpSampling2D(size=(2, 2))(conv5) has shape (N, 512, 2H, 2W). conv4 has shape (N, 256, 2H, 2W), so merge actually combines (N, 512, 2H, 2W) with (N, 256, 2H, 2W), which results in a shape of (N, 768, 2H, 2W).

As I understand it, the two sets of features being merged are supposed to have the same dimensions, rather than one having 512 features and the other 256. I wonder if I have misunderstood something. Thanks!

jocicmarko commented 7 years ago
up6 = merge([UpSampling2D(size=(2, 2))(conv5), conv4], mode='concat', concat_axis=1)

Note the mode='concat' parameter in the merge layer: it concatenates the feature maps along the channel axis, so only the "width" and "height" dimensions need to match. You could also try mode='sum', but that requires the same number of feature maps in both inputs, which would mean rethinking the architecture. Also, this architecture is only inspired by U-net; it does not attempt to reproduce the one from the paper.

PS. This repo has since switched to the Keras 2 API, which removed the merge layer and added a more explicit concatenate layer.
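To make the shape rules for the two merge modes concrete, here is a minimal sketch in plain Python that only does the shape arithmetic (no Keras involved). The helper names `concat_shape` and `sum_shape` and the values chosen for N, H and W are hypothetical, picked for illustration; the channel counts 512 and 256 come from the code quoted above.

```python
def concat_shape(a, b, axis=1):
    """Shape after concatenating two tensors along `axis`.

    Every axis except `axis` must match; sizes along `axis` are added.
    """
    if len(a) != len(b):
        raise ValueError("inputs must have the same number of axes")
    for i, (x, y) in enumerate(zip(a, b)):
        if i != axis and x != y:
            raise ValueError(f"axis {i} differs: {x} vs {y}")
    out = list(a)
    out[axis] = a[axis] + b[axis]
    return tuple(out)


def sum_shape(a, b):
    """Shape after an element-wise sum: both shapes must be identical."""
    if tuple(a) != tuple(b):
        raise ValueError(f"shapes differ: {a} vs {b}")
    return tuple(a)


# Hypothetical batch size and bottleneck spatial size (channels-first layout).
N, H, W = 1, 6, 5
upsampled_conv5 = (N, 512, 2 * H, 2 * W)
conv4 = (N, 256, 2 * H, 2 * W)

merged = concat_shape(upsampled_conv5, conv4, axis=1)
print(merged)  # (1, 768, 12, 10)

try:
    sum_shape(upsampled_conv5, conv4)
except ValueError as err:
    print("sum is not possible:", err)
```

Concatenation accepts 512 and 256 channels because only the non-concatenated axes are compared, while the sum would need both inputs to have the same channel count.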

xdtl commented 7 years ago

Thanks for your reply! It's good to know the repo has switched to the Keras 2 API.

Yes, I agree that different numbers of feature maps won't cause any errors when they are concatenated, as long as "width" and "height" match. I am just curious whether that difference from the original paper was motivated by performance considerations. I am trying to re-implement U-net in Caffe, but I still can't get performance comparable to your version... Anyway, thanks very much for the nice work!

szhitansky commented 7 years ago

Hello,

I have another related issue (Keras 2, TF):

---> 23 up6 = concatenate([Conv2DTranspose(256, (2, 2), strides=(2, 2), padding='same')(conv5), conv4], axis=3)

ValueError: Concatenate layer requires inputs with matching shapes except for the concat axis. Got inputs shapes: [(None, 36, 36, 256), (None, 37, 37, 256)]

Any thoughts? Thanks.

jocicmarko commented 7 years ago

Please take a look at https://github.com/jocicmarko/ultrasound-nerve-segmentation/issues/37#issuecomment-302683399, it points to the same problem you have.
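One plausible reading of the shapes in the error above, offered as an assumption rather than a summary of the linked comment: a 2x2 max pooling with default settings floors odd sizes, and a stride-2 transposed convolution with padding='same' doubles them, so a skip connection of size 37 is pooled to 18 and upsampled back to 36. The sketch below traces only that size arithmetic in plain Python; the helper names and the input sizes 300 and 304 are hypothetical, and the depth of 4 assumes four pooling steps.

```python
def skip_sizes(size, depth=4):
    """Spatial size of each skip connection (the size just before each pooling)."""
    sizes = []
    for _ in range(depth):
        sizes.append(size)
        size //= 2  # 2x2 max pooling floors odd sizes
    return sizes


def first_decoder_mismatch(size, depth=4):
    """Return (upsampled, skip) for the first concat that would fail, else None.

    The decoder is walked from the deepest level upwards, as the network does.
    """
    for skip in reversed(skip_sizes(size, depth)):
        upsampled = 2 * (skip // 2)  # pool, then stride-2 upsampling
        if upsampled != skip:
            return (upsampled, skip)
    return None


def is_compatible(size, depth=4):
    """True when every skip size is even, i.e. size is divisible by 2**depth."""
    return size % (2 ** depth) == 0


print(skip_sizes(300))              # [300, 150, 75, 37]
print(first_decoder_mismatch(300))  # (36, 37)
print(first_decoder_mismatch(304))  # None
```

Under these assumptions, resizing or padding the input so that its height and width are multiples of 16 avoids the mismatch.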