liuzhuang13 / DenseNet

Densely Connected Convolutional Networks, in CVPR 2017 (Best Paper Award).

Why 3 dense blocks, instead of downsampling #42

Open · ozyilmaz opened this issue 6 years ago

ozyilmaz commented 6 years ago

Hey there,

First of all, let me congratulate the authors. This is a very solid architecture that resembles cortical computation.

I have a question regarding the choice of dense blocks. Because the spatial size of the feature maps changes across the network, the dense connections are partitioned into blocks: feature maps within a block share one resolution, and transition layers downsample between blocks.
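
For concreteness, a minimal PyTorch-style sketch of one such dense block (this repository's code is in Torch; the layer count and growth rate below are illustrative, not the paper's exact configuration):

```python
import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    def __init__(self, in_channels, growth_rate):
        super().__init__()
        # Composite function BN -> ReLU -> 3x3 Conv, producing growth_rate new maps.
        self.bn = nn.BatchNorm2d(in_channels)
        self.conv = nn.Conv2d(in_channels, growth_rate,
                              kernel_size=3, padding=1, bias=False)

    def forward(self, x):
        new_features = self.conv(torch.relu(self.bn(x)))
        # Dense connectivity: append the new maps to everything seen so far.
        return torch.cat([x, new_features], dim=1)

class DenseBlock(nn.Module):
    def __init__(self, in_channels, growth_rate, num_layers):
        super().__init__()
        # Layer i receives in_channels + i * growth_rate input channels.
        self.layers = nn.Sequential(*[
            DenseLayer(in_channels + i * growth_rate, growth_rate)
            for i in range(num_layers)
        ])

    def forward(self, x):
        # All maps inside the block share one spatial resolution, so
        # channel-wise concatenation is always well-defined.
        return self.layers(x)
```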

Another option would be to get rid of the blocks and connect every layer to every other layer regardless of spatial size, downsampling whenever there is a resolution mismatch.
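
A hypothetical sketch of that block-free alternative, in which every earlier feature map is pooled down to the current resolution before concatenation (`GlobalDenseLayer` and everything else here illustrates the proposal, not anything in DenseNet):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalDenseLayer(nn.Module):
    """Hypothetical layer connected to *all* earlier layers, pooling any
    higher-resolution feature maps down to its own spatial size."""
    def __init__(self, in_channels, growth_rate):
        super().__init__()
        # in_channels must equal the total channel count of all earlier features.
        self.conv = nn.Conv2d(in_channels, growth_rate,
                              kernel_size=3, padding=1, bias=False)

    def forward(self, features, out_size):
        # Downsample every earlier map to this layer's resolution, then concatenate.
        pooled = [F.adaptive_avg_pool2d(f, out_size) for f in features]
        return self.conv(torch.cat(pooled, dim=1))
```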

Is there an experimental (e.g. worse performance, overfitting) or computational (e.g. more parameters) reason for not reporting this variant?

Thanks, Ozgur

liuzhuang13 commented 6 years ago

Thanks for the question. Actually, the feature maps in the first dense block are pooled at the end of the block, right after a convolution, and the result acts as the input to the second block. So the second block can actually see the pooled feature maps from the first block; the only additional thing is the convolution, which we found to improve accuracy.
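
In code, the transition layer described here looks roughly like the following PyTorch-style sketch (the `compression` factor corresponds to the θ of the paper's DenseNet-BC variant; the value 0.5 is illustrative):

```python
import torch.nn as nn
import torch.nn.functional as F

class Transition(nn.Module):
    def __init__(self, in_channels, compression=0.5):
        super().__init__()
        # The 1x1 convolution is the "only additional thing" mentioned above;
        # with compression < 1 it also reduces the channel count.
        self.bn = nn.BatchNorm2d(in_channels)
        self.conv = nn.Conv2d(in_channels, int(in_channels * compression),
                              kernel_size=1, bias=False)
        # 2x2 average pooling halves the spatial resolution, so the next dense
        # block sees a pooled (and convolved) summary of this block's maps.
        self.pool = nn.AvgPool2d(kernel_size=2, stride=2)

    def forward(self, x):
        return self.pool(self.conv(F.relu(self.bn(x))))
```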

ozyilmaz commented 6 years ago

Thanks for the answer, I understand the approach better now. The convolution in the transition layer compresses the feature space, which probably helps prevent overfitting. This of course comes at the expense of blurrier deep supervision, because the gradient flow through the shorter connections is diluted at these bottlenecks. Cheers, Ozgur