Shouldn't bottlenecks go in the transition layers ?

talolard commented 7 years ago

Hello, Thanks for this implementation. I'm trying to follow along and don't understand a fine point.

In the paper they put the bottlenecks as part of the transition layers, whereas you placed the bottlenecks in each internal layer of each block

layers

I suspect that having them in the transition layers is the correct approach, since the point of the bottleneck is to reduce the size of the accrued feature maps due to concatenation. Within each internal layer we aren't accruing much, and I suspect that having the bottlenecks there actually increases the number of parameters as the size of each feature map is less than 4*growth_rate.

ikhlestov commented 7 years ago

Hi @talolard ! Thanks for interesting in DenseNets! You may note that inside attached picture we have DenseBlock parts that consist of [1x1 conv and 3x3 conv]. As I understood from paper bottleneck layers should be added prior every 3x3 conv, if required. I've followed this part of paper:

and some of already provided source code, like this. So we have bottleneck layers really prior to concatenation(concatenation exist only at DenseBlocks, not at the Transition Layers), but also prior 3x3 conv. And another 1x1 conv we have at the Transition Layers prior to the average pooling:

At the end we have 1x1 conv layers as in the DenseBlocks as in TransitionLayers.

Regarding number of parameters - yes, every 1x1 layer increase number of training params. But in case of usage bottleneck layers we divide total number of layers by factor of 2. Say in usual DenseNet we have 40 layers(each 3x3 conv), but in DenseNet-BC we will have only 40/2 = 20 layers(each 1x1 + 3x3 conv). With this approach total numbers of training params will decrease.

talolard commented 7 years ago

Thanks for getting back to me on that. While your interpertation seems to be right, I ran a quick experiment. I used densenet as an encoder/decoder for sentances, replacing pooling with "deconvolutions" in the decoder to auto encode sentences. Running with bottlnecks only between blocks reduces parameters by 30% and greatly increases training speed. Attached, my loss graph for training: Orange - bottleneck after every layer Purple - bottleneck between blocks only. losscomp

ikhlestov commented 7 years ago

Hi @talolard again! Your experiment seems to be quite interesting use case, thank you for reporting. I think this is not bug regarding implementation, so I will close the issue. But I also try approach with bottleneck layers only between blocks for my future research, thanks for pointing my attention to it!

ikhlestov / vision_networks

Shouldn't bottlenecks go in the transition layers ? #2