Closed suryabhupa closed 7 years ago
Thanks for your interest. We expect that adding the layer activations eliminates some information, while explicitly keeping all previous layers' activations provides more useful information to later layers. That's the major difference between DenseNets and ResNets.
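To make the distinction concrete, here is a toy sketch (not the paper's implementation; `transform` is a hypothetical stand-in for a conv + BN + ReLU layer) contrasting the ResNet-style addition with the DenseNet-style concatenation:

```python
import numpy as np

def transform(x, out_ch=4):
    # stand-in for conv + BN + ReLU; here just a fixed random linear map
    rng = np.random.default_rng(0)
    w = rng.standard_normal((x.shape[-1], out_ch))
    return x @ w

def resnet_block(x):
    # addition: output width is unchanged, earlier info is mixed in
    return x + transform(x, out_ch=x.shape[-1])

def densenet_block(x):
    # concatenation: output width grows, earlier activations kept intact
    return np.concatenate([x, transform(x)], axis=-1)

x = np.ones((1, 4))
print(resnet_block(x).shape)    # (1, 4) -- same width as the input
print(densenet_block(x).shape)  # (1, 8) -- width grows by the new features
```

The point is that after addition the original activations can no longer be recovered separately, whereas after concatenation every later layer still sees them unchanged.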
Ah okay, this is what I suspected as well; do you think the computation saved by adding the layer activations would outweigh the generality of concatenating them?
Our experiments suggest that well-designed transformations performed on the concatenated features (see the DenseNet-BC structure in our paper) can save parameters and computation compared with ResNets.
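A rough back-of-the-envelope sketch of why the bottleneck (the "B" in DenseNet-BC) saves parameters: the paper's bottleneck first applies a 1x1 conv producing 4k feature maps before the 3x3 conv, where k is the growth rate. The input width and growth rate below are illustrative numbers, not values from the paper:

```python
def params_plain(in_ch, k):
    # a single 3x3 conv mapping in_ch channels to k channels
    return in_ch * k * 3 * 3

def params_bc(in_ch, k):
    # 1x1 conv down to 4k channels, then a 3x3 conv to k channels
    return in_ch * (4 * k) * 1 * 1 + (4 * k) * k * 3 * 3

k = 12        # growth rate (assumed for illustration)
in_ch = 480   # deep in a dense block, the concatenated input is wide
print(params_plain(in_ch, k))  # 51840
print(params_bc(in_ch, k))     # 28224
```

Because the concatenated input keeps growing, the 1x1 bottleneck's savings become larger the deeper you are in a dense block.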
Maybe there's a balance between adding and concatenating that can maximize the computation savings.
Ah okay, I see. That's great to hear! Thanks for the explanations (the paper is awesome!) 👍
I may be misunderstanding the architecture, but why does DenseNet concatenate the current layer's feature maps to pass forward instead of using "true" residual connections?