stride and dilation in pointwise layers

jacobgil / pytorch-tensor-decompositions

PyTorch implementation of [1412.6553] and [1511.06530] tensor decomposition methods for convolutional layers.

https://jacobgil.github.io/deeplearning/tensor-decompositions-deep-learning


stride and dilation in pointwise layers #2

Closed ruihangdu closed 6 years ago

ruihangdu commented 6 years ago

Hi Jacob, what is the rationale behind applying stride and dilation to the two pointwise layers in the CP decomposition? I tried to compress the first layer of AlexNet, but the output feature map came out smaller than it was before the decomposition. This caused issues at the second max-pooling layer, because by then the feature map had become too small.

jacobgil commented 6 years ago

Hi, You found a bug! The intention was that the decomposed layer should preserve the dilation and the stride of the original layer.

However, while the dilation should be (and is) applied to all of the decomposed components, the stride should be applied only to the last one, i.e. pointwise_s_to_r_layer, depthwise_vertical_layer and depthwise_horizontal_layer should have a stride of 1.

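To speed it up a bit, the stride of the original layer can instead be placed on depthwise_horizontal_layer: the only layer after it is the last conv, which is a 1x1 conv, so striding one step earlier gives the same output while the 1x1 conv runs on fewer positions.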
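The effect of stride placement on the feature-map size can be checked with the standard convolution output-size formula. The sketch below is only an illustration: the helper names are hypothetical, the numbers assume the torchvision-style AlexNet first layer (11x11 kernel, stride 4, padding 2) on a 224-pixel input, and the "stride applied twice" configuration is an example of the failure mode, not necessarily the exact layers that were affected in the repository.

```python
def conv_out_size(n, kernel, stride=1, padding=0, dilation=1):
    """Output length of a convolution along one spatial dimension."""
    return (n + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def height_after_cp(n, kernel, padding, strides, dilation=1):
    """Track the feature-map height through the four CP layers.

    strides holds one stride per layer, in the order
    (pointwise_s_to_r, depthwise_vertical, depthwise_horizontal, pointwise_r_to_t).
    Along the height axis only the vertical layer has a kernel larger than 1.
    """
    layers = [(1, 0), (kernel, padding), (1, 0), (1, 0)]  # (kernel, padding) along height
    for (k, p), s in zip(layers, strides):
        n = conv_out_size(n, k, stride=s, padding=p, dilation=dilation)
    return n


# Assumed AlexNet-style first layer: 11x11 kernel, stride 4, padding 2, 224-pixel input.
original = conv_out_size(224, kernel=11, stride=4, padding=2)

# Stride only on depthwise_horizontal_layer: the size matches the original layer.
stride_once = height_after_cp(224, kernel=11, padding=2, strides=(1, 1, 4, 1))

# Illustrative failure mode: stride on both depthwise layers, so it is applied twice.
stride_twice = height_after_cp(224, kernel=11, padding=2, strides=(1, 4, 4, 1))

print(original, stride_once, stride_twice)
```

With the stride on a single layer the height stays at the original value, while applying it on two layers shrinks the map by roughly another factor of the stride, which is the kind of shrinkage reported above.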

ruihangdu commented 6 years ago

My thought was that the first and last layers are just 1x1 convolutions that project the number of channels of each feature map from S to R and from R to T respectively, so these projections should preserve the spatial information of each feature map. The depthwise layers, on the other hand, I see as the equivalent of the original convolution, and therefore they should inherit the stride and dilation of the original layer that is being decomposed.

jacobgil commented 6 years ago

Only the last depthwise layer should inherit the stride, otherwise you will get an effect of applying a stride twice (the current bug in the code).

The dilation however should be applied in all the depthwise layers.
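As a rough illustration of the rule described in this thread, the sketch below builds the four-layer structure with the stride only on the last depthwise layer and the dilation on both depthwise layers. It is not the repository's code: the function name cp_layer_structure is hypothetical, the weights are left at their random initialization (no CP decomposition is performed), and the AlexNet-style layer shape is an assumption. It only checks that the output shape matches the original layer.

```python
import torch
import torch.nn as nn


def cp_layer_structure(layer: nn.Conv2d, rank: int) -> nn.Sequential:
    """Hypothetical helper: CP-style layer structure with randomly initialized weights."""
    # 1x1 conv, S -> R channels. Stride 1; dilation has no effect on a 1x1 kernel.
    pointwise_s_to_r = nn.Conv2d(layer.in_channels, rank, kernel_size=1, bias=False)
    # Depthwise conv along the height axis: stride 1, dilation inherited.
    depthwise_vertical = nn.Conv2d(
        rank, rank, kernel_size=(layer.kernel_size[0], 1), stride=1,
        padding=(layer.padding[0], 0), dilation=layer.dilation,
        groups=rank, bias=False)
    # Depthwise conv along the width axis: the only layer that carries the stride.
    depthwise_horizontal = nn.Conv2d(
        rank, rank, kernel_size=(1, layer.kernel_size[1]), stride=layer.stride,
        padding=(0, layer.padding[1]), dilation=layer.dilation,
        groups=rank, bias=False)
    # 1x1 conv, R -> T channels. Stride 1.
    pointwise_r_to_t = nn.Conv2d(rank, layer.out_channels, kernel_size=1,
                                 bias=layer.bias is not None)
    return nn.Sequential(pointwise_s_to_r, depthwise_vertical,
                         depthwise_horizontal, pointwise_r_to_t)


# Assumed AlexNet-style first layer and input size.
original = nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2)
decomposed = cp_layer_structure(original, rank=16)

x = torch.randn(1, 3, 224, 224)
print(original(x).shape, decomposed(x).shape)
```

Because depthwise_horizontal_layer has a kernel height of 1 and is followed only by a 1x1 conv, striding there subsamples the same positions as striding in the final layer would.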

jacobgil commented 6 years ago

Fixed.