Closed (closed by ruihangdu 6 years ago)
Hi, you found a bug! The intention was that the decomposed layer should preserve the dilation and the stride of the original layer.
However, while the dilation should be (and is) applied to all of the decomposed components, the stride should be applied only to the last one, i.e., pointwise_s_to_r_layer, depthwise_vertical_layer and depthwise_horizontal_layer should have a stride of 1.
To speed it up a bit: since depthwise_horizontal_layer is followed only by the last conv, which is a 1x1 conv, depthwise_horizontal_layer can carry the stride of the original layer instead.
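A rough sketch of why moving the stride is safe, using only the standard convolution output-size formula in plain Python (no PyTorch). The numbers (224x224 input, 11x11 kernel, stride 4, padding 2) are assumed here to match the first conv layer of torchvision's AlexNet, and `conv_out`, `run` and `decomposed` are hypothetical helpers for illustration, not code from the repository.

```python
def conv_out(size, kernel, stride=1, padding=0, dilation=1):
    """Output length of a convolution along one spatial axis."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def run(shape, layers):
    """Push an (H, W) shape through a list of (kernel_hw, stride, padding_hw) layers."""
    h, w = shape
    for (kh, kw), stride, (ph, pw) in layers:
        h = conv_out(h, kh, stride, ph)
        w = conv_out(w, kw, stride, pw)
    return h, w


# Assumed values: first conv layer of torchvision's AlexNet.
K, S, P = 11, 4, 2
inp = (224, 224)


def decomposed(horizontal_stride, last_stride):
    """The four CP components, with the stride placed on one of the last two."""
    return [
        ((1, 1), 1, (0, 0)),                  # pointwise_s_to_r_layer
        ((K, 1), 1, (P, 0)),                  # depthwise_vertical_layer
        ((1, K), horizontal_stride, (0, P)),  # depthwise_horizontal_layer
        ((1, 1), last_stride, (0, 0)),        # last 1x1 conv (R to T)
    ]


original = run(inp, [((K, K), S, (P, P))])
stride_on_last = run(inp, decomposed(1, S))
stride_on_horizontal = run(inp, decomposed(S, 1))

# Spatial size that depthwise_horizontal_layer has to produce in each variant.
horizontal_out_a = run(inp, decomposed(1, S)[:3])
horizontal_out_b = run(inp, decomposed(S, 1)[:3])

print(original, stride_on_last, stride_on_horizontal)
print(horizontal_out_a, horizontal_out_b)
```

Under these assumptions both placements give the same 55x55 output as the original layer, but with the stride on depthwise_horizontal_layer that layer only has to produce a 55x55 map rather than a 218x218 one, which is where the speed-up comes from.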
My thought was that the first layer and the last layer are just 1x1 convolutions that project the number of channels of each feature map from S to R and R to T respectively. However, in this process, the spatial information of each feature map should be preserved by the projections. On the other hand, I see the depthwise layers as equivalent to the original convolution, and therefore they should inherit the stride and dilation from the original layer that is being decomposed.
Only the last depthwise layer should inherit the stride; otherwise the stride is effectively applied twice (the current bug in the code).
The dilation, however, should be applied in all the depthwise layers.
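To make the "stride applied twice" effect concrete, here is a small shape calculation in plain Python. It is only an illustration: the layer sizes (224x224 input, 11x11 kernel, stride 4, padding 2, dilation 1) are assumed to match torchvision's AlexNet first conv layer, `conv_out` is a hypothetical helper, and the "buggy" variant simply strides both depthwise layers, which may not be exactly what the code did before the fix.

```python
def conv_out(size, kernel, stride=1, padding=0, dilation=1):
    """Output length of a convolution along one spatial axis."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


# Assumed values: first conv layer of torchvision's AlexNet.
K, S, P, D = 11, 4, 2, 1
H = W = 224

# What the original, undecomposed layer produces.
expected = (conv_out(H, K, S, P, D), conv_out(W, K, S, P, D))

# Buggy variant: both depthwise layers inherit the stride.
h, w = conv_out(H, K, S, P, D), conv_out(W, 1, S, 0, D)   # vertical, K x 1
buggy = (conv_out(h, 1, S, 0, D), conv_out(w, K, S, P, D))  # horizontal, 1 x K

# Fixed variant: only the last depthwise layer inherits the stride;
# the dilation D is still passed to both depthwise layers.
h, w = conv_out(H, K, 1, P, D), conv_out(W, 1, 1, 0, D)   # vertical, stride 1
fixed = (conv_out(h, 1, S, 0, D), conv_out(w, K, S, P, D))  # horizontal, stride S

print("expected:", expected)
print("buggy:   ", buggy)
print("fixed:   ", fixed)
```

With these assumed sizes the double-strided variant shrinks the map to roughly a quarter of the expected side length, which is consistent with later pooling layers running out of spatial extent.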
Fixed.
Hi Jacob, what is the rationale behind applying striding and dilation to the two pointwise layers in the CP decomposition? I tried to compress the first layer of AlexNet, but the output feature map became smaller than before the decomposition. This caused issues for the second max-pooling layer because the feature map eventually became too small.