xskxzr closed this issue 2 years ago.
Ya, I know that. It's stated on page 14 of the paper:
This is different from the standard MBConv where the down-sampling is done by applying stride-2 depthwise convolution to the inverted bottleneck hidden states. We later found using stride-2 depthwise convolution is helpful but slower when model is small but not so much when model scales.
So it's another case where you have to test to tell. Both should work: stride-2 DepthwiseConv may be slower and slightly better. I haven't tested applying strides=2 on the first conv, though.
I've added a parameter use_dw_strides for CoAtNet. Set it to False to apply the strides on the first Conv2D of MBConv instead of on the DepthwiseConv2D. The default is True.
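To make the speed side of this trade-off more concrete, here is a rough multiply-accumulate (MAC) count for the main branch of a stride-2 MBConv under both settings. This is a hypothetical sketch, not code from coatnet.py. The function name mbconv_macs, the expansion ratio of 4 applied to the input channels, and the 3x3 depthwise kernel are assumptions. The flag only mirrors the meaning of use_dw_strides described above. Norm, activation, SE and the shortcut (Pool + Proj) branch are ignored.

```python
def mbconv_macs(h, w, c_in, c_out, expansion=4, kernel=3, use_dw_strides=True):
    """Rough MAC count of a stride-2 MBConv main branch (hypothetical helper).

    Branch: 1x1 expansion conv -> kxk depthwise conv -> 1x1 projection conv.
    use_dw_strides=True:  stride 2 on the depthwise conv (standard MBConv).
    use_dw_strides=False: stride 2 on the first 1x1 conv (as in eq. (5)).
    Norm, activation, SE and the shortcut branch are not counted.
    """
    hidden = c_in * expansion  # assumption: expansion is relative to c_in
    # a stride-2 conv with "same" padding outputs ceil(h/2) x ceil(w/2)
    h2, w2 = (h + 1) // 2, (w + 1) // 2

    # 1x1 expansion: full resolution unless it carries the stride itself
    expand_hw = h * w if use_dw_strides else h2 * w2
    expand = expand_hw * c_in * hidden

    # depthwise conv: its output is h2 x w2 in both variants
    depthwise = h2 * w2 * hidden * kernel * kernel

    # 1x1 projection always runs at the reduced resolution
    project = h2 * w2 * hidden * c_out
    return expand + depthwise + project


if __name__ == "__main__":
    for flag in (True, False):
        print(flag, mbconv_macs(56, 56, 96, 192, use_dw_strides=flag))
    # True 176117760
    # False 89413632
```

The only term that changes is the 1x1 expansion conv: with strides on the first conv it runs at a quarter of the resolution, which in this example roughly halves the branch's MACs. MACs are only a rough proxy for wall-clock speed, so this does not replace timing both variants on real hardware.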
According to eq. (5) in the paper, MBConv does its down-sampling with strides=2 in the first conv layer.
However, in lines 78-80 of coatnet.py, strides=1 is used in the first conv, while strides=strides is used in the depthwise conv.