Hello,
The bottleneck design in ResNet was intended to save computation: a 3 x 3 convolution over the full channel width is computationally heavy compared to sandwiching a narrower 3 x 3 convolution between two 1 x 1 convolutions that first reduce and then restore the channel dimension.
Since MobileNets share a similar residual design, I would suggest following the design in MTANResNet.
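For intuition, here is a minimal PyTorch sketch of that bottleneck pattern, using the channel sizes from the `encoder_block_att_1` printout below (an illustration of the idea, not the exact repository code):

```python
import torch.nn as nn

# The 1x1 convs squeeze and then re-expand the channels, so the expensive
# 3x3 conv runs at the narrow width (128 channels instead of 256/512).
bottleneck = nn.Sequential(
    nn.Conv2d(256, 128, kernel_size=1, bias=False),             # 1x1: reduce channels
    nn.BatchNorm2d(128),
    nn.ReLU(inplace=True),
    nn.Conv2d(128, 128, kernel_size=3, padding=1, bias=False),  # 3x3 at the narrow width
    nn.BatchNorm2d(128),
    nn.ReLU(inplace=True),
    nn.Conv2d(128, 512, kernel_size=1, bias=False),             # 1x1: expand channels
    nn.BatchNorm2d(512),
)

# Multiply-accumulates per spatial position:
#   plain 3x3, 256 -> 512:      3*3*256*512                     = 1,179,648
#   bottleneck (1x1/3x3/1x1):   256*128 + 3*3*128*128 + 128*512 =   245,760
# so the bottleneck is roughly 4.8x cheaper for the same input/output widths.
```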
Thanks!
Hi! Really impressive work! I am trying to build the attention framework on MobileNet, and I have some questions about the structure of encoder_block_att. I noticed that the implementation of "encoder_block_att", the shared feature extractor described in your paper, is quite different between SegNet and ResNet. In SegNet it is just a 3 x 3 convolution with batch normalization and pooling, as described in your paper, but in ResNet it is a more complicated structure with three convolution operations. I wonder whether the block should differ across networks, and if so, how I should design it for my network. Thanks!
ResNet:
```
(encoder_block_att_1): Bottleneck(
  (conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
  (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
  (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (downsample): Sequential(
    (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
)
```
SegNet:
```
(0): Sequential(
  (0): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (2): ReLU(inplace=True)
)
```
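For reference, this is the kind of block I was considering for the MobileNet encoder, following the inverted-residual pattern (the expansion factor and channel sizes here are placeholders I picked, not anything from the paper):

```python
import torch.nn as nn

# Hypothetical encoder_block_att for MobileNet: an inverted-residual-style
# block (1x1 expand -> 3x3 depthwise -> 1x1 project). Placeholder design.
def mobile_att_block(in_ch, out_ch, expand=4):
    hidden = in_ch * expand
    return nn.Sequential(
        nn.Conv2d(in_ch, hidden, kernel_size=1, bias=False),   # 1x1: expand channels
        nn.BatchNorm2d(hidden),
        nn.ReLU6(inplace=True),
        nn.Conv2d(hidden, hidden, kernel_size=3, padding=1,
                  groups=hidden, bias=False),                   # 3x3: depthwise conv
        nn.BatchNorm2d(hidden),
        nn.ReLU6(inplace=True),
        nn.Conv2d(hidden, out_ch, kernel_size=1, bias=False),   # 1x1: project channels
        nn.BatchNorm2d(out_ch),
    )
```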