lorenmt / mtan

The implementation of "End-to-End Multi-Task Learning with Attention" [CVPR 2019].
https://shikun.io/projects/multi-task-attention-network
MIT License

Question about the structure of the encoder_block_att. #59

Closed JiSuanJiDaWang closed 2 years ago

JiSuanJiDaWang commented 2 years ago

Hi! Really impressive work! I am trying to build the attention framework on MobileNet, and I have some questions about the structure of `encoder_block_att`. I noticed that the implementation of `encoder_block_att`, the shared feature extractor described in your paper, is quite different between SegNet and ResNet. In SegNet it is just a 3×3 conv with batch normalisation and pooling, as described in your paper, but in ResNet it is a more complicated structure with three convolution operations. I wonder whether the block should differ across networks. If so, how should I design the block for my network?

Thanks!

ResNet:

```
(encoder_block_att_1): Bottleneck(
  (conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
  (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
  (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (downsample): Sequential(
    (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
)
```

SegNet:

```
(0): Sequential(
  (0): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (2): ReLU(inplace=True)
)
```
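For reference, the two printouts can be reproduced with plain `torch.nn` modules. This is only a minimal sketch: channel sizes are copied from the printouts, and the class name `BottleneckSketch` is mine, not from the MTAN repo.

```python
import torch
import torch.nn as nn

class BottleneckSketch(nn.Module):
    """Bottleneck-style block matching the ResNet printout: 1x1 reduce,
    3x3 conv at reduced width, 1x1 expand, plus a 1x1 downsample on the
    skip path so channels match for the residual addition."""
    def __init__(self, in_ch=256, mid_ch=128, out_ch=512):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, kernel_size=1, bias=False),
            nn.BatchNorm2d(mid_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, mid_ch, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(mid_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, out_ch, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        self.downsample = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + self.downsample(x))

# SegNet-style block from the second printout: a single 3x3 conv + BN + ReLU.
segnet_block = nn.Sequential(
    nn.Conv2d(64, 128, kernel_size=3, padding=1),
    nn.BatchNorm2d(128),
    nn.ReLU(inplace=True),
)
```

Both blocks double (or quadruple) the channel count while keeping spatial size, e.g. `BottleneckSketch()(torch.randn(1, 256, 8, 8))` gives a `(1, 512, 8, 8)` tensor.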

lorenmt commented 2 years ago

Hello,

The design in the ResNet version was meant to save computation. A full-width 3×3 convolution would be computationally heavy, so the bottleneck first reduces the channel count with a 1×1 convolution, applies the 3×3 convolution on the narrower feature map, and then restores the width with a second 1×1 convolution as an approximation.

Since MobileNets share a similar residual design, I would suggest following the design in MTANResNet.
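Following that suggestion, one way to adapt the bottleneck idea to a MobileNet backbone is a depthwise-separable version: 1×1 reduce, 3×3 depthwise, 1×1 expand. This is only an illustrative sketch; the function name and channel sizes are my assumptions, not code from the MTAN repo.

```python
import torch
import torch.nn as nn

def mobile_att_block(in_ch, mid_ch, out_ch):
    """Hypothetical attention encoder block for a MobileNet backbone.

    Mirrors the cheap-3x3 idea from the ResNet bottleneck, but makes the
    3x3 conv depthwise (groups == channels) in the MobileNet style, so its
    cost scales with mid_ch instead of mid_ch**2.
    """
    return nn.Sequential(
        nn.Conv2d(in_ch, mid_ch, kernel_size=1, bias=False),   # 1x1 reduce
        nn.BatchNorm2d(mid_ch),
        nn.ReLU6(inplace=True),
        nn.Conv2d(mid_ch, mid_ch, kernel_size=3, padding=1,
                  groups=mid_ch, bias=False),                  # 3x3 depthwise
        nn.BatchNorm2d(mid_ch),
        nn.ReLU6(inplace=True),
        nn.Conv2d(mid_ch, out_ch, kernel_size=1, bias=False),  # 1x1 expand
        nn.BatchNorm2d(out_ch),
    )

block = mobile_att_block(32, 64, 32)
y = block(torch.randn(1, 32, 16, 16))  # spatial size preserved, 32 channels out
```

With matching input and output channels, a residual skip connection can be added around the block, as in the inverted-residual design MobileNetV2 uses.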

JiSuanJiDaWang commented 2 years ago

Thanks!