On the design of the Attention Module

lorenmt / mtan

The implementation of "End-to-End Multi-Task Learning with Attention" [CVPR 2019].

https://shikun.io/projects/multi-task-attention-network

MIT License


On the design of the Attention Module #55

Closed: puhan123 closed this issue 2 years ago

puhan123 commented 2 years ago

Hello, why is the Attention Module in your paper designed the way it is? What principle is the design based on, or does it follow existing literature? Thanks!

lorenmt commented 2 years ago

Hello, is there something in the paper that was not explained clearly? I would suggest asking more specific questions.

puhan123 commented 2 years ago

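What I mean is: why is the module built from two 1×1 convs and one 3×3 conv? I could not find an explanation of this in the paper. The paper only says that the attention masks can be considered as feature selectors, and that the 3×3 kernels represent a shared feature extractor for passing to another attention module. Thanks!

One more question. The paper describes how the attention mask is learned with this sentence: "The attention mask is learned in a self-supervised fashion with back-propagation." How should "self-supervised fashion" be understood here?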

lorenmt commented 2 years ago

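The 3 × 3 conv here is still a shared feature, which keeps the number of task-specific parameters as small as possible and also helps avoid over-fitting. Of the two 1 × 1 convs, the first fuses the feature from the previous layer with the shared feature, and the second adds one more non-linear layer that maps the result to an attention map with values between 0 and 1.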
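As a rough illustration of the two 1 × 1 convs described above, here is a minimal sketch in plain Python. A 1 × 1 conv acts independently at every pixel, so the sketch computes the mask at a single spatial location with two small matrix-vector products. It is not the repository's code: the helper names, the channel count, and the random initialisation are all hypothetical, batch normalisation is left out, and the choice of which shared feature the mask multiplies is a simplifying assumption. The 3 × 3 conv is spatial and is not shown.

```python
import math
import random

def linear(weights, bias, x):
    """A 1x1 conv at one pixel: a matrix-vector product over channels."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def sigmoid(x):
    return [1.0 / (1.0 + math.exp(-v)) for v in x]

def init(rows, cols, rng):
    """Hypothetical small random initialisation for a rows x cols weight matrix."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def attention_mask(shared, previous, w1, b1, w2, b2):
    """Mask at one pixel from the shared feature and the previous layer's feature."""
    merged = shared + previous                # channel-wise concatenation
    hidden = relu(linear(w1, b1, merged))     # first 1x1 conv: fuse the two features
    return sigmoid(linear(w2, b2, hidden))    # second 1x1 conv: squash into (0, 1)

rng = random.Random(0)
channels = 4  # assumed channel count, chosen only to keep the example small
shared = [rng.uniform(-1, 1) for _ in range(channels)]
previous = [rng.uniform(-1, 1) for _ in range(channels)]
w1, b1 = init(channels, 2 * channels, rng), [0.0] * channels
w2, b2 = init(channels, channels, rng), [0.0] * channels

mask = attention_mask(shared, previous, w1, b1, w2, b2)
# The mask acts as a soft feature selector on the shared feature.
attended = [m * s for m, s in zip(mask, shared)]
```

Only `w1`, `b1`, `w2`, and `b2` would be task-specific in this sketch, which is in line with the stated goal of keeping the task-specific parameter count low.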

The attention mask here is updated purely by gradients, so it is self-supervised, i.e. there is no external supervision telling it how to change the mask.
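To make "updated purely by gradients" concrete, here is a toy sketch, not taken from the paper or the repository. A single scalar mask is learned with hand-written back-propagation, and the only training signal is the task loss on the masked output. The function name, learning rate, and numbers are hypothetical and chosen only for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_mask(feature, target, steps=200, lr=0.5):
    """Learn one mask logit using only the gradient of the task loss."""
    logit = 0.0  # the mask starts at sigmoid(0) = 0.5
    for _ in range(steps):
        mask = sigmoid(logit)
        prediction = mask * feature
        # The task loss is (prediction - target)^2.
        # There is no loss term or label for the mask itself.
        d_loss_d_pred = 2.0 * (prediction - target)
        d_pred_d_mask = feature
        d_mask_d_logit = mask * (1.0 - mask)
        logit -= lr * d_loss_d_pred * d_pred_d_mask * d_mask_d_logit
    return sigmoid(logit)

# With feature 2.0 and target 1.6, the loss is minimised at mask = 1.6 / 2.0 = 0.8.
learned = train_mask(feature=2.0, target=1.6)
```

Nothing in the loop ever states what the mask should be. Its value emerges only from what reduces the task loss, which is the sense of "self-supervised" given in the reply above.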

puhan123 commented 2 years ago

Thank you very much for the prompt reply!