lorenmt / mtan

The implementation of "End-to-End Multi-Task Learning with Attention" [CVPR 2019].
https://shikun.io/projects/multi-task-attention-network
MIT License

Some problems with the implementation of the task-specific attention networks. #36

Closed Watebear closed 3 years ago

Watebear commented 3 years ago

Hello, I read your paper recently, and thank you for sharing your code. I tried to find the architecture described in the paper, but I can't locate the attention module. Your code's structure is different from other common PyTorch code. In my understanding, you implement attention with a 1x1 depth-wise conv and then merge the knowledge. This is different from attention in NLP, for example multi-head attention, which has key/query/value vectors. Would you please help me? Thank you very much.

lorenmt commented 3 years ago

Yes, you are right. The attention module designed here is different from the attention modules used in NLP tasks. We don't necessarily need to follow the key/query design; as long as the method follows the idea of "attended features", it can be called an attention-based method.
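
For readers looking for the corresponding code shape, here is a minimal sketch of this style of attention: a small 1x1-convolution branch that outputs a sigmoid mask, which is applied element-wise to the shared features. The module and variable names below are illustrative, not the exact ones used in this repository.

```python
import torch
import torch.nn as nn

class AttentionMask(nn.Module):
    """Task-specific attention as a learned soft mask over shared features.

    Hypothetical sketch: two 1x1 convolutions ending in a sigmoid produce a
    per-pixel, per-channel gate in [0, 1] -- no key/query/value projections.
    """
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(channels, channels // 4, kernel_size=1),
            nn.BatchNorm2d(channels // 4),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, channels, kernel_size=1),
            nn.BatchNorm2d(channels),
            nn.Sigmoid(),
        )

    def forward(self, shared_feat):
        # "Attended features": element-wise re-weighting of the shared features.
        mask = self.gate(shared_feat)
        return mask * shared_feat


# Usage example
x = torch.randn(2, 64, 32, 32)      # shared backbone features
att = AttentionMask(channels=64)
task_feat = att(x)                  # same shape, task-specific weighting
print(task_feat.shape)              # torch.Size([2, 64, 32, 32])
```

The key design point is that the attention here selects and re-weights features from a shared backbone for each task, rather than computing similarity between queries and keys as in NLP-style attention.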