I think you are right. Thank you for letting me know. I'll keep this in mind in my future work.
Okay! And thanks for your great code work, this is a very clear template!
Any intention to re-train all the models in this repo? @plemeri
I'm not planning to since this isn't our main contribution, but thanks for letting me know that the authors of CaraNet seem to be using our code without any citation. I really feel bad about it.
Hi, did you delete your comment? I can only see it in the email notification, not in the GitHub issue.
Sent from my iPhone
------------------ Original ------------------
From: Simon Diener
Date: Tue, Jul 26, 2022 3:35 AM
To: plemeri/UACANet
Cc: machine no learning, Author
Subject: Re: [plemeri/UACANet] Some questions about Axial-attention (Issue #8)
It seems as if they don't even use the axial attention module they mention in the paper, as they apply a residual connection at the end and multiply the output of the axial attention module by zero. At least this is what my unknowing eyes see in their GitHub repository.
Hello, yes, I had sadly misunderstood the usage of the attention module in CaraNet. I initially thought that they implemented the code similarly but ended up not using the axial attention; I misinterpreted the initial gamma value of the self-attention layer and mixed up the implementation with the reverse attention module. It is indeed correct that CaraNet uses the axial attention without referencing it in the published paper. However, I think they do acknowledge that fact in their official repository.
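For anyone reading this later: the gamma discussed above refers to the common learnable-scale residual pattern used in self-attention blocks (as in SAGAN-style attention). Here is a minimal sketch, assuming that this is the pattern in question; the class and variable names are hypothetical, not the repository's exact code:

```python
import torch
import torch.nn as nn

class GatedAttentionBranch(nn.Module):  # hypothetical name, for illustration only
    """Wraps an attention module with a zero-initialized, learnable residual gate."""
    def __init__(self, attention: nn.Module):
        super().__init__()
        self.attention = attention
        # gamma starts at 0, so the block is an identity at initialization,
        # but it is a trainable parameter and grows during training.
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        return self.gamma * self.attention(x) + x  # residual connection

# tiny usage example
branch = GatedAttentionBranch(nn.Identity())
y = branch(torch.randn(1, 8, 4, 4))
```

Because gamma is a parameter rather than a constant, the attention branch is only scaled to zero at the start; it is not permanently multiplied by zero, so the module is not a no-op.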
I am sorry for disturbing you with that comment; I wanted to delete it once I realized my initial assumption was wrong.
Hi~ I have some questions about Axial-attention. Why is there no permute operation before the view in mode h?
I think the permute is necessary. Although the shapes of those values come out right for the computation, the result has a very different meaning in mode h than in mode w. Without the permute, the projected_query cannot actually collect the columns into the dimension of size H (height).

For example, take a 4×5 matrix a filled row-wise with 0–19. For mode w, the plain reshape is correct: each sequence is a row of a. For mode h without the permute, each "sequence" is just a consecutive chunk of the row-major buffer, which is obviously not what we want. With the permute for mode h, [0, 5, 10, 15] really is the first column of a.
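A minimal PyTorch sketch of the point above (not the repository's exact code; the tensor and variable names are just for illustration), using a 1×1×4×5 input so that the first column is [0, 5, 10, 15]:

```python
import torch

# Input in the usual (B, C, H, W) layout; values chosen so the first
# column of the 4x5 map is [0, 5, 10, 15].
a = torch.arange(20).view(1, 1, 4, 5)   # B=1, C=1, H=4, W=5

# Mode w: a plain view works because W is already the innermost
# (contiguous) dimension, so each sequence is a row of a.
rows = a.view(1 * 4, 1, 5)              # (B*H, C, W)
print(rows[0, 0])                       # tensor([0, 1, 2, 3, 4])

# Mode h WITHOUT permute: view only re-slices the row-major buffer,
# so the "sequences" are not columns of a.
wrong = a.view(1 * 5, 1, 4)             # right shape, wrong content
print(wrong[0, 0])                      # tensor([0, 1, 2, 3])  -- not a column

# Mode h WITH permute: move W in front of H first, then view.
# Now every sequence really is a column of a.
cols = a.permute(0, 3, 1, 2).contiguous().view(1 * 5, 1, 4)   # (B*W, C, H)
print(cols[0, 0])                       # tensor([ 0,  5, 10, 15])
```

So the plain view in mode h just walks the row-major buffer and never forms real columns; the permute (followed by contiguous) is what actually turns the H dimension into the attention sequence.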