I think you are right. Thank you for letting me know. I'll keep this in mind in my future work.
Okay! And thanks for your great code work, this is a very clear template!
Any intention to re-train all the models in this repo? @plemeri
I'm not planning to since this isn't our main contribution, but thanks for letting me know that the authors of CaraNet seem to be using our code without any citation. I really feel bad about it.
Hi, did you delete your comment? I can only see it in the email notification, not in the GitHub issue.
Sent from my iPhone
------------------ Original ------------------
From: Simon Diener
Date: Tue, Jul 26, 2022 3:35 AM
To: plemeri/UACANet
Cc: machine no learning, Author
Subject: Re: [plemeri/UACANet] Some questions about Axial-attention (Issue #8)
It seems as if they don't even use the axial attention module they mention in the paper, as they apply a residual connection at the end and multiply the output of the axial attention module by zero. At least this is what my unknowing eyes see in their GitHub repository.
Hello, yes, I had sadly misunderstood the usage of the attention module in CaraNet. I initially thought that they implemented the code similarly but ended up not using the axial attention; I misinterpreted the initial gamma value of the self-attention layer and mixed up the implementation with the reverse attention module. It is indeed correct that CaraNet uses the axial attention without referencing it in the published paper. However, I think they do acknowledge that fact in their official repository.
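For anyone reading this later: the gamma discussed above refers to the common learnable-scale residual pattern used in self-attention blocks (as in SAGAN-style attention). Here is a minimal sketch, assuming that this is the pattern in question; the class and variable names are hypothetical, not the repository's exact code:

```python
import torch
import torch.nn as nn

class GatedAttentionBranch(nn.Module):  # hypothetical name, for illustration only
    """Wraps an attention module with a zero-initialized, learnable residual gate."""
    def __init__(self, attention: nn.Module):
        super().__init__()
        self.attention = attention
        # gamma starts at 0, so the block is an identity at initialization,
        # but it is a trainable parameter and grows during training.
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        return self.gamma * self.attention(x) + x  # residual connection

# tiny usage example
branch = GatedAttentionBranch(nn.Identity())
y = branch(torch.randn(1, 8, 4, 4))
```

Because gamma is a parameter rather than a constant, the attention branch is only scaled to zero at the start; it is not permanently multiplied by zero, so the module is not a no-op.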
I am sorry for disturbing you with that comment; I wanted to delete it once I realized my initial assumption was wrong.
Hi~ I have some questions about Axial-attention. Why is there no permute operation before the view in mode h?
I think the permute is necessary. Although the shapes of those values come out right for the computation, the result has a very different meaning in mode h than in mode w. Without the permute, the projected_query cannot actually collect the columns into the dimension of size H (height).

For example, take a 4×5 matrix a filled row-wise with 0–19. For mode w, the plain reshape is correct: each sequence is a row of a. For mode h without the permute, each "sequence" is just a consecutive chunk of the row-major buffer, which is obviously not what we want. With the permute for mode h, [0, 5, 10, 15] really is the first column of a.
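A minimal PyTorch sketch of the point above (not the repository's exact code; the tensor and variable names are just for illustration), using a 1×1×4×5 input so that the first column is [0, 5, 10, 15]:

```python
import torch

# Input in the usual (B, C, H, W) layout; values chosen so the first
# column of the 4x5 map is [0, 5, 10, 15].
a = torch.arange(20).view(1, 1, 4, 5)   # B=1, C=1, H=4, W=5

# Mode w: a plain view works because W is already the innermost
# (contiguous) dimension, so each sequence is a row of a.
rows = a.view(1 * 4, 1, 5)              # (B*H, C, W)
print(rows[0, 0])                       # tensor([0, 1, 2, 3, 4])

# Mode h WITHOUT permute: view only re-slices the row-major buffer,
# so the "sequences" are not columns of a.
wrong = a.view(1 * 5, 1, 4)             # right shape, wrong content
print(wrong[0, 0])                      # tensor([0, 1, 2, 3])  -- not a column

# Mode h WITH permute: move W in front of H first, then view.
# Now every sequence really is a column of a.
cols = a.permute(0, 3, 1, 2).contiguous().view(1 * 5, 1, 4)   # (B*W, C, H)
print(cols[0, 0])                       # tensor([ 0,  5, 10, 15])
```

So the plain view in mode h just walks the row-major buffer and never forms real columns; the permute (followed by contiguous) is what actually turns the H dimension into the attention sequence.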