Hi, thank you for the great work!

I was digging into the NN architecture used here and found that the base_size and stage parameters are not actually used inside the PAA_e and PAA_d modules. They are only used to initialize the stage_size parameter, which is then passed to the SelfAttention module. And the thing is that SelfAttention does not actually use this parameter. I am just wondering: is this some sort of a bug (so maybe the whole model can be further improved), or should these parameters simply not be used in these modules?
SelfAttention itself also looks suspicious. If the input tensor is square (i.e., H equals W), then mode='h' and mode='w' will lead to the very same output. These two modes are used in the PAA_kernel and PAA_d modules, so I guess they are expected to produce different outputs.
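Here is a minimal sketch that reproduces the claim; flatten_by_mode is a hypothetical stand-in for the reshape inside SelfAttention, not the repo's actual code:

import torch

def flatten_by_mode(x, mode):
    # Collapse a BCHW tensor so that the last axis has size H, W, or H*W,
    # mirroring how SelfAttention builds the axis it attends over.
    batch_size, _, height, width = x.shape
    axis = 1
    if "h" in mode:
        axis *= height
    if "w" in mode:
        axis *= width
    return x.view(batch_size, -1, axis)

x = torch.randn(2, 8, 16, 16)  # square input: H == W
print(torch.equal(flatten_by_mode(x, "h"), flatten_by_mode(x, "w")))  # True

Since both modes feed the attention exactly the same tensor, the outputs are necessarily identical for square inputs.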
Interesting. If you have time, I suggest you dig into https://github.com/plemeri/UACANet, which is where PAA_e and PAA_d come from.
Hi, thanks for paying attention to our work.
We initially tried to solve an image resolution problem for the Object Contextual Representation architecture, which we used for the attention-based segmentation decoder, and thought it might be useful for other attention operations such as the PAA modules, but empirically we found that it wasn't necessary. Those parameters are deprecated and no longer in use.
Got it, thank you!
Though the question about the mode parameter in SelfAttention still remains. It doesn't seem correct that mode="h" and mode="w" produce the same results for square images. Looks like this mode parameter is used incorrectly.
Now the transformation applied to the feature tensors looks like this:

axis = 1
if "h" in self.mode:
    axis *= height
if "w" in self.mode:
    axis *= width
view = (batch_size, -1, axis)
x = x.view(*view)
It is correct for mode="w" but seems to be incorrect for mode="h". If I understand correctly, in that case it should rather be something like

x = x.permute(0, 1, 3, 2).reshape(*view)  # view = (batch_size, -1, height)

That is, we should additionally transpose the H and W dimensions.
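For anyone reading later, here is a quick self-contained check of the difference (the shapes are arbitrary; this is a sketch, not the repo's code):

import torch

batch_size, channels, height, width = 2, 3, 4, 5
x = torch.arange(batch_size * channels * height * width).view(batch_size, channels, height, width)
view = (batch_size, -1, height)

# Current mode="h" path: the last axis has size H, but since the tensor is
# W-contiguous, each row actually holds consecutive W positions.
wrong = x.view(*view)
print(wrong[0, 0])  # tensor([0, 1, 2, 3]) -- steps along W

# With the extra transpose, the last axis really walks along H.
right = x.permute(0, 1, 3, 2).reshape(*view)
print(right[0, 0])  # tensor([ 0,  5, 10, 15]) -- steps along H (stride = W)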
You are once again right about the problem. This actually came from our previous work UACANet (issue #8), but we did not find any meaningful performance difference at that time, so we didn't change it. If you're doing research or developing your own work, you can either fix it in the source code or just leave it as is.
Ok, thanks!