I noticed that in the PVT-v2 code you use a linear projection after the pooling layer in the spatial-reduction part of the attention.
Have you tried training the model without that linear projection? Does it still work?
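For context, here is a minimal sketch of the layer I mean, assuming the linear-SRA layout from `pvt_v2.py` (average-pool the K/V tokens to a fixed 7x7 grid, then a 1x1 conv, i.e. a per-token linear projection, before LayerNorm + GELU). The `use_proj` flag is my own addition to make the ablation I'm asking about explicit; it is not in the original code:

```python
import torch
import torch.nn as nn


class LinearSRAttention(nn.Module):
    """Linear spatial-reduction attention (sketch).

    Keys/values are average-pooled to 7x7; `use_proj` toggles the
    linear projection after pooling that my question is about.
    """

    def __init__(self, dim, num_heads=8, use_proj=True):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5

        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, dim * 2)
        self.proj = nn.Linear(dim, dim)

        self.pool = nn.AdaptiveAvgPool2d(7)
        # The projection in question: a 1x1 conv acts as a
        # per-token linear layer on the pooled feature map.
        self.sr = nn.Conv2d(dim, dim, kernel_size=1) if use_proj else nn.Identity()
        self.norm = nn.LayerNorm(dim)
        self.act = nn.GELU()

    def forward(self, x, H, W):
        # x: (B, N, C) with N == H * W
        B, N, C = x.shape
        q = self.q(x).reshape(B, N, self.num_heads, C // self.num_heads)
        q = q.permute(0, 2, 1, 3)  # (B, heads, N, head_dim)

        # Spatial reduction: pool K/V tokens to 7x7, optionally project.
        x_ = x.permute(0, 2, 1).reshape(B, C, H, W)
        x_ = self.sr(self.pool(x_)).reshape(B, C, -1).permute(0, 2, 1)
        x_ = self.act(self.norm(x_))  # (B, 49, C)

        kv = self.kv(x_).reshape(B, -1, 2, self.num_heads, C // self.num_heads)
        kv = kv.permute(2, 0, 3, 1, 4)
        k, v = kv[0], kv[1]  # each (B, heads, 49, head_dim)

        attn = (q @ k.transpose(-2, -1)) * self.scale
        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)
```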