Why are Conv 3x3x3 and Conv 1x1x1 used in the 3D LKA Block?

xmindflow / deformableLKA

[WACV 2024] Beyond Self-Attention: Deformable Large Kernel Attention for Medical Image Segmentation


Why are Conv 3x3x3 and Conv 1x1x1 used in the 3D LKA Block? #31

Open · xiaogege1210 opened this issue 1 month ago

xiaogege1210 commented 1 month ago

Hello, could you please explain why Conv 3x3x3 and Conv 1x1x1 are used in the 3D LKA Block instead of the Layer Norm and MLP used in the 2D LKA Block?
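One fact that may help frame the question: a 1x1x1 convolution applies the same linear map to the channel vector of every voxel, which is what each Linear layer of a Transformer-style MLP does to every token. The minimal NumPy sketch below checks that equivalence numerically. The function names and tensor shapes are hypothetical and are not taken from the deformableLKA repository. A 3x3x3 convolution differs in that it also mixes each voxel with its 26 neighbours, so it adds local spatial context that a per-token MLP does not.

```python
import numpy as np


def conv1x1x1(x, w, b):
    """1x1x1 convolution on x of shape (C_in, D, H, W).

    w has shape (C_out, C_in) and b has shape (C_out,). Because the kernel
    covers a single voxel, each voxel's channel vector is mapped independently.
    """
    return np.einsum("oc,cdhw->odhw", w, x) + b[:, None, None, None]


def per_voxel_linear(x, w, b):
    """Apply one shared linear layer (as in an MLP) to every voxel's channels."""
    c_in, d, h, wd = x.shape
    tokens = x.reshape(c_in, -1).T      # (D*H*W, C_in): one token per voxel
    out = tokens @ w.T + b              # (D*H*W, C_out)
    return out.T.reshape(w.shape[0], d, h, wd)


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 3, 5, 6))   # hypothetical C_in=4, D=3, H=5, W=6
w = rng.standard_normal((8, 4))         # hypothetical C_out=8
b = rng.standard_normal(8)

print(np.allclose(conv1x1x1(x, w, b), per_voxel_linear(x, w, b)))  # True
```

In other words, a 1x1x1 convolution can play the channel-mixing role of an MLP layer directly on a 5D feature map, without flattening the volume into tokens.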

xiaogege1210 commented 1 month ago

Does the "different strategy" for the 3D LKA mentioned in the paper refer to replacing LN (Layer Normalization) and the MLP (Multi-Layer Perceptron) with 3x3x3 and 1x1x1 convolutions? I also noticed in the paper that the 3D LKA adds a separate deformable convolution layer after the depth-wise convolution. Is there a structural diagram I could look at?
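Whether replacing LN + MLP with a 3x3x3 and a 1x1x1 convolution makes the block heavier or lighter depends on details the thread does not state, such as the MLP expansion ratio and whether the 3x3x3 convolution is dense or depth-wise. The rough parameter counter below assumes an expansion ratio of 4, C = 64 channels, and a bias on every layer. These numbers and helper names are illustrative assumptions, not values from the paper or the repository.

```python
def layernorm_params(c):
    """LayerNorm over C channels: one scale and one shift per channel."""
    return 2 * c


def mlp_params(c, ratio=4):
    """Two Linear layers C -> ratio*C -> C, weights plus biases."""
    hidden = ratio * c
    return (c * hidden + hidden) + (hidden * c + c)


def conv3d_params(c_in, c_out, k, depthwise=False):
    """Weights plus bias of a k x k x k 3D convolution."""
    if depthwise:
        # One k^3 filter per channel; assumes c_out == c_in.
        return c_in * k ** 3 + c_in
    return c_out * c_in * k ** 3 + c_out


C = 64  # hypothetical channel count
ln_mlp = layernorm_params(C) + mlp_params(C)
dense_pair = conv3d_params(C, C, 3) + conv3d_params(C, C, 1)
dw_pair = conv3d_params(C, C, 3, depthwise=True) + conv3d_params(C, C, 1)

print(ln_mlp)      # 33216
print(dense_pair)  # 114816
print(dw_pair)     # 5952
```

Under these assumptions, LN + MLP costs about 33k parameters, a dense 3x3x3 plus 1x1x1 pair about 115k, and a depth-wise 3x3x3 plus 1x1x1 pair about 6k, so the cost comparison hinges largely on whether the 3x3x3 convolution is depth-wise.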