xq141839 / DCSAU-Net

Elsevier-CIBM-2023: A deeper and more compact split-attention U-Net for medical image segmentation
https://www.sciencedirect.com/science/article/pii/S0010482523000914
Apache License 2.0

Head of PFC block, 3x3 or 7x7 conv? #6

Closed JohnMBrandt closed 2 years ago

JohnMBrandt commented 2 years ago

The PFC block code says:

def __init__(self, channels, kernel_size=7):
        super(PFC, self).__init__()
        self.input_layer = nn.Sequential(
                    nn.Conv2d(3, channels, kernel_size, padding=kernel_size // 2),
                    nn.ReLU(inplace=True),
                    nn.BatchNorm2d(channels))
which implies that the head of the PFC block has a Conv2D with kernel size = 7

However, the paper states: "Also, 3x3 convolution is added to the head of this module for down-sampling the input image and raising the channel because depthwise separable convolution shows degradation of performance on low-dimensional features."

Should the kernel_size for the input_layer be 3 or 7?

xq141839 commented 2 years ago

Hello, thanks for your attention to our work. The kernel_size should be 3x3; the 7 in the code was probably a copy-and-paste mistake. This convolution is only used to increase the number of channels, and down-sampling is done by the pooling layer. Sorry for the typo.
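For reference, a minimal sketch of what the corrected head would look like (only the input_layer is shown, with a 3x3 kernel and padding=1 chosen to keep the spatial size; the rest of the block stays unchanged):

    self.input_layer = nn.Sequential(
                nn.Conv2d(3, channels, kernel_size=3, padding=1),  # 3x3 head conv: only raises the channel count
                nn.ReLU(inplace=True),
                nn.BatchNorm2d(channels))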

JohnMBrandt commented 2 years ago

Thanks! Also, in line with the usual residual structure and best practices around BatchNorm, I get better results by applying ReLU after batch normalization and after the residual connection. The convolution bias should also be set to False when batch norm follows, since BatchNorm's own shift parameter makes it redundant (https://stats.stackexchange.com/questions/482305/batch-normalization-and-the-need-for-bias-in-neural-networks). For example:

import torch.nn as nn

class PFC(nn.Module):
    def __init__(self, channels, kernel_size=7):
        super(PFC, self).__init__()
        # 3x3 head conv raises the channel count; bias dropped because BatchNorm follows
        self.input_layer = nn.Sequential(
                    nn.Conv2d(3, channels, 3, padding=1, bias=False),
                    nn.BatchNorm2d(channels))
        # depthwise conv: one filter per channel (groups=channels)
        self.depthwise = nn.Sequential(
                    nn.Conv2d(channels, channels, kernel_size, groups=channels, padding=kernel_size // 2, bias=False),
                    nn.BatchNorm2d(channels))
        # pointwise 1x1 conv mixes channels after the residual connection
        self.pointwise = nn.Sequential(
                    nn.Conv2d(channels, channels, kernel_size=1, bias=False),
                    nn.BatchNorm2d(channels),
                    nn.ReLU(inplace=True))
        self.act = nn.ReLU()

    def forward(self, x):
        x = self.input_layer(x)          # conv + BN, no activation yet
        residual = x
        x = self.depthwise(self.act(x))  # ReLU applied before the depthwise conv
        x += residual                    # residual connection around the depthwise conv
        x = self.act(x)                  # ReLU after the residual addition
        x = self.pointwise(x)
        return x
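
A quick shape check of this version (illustrative only; channels=64 and the 256x256 input are arbitrary values, not taken from the paper):

import torch

pfc = PFC(channels=64)
x = torch.randn(1, 3, 256, 256)   # a single RGB image
out = pfc(x)
print(out.shape)                  # torch.Size([1, 64, 256, 256]); spatial size is preserved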
xq141839 commented 2 years ago

Many thanks for your contribution. Yes, we just used a standard way of building each convolutional layer, and the network could perform better with an optimised structure. The point of the PFC module is to remind researchers and engineers to focus not only on transforming high-level semantic information but also on extracting low-level semantic features.