Closed · JohnMBrandt closed this issue 2 years ago
Hello, thanks for your attention to our work. The kernel_size should be 3x3; the 7 is a copy-and-paste typo. This convolution is only used to increase the number of channels, and down-sampling is performed by the pooling layer. Sorry for the typo.
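A minimal sketch of that ordering (hypothetical channel count and input size, not from the repo): the 3x3 convolution only raises the channel count, while a separate pooling layer does the 2x down-sampling.

```python
import torch
import torch.nn as nn

# Hypothetical head: the 3x3 conv changes channels (3 -> 64) but, with
# padding=1, preserves spatial size; MaxPool2d then halves the resolution.
head = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1, bias=False),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=2, stride=2),
)

x = torch.randn(1, 3, 64, 64)
out = head(x)
print(out.shape)  # torch.Size([1, 64, 32, 32])
```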
Thanks! Also, in line with the standard residual structure and best practices around BatchNorm, I get better results by applying ReLU after batch normalization and after the residual connection. The convolution bias should also be set to False when batch norm is used (https://stats.stackexchange.com/questions/482305/batch-normalization-and-the-need-for-bias-in-neural-networks). For example:
```python
class PFC(nn.Module):
    def __init__(self, channels, kernel_size=7):
        super(PFC, self).__init__()
        self.input_layer = nn.Sequential(
            # 3x3 conv raises the channel count; padding=1 keeps spatial size.
            # bias=False because BatchNorm makes the bias redundant.
            nn.Conv2d(3, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels))
        self.depthwise = nn.Sequential(
            # in_channels must equal `channels` for groups=channels to hold
            nn.Conv2d(channels, channels, kernel_size, groups=channels,
                      padding=kernel_size // 2, bias=False),
            nn.BatchNorm2d(channels))
        self.pointwise = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True))
        self.act = nn.ReLU()

    def forward(self, x):
        x = self.input_layer(x)
        residual = x
        x = self.depthwise(self.act(x))
        x += residual
        x = self.act(x)
        x = self.pointwise(x)
        return x
```
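As a side note on why the depthwise/pointwise split is used at all, here is a quick self-contained parameter count (hypothetical width of 64 channels and a 7x7 kernel, chosen for illustration): a depthwise convolution followed by a 1x1 pointwise convolution uses far fewer weights than one standard 7x7 convolution.

```python
import torch.nn as nn

channels, k = 64, 7

# Depthwise separable pair: per-channel 7x7 filters, then 1x1 channel mixing.
depthwise = nn.Conv2d(channels, channels, k, groups=channels,
                      padding=k // 2, bias=False)
pointwise = nn.Conv2d(channels, channels, kernel_size=1, bias=False)

# Standard convolution with the same kernel size and width.
standard = nn.Conv2d(channels, channels, k, padding=k // 2, bias=False)

def n_params(m):
    return sum(p.numel() for p in m.parameters())

# depthwise: 64*1*7*7 = 3136; pointwise: 64*64*1*1 = 4096; total 7232
# standard:  64*64*7*7 = 200704
print(n_params(depthwise) + n_params(pointwise), n_params(standard))
```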
Many thanks for your contribution. Yes, we just use a common way to construct each convolutional layer, and the network could show better performance with an optimised structure. The significance of the PFC module is to remind researchers and engineers to focus not only on the transformation of high-level semantic information but also on the feature extraction of low-level semantic information.
The PFC block code defaults `kernel_size` to 7, which implies that the head of the PFC block has a Conv2d with kernel size = 7.
However, the paper notes: "Also, 3x3 convolution is added to the head of this module for down-sampling the input image and raising the channel because depthwise separable convolution shows degradation of performance on low-dimensional features."
Should the `kernel_size` for the `input_layer` be 3 or 7?