switchablenorms / Switchable-Normalization

Code for Switchable Normalization from "Differentiable Learning-to-Normalize via Switchable Normalization", https://arxiv.org/abs/1806.10779

NaN error caused by "N x C x 1 x 1" input features #5

Open PkuRainBow opened 6 years ago

PkuRainBow commented 6 years ago

Update! I have found that the error is related to the input shape of SN. When the input shape is N x C x 1 x 1, the output of SN becomes NaN, while nn.BatchNorm handles N x C x 1 x 1 correctly. I guess the cause is that you compute the variance of a single value when computing the IN variance. Hope that helps you fix this bug.
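For reference, a minimal standalone sketch (plain PyTorch, not this repo's code) of the same effect: the per-instance variance of a single spatial element is NaN with the default unbiased estimator, while nn.BatchNorm2d handles the same shape without trouble.

import torch
import torch.nn as nn

x = torch.randn(4, 8, 1, 1)                        # N x C x 1 x 1, e.g. after global average pooling
n, c = x.size(0), x.size(1)

# IN statistics: variance over the spatial positions of each (n, c) slice.
flat = x.view(n, c, -1)                            # shape (N, C, 1): one element per slice
print(torch.isnan(flat.var(-1)).any())             # True  (unbiased variance of a single value)
print(flat.var(-1, unbiased=False).abs().max())    # tensor(0.)

# BatchNorm averages over N (and H, W), so a 1x1 spatial map is fine.
bn = nn.BatchNorm2d(c)
print(torch.isnan(bn(x)).any())                    # False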

Really cool work. I am trying to use SN for segmentation tasks, initializing the backbone with your ImageNet-pretrained ResNet50 (ResNet50v2+SN(8,32)-77.57.pth). I have added an ASPP-like decoder module that contains several randomly initialized SN layers. The features before the ASPP module look fine, but they become NaN after passing through its SN layers.

This is really strange; I hope you can help me solve this problem.

I am also wondering about the difference between resnet50v1+sn and resnet50v2+sn. Is this problem related to the choice of backbone network?

Here are the details of my ASPP module:

import torch.nn as nn

# SwitchNorm is the switchable normalization layer released with this repo
# (the import line was omitted in my original snippet).

class SN_ASPPModule(nn.Module):
    """
    Reference:
        DeepLabv3: combine dilated convolutions with global average pooling.
    """
    def __init__(self, features, out_features=512, dilations=(12, 24, 36), using_moving_average=True):
        super(SN_ASPPModule, self).__init__()
        self.using_moving_average = using_moving_average

        # Image-level branch: global average pooling -> 1x1 conv -> SN.
        # Note: the pooled feature has spatial size 1x1 when it reaches SwitchNorm.
        self.conv1 = nn.Sequential(nn.AdaptiveAvgPool2d((1, 1)),
                                   nn.Conv2d(features, out_features, kernel_size=1, padding=0, dilation=1, bias=False),
                                   SwitchNorm(out_features, using_moving_average=self.using_moving_average))
        # 1x1 conv branch.
        self.conv2 = nn.Sequential(nn.Conv2d(features, out_features, kernel_size=1, padding=0, dilation=1, bias=False),
                                   SwitchNorm(out_features, using_moving_average=self.using_moving_average))
        # Dilated 3x3 conv branches.
        self.conv3 = nn.Sequential(nn.Conv2d(features, out_features, kernel_size=3, padding=dilations[0], dilation=dilations[0], bias=False),
                                   SwitchNorm(out_features, using_moving_average=self.using_moving_average))
        self.conv4 = nn.Sequential(nn.Conv2d(features, out_features, kernel_size=3, padding=dilations[1], dilation=dilations[1], bias=False),
                                   SwitchNorm(out_features, using_moving_average=self.using_moving_average))
        self.conv5 = nn.Sequential(nn.Conv2d(features, out_features, kernel_size=3, padding=dilations[2], dilation=dilations[2], bias=False),
                                   SwitchNorm(out_features, using_moving_average=self.using_moving_average))

        # Fuse the five branches and project back to out_features.
        self.bottleneck = nn.Sequential(
            nn.Conv2d(out_features * 5, out_features, kernel_size=1, padding=0, dilation=1, bias=False),
            SwitchNorm(out_features, using_moving_average=self.using_moving_average),
            nn.Dropout2d(0.1)
            )
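Note that the conv1 branch applies global average pooling right before its SwitchNorm layer, so during training that SN layer always receives an N x C x 1 x 1 tensor. A minimal, hypothetical shape check of just that branch (SwitchNorm replaced with nn.Identity, since only the shapes matter here; the channel counts and input size are made up):

import torch
import torch.nn as nn

features, out_features = 2048, 512                 # hypothetical channel counts
branch = nn.Sequential(nn.AdaptiveAvgPool2d((1, 1)),
                       nn.Conv2d(features, out_features, kernel_size=1, bias=False),
                       nn.Identity())              # SwitchNorm(out_features) sits here in the real module
x = torch.randn(2, features, 33, 33)               # hypothetical encoder output
print(branch(x).shape)                             # torch.Size([2, 512, 1, 1]) -> SN sees H = W = 1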
pluo911 commented 6 years ago

In an fc layer (H = W = 1), IN and LN should be the same.

R50v2+SN converges much faster than R50v1+SN and produces better top-5 acc.

PkuRainBow commented 6 years ago

@pluo911 It seems that you haven't addressed the NaN bug when H = W = 1.

Yudian777 commented 6 years ago

@PkuRainBow It seems to come from PyTorch's default behavior: if the input size is N, C, 1, 1, var_in is NaN instead of 0, because the unbiased variance estimator divides by n - 1 = 0 for a single element.
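A quick check of that behavior in plain PyTorch:

import torch
x = torch.randn(3, 4, 1, 1).view(3, 4, -1)         # one spatial element per (n, c) slice
print(x.var(-1))                                   # all NaN: unbiased variance of a single value
print(x.var(-1, unbiased=False))                   # all zeros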

nachiket273 commented 6 years ago

I faced the same issue while running SwitchNorm on CIFAR-10, where the IN variance is NaN. Using torch.var without Bessel's correction (unbiased=False) for the IN variance calculation did the trick for me.
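A minimal sketch of that workaround, assuming a SwitchNorm-style forward that flattens the spatial dimensions before computing the IN statistics (the names here are illustrative, not the repo's exact code):

import torch

def instance_stats(x):
    # x: N x C x H x W; per-(n, c) mean and variance over the spatial positions.
    n, c = x.size(0), x.size(1)
    flat = x.view(n, c, -1)
    mean_in = flat.mean(-1, keepdim=True)
    # unbiased=False divides by H*W instead of H*W - 1, so H = W = 1 gives 0 instead of NaN.
    var_in = flat.var(-1, keepdim=True, unbiased=False)
    return mean_in, var_in

# Sanity check on an N x C x 1 x 1 input:
mean_in, var_in = instance_stats(torch.randn(2, 8, 1, 1))
print(torch.isnan(var_in).any())   # False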