vacancy / Synchronized-BatchNorm-PyTorch

Synchronized Batch Normalization implementation in PyTorch.

Is it a bug that the code still runs successfully when the channel counts of the input tensor and the sync batchnorm layer do not match? #31

Closed. lscelory closed this issue 3 years ago

lscelory commented 5 years ago

I wrote code like this in my net:

```python
net = nn.Sequential(
    nn.Conv2d(18*2, 18, kernel_size=3, padding=1, bias=True),
    SynchronizedBatchNorm2d(18*2),
    nn.ReLU(inplace=True),
)
```

I noticed that the channel counts of the conv layer output and the sync batchnorm input do not match (18 vs. 18*2), yet the code runs successfully without any warning or error. After some debugging, I found this only happens when training on multiple GPUs, i.e. when adding these lines:

```python
net = nn.DataParallel(net, device_ids=[0, 1])
patch_replication_callback(net)
net = net.cuda()
```

When using this sync batch norm on a single GPU, it does report an error: `RuntimeError: running_mean should contain 18 elements not 36`. I am not sure whether this is a bug. In the source code of sync batch norm I found that the input tensor is reshaped like `input = input.view(input.size(0), self.num_features, -1)`. Does that mean that as long as the tensor can be reshaped by this line, the batch norm computation proceeds without error even when the channel counts are mismatched?
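In other words, the reshape only requires that the total number of elements be divisible by `batch_size * num_features`. A minimal sketch with made-up sizes (not the exact ones from my net) seems to go through silently:

```python
import torch

# Conv output with 18 channels: shape (N, 18, H, W)
x = torch.randn(4, 18, 8, 8)

# Suppose the batchnorm layer was constructed with num_features = 36 (= 18*2).
# view() only checks that the element count is divisible, so the mismatched
# reshape succeeds and silently mixes channel and spatial dimensions.
y = x.view(x.size(0), 36, -1)
print(y.shape)  # torch.Size([4, 36, 32])
```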

vacancy commented 5 years ago

Thanks for reporting! I agree that we should add an explicit assertion before reshaping the input tensor. Will add it.
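Something along these lines, placed right before the reshape (the exact check and message that end up in the code may differ):

```python
# Hypothetical assertion inside the batchnorm forward pass, before
# input = input.view(input.size(0), self.num_features, -1)
assert input.size(1) == self.num_features, (
    'Channel size mismatch: got {}, expect {}.'.format(
        input.size(1), self.num_features))
```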

vacancy commented 3 years ago

Resolved by 82d0ea9