Open hideakikuratsu opened 2 years ago
Hi @hello-friend1242954. Weight and bias are parameters of the BN layer (they are updated during backpropagation). The running mean and variance are computed during the forward pass; that is why, I think, they are not counted as parameters (they do not require gradients). https://d2l.ai/chapter_convolutional-modern/batch-norm.html#training-deep-networks
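To make the distinction concrete, in PyTorch the running statistics are registered as *buffers*, not parameters, so they live in a separate collection. A minimal sketch (the printed contents below reflect the BatchNorm2d implementation in recent PyTorch versions, which also registers a `num_batches_tracked` buffer):

```python
import torch

bn = torch.nn.BatchNorm2d(10)

# weight and bias are trainable parameters (updated by the optimizer)
params = {name: p.numel() for name, p in bn.named_parameters()}
print(params)  # {'weight': 10, 'bias': 10}

# running_mean and running_var are buffers: updated in-place during the
# forward pass in training mode, never touched by the optimizer
buffers = {name: b.numel() for name, b in bn.named_buffers()}
print(buffers)  # {'running_mean': 10, 'running_var': 10, 'num_batches_tracked': 1}
```

Both collections are saved in the module's `state_dict`, which is why the buffers still take up storage even though they are not "parameters" in the gradient sense.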
Thank you for the reply! I agree that whether they count as parameters should be judged by whether they require gradients, but they also undoubtedly take up some static memory/storage, right? That is why I thought the definition of 'parameters' was quite ambiguous. Thank you for the clear explanation!
The number of parameters of each module is calculated by the following code: https://github.com/sovrasov/flops-counter.pytorch/blob/5f2a45f8ff117ce5ad34a466270f4774edd73379/ptflops/pytorch_engine.py#L110-L112 I used this code on torch.nn.BatchNorm2d like this:
```python
import torch
bn = torch.nn.BatchNorm2d(10)
sum(p.numel() for p in bn.parameters() if p.requires_grad)
```
The last line returns 20, but torch.nn.BatchNorm2d also has the running (moving) mean and variance, doesn't it? So I thought the correct number of parameters for torch.nn.BatchNorm2d(10) would be:

- weight parameters: 10
- bias parameters: 10
- running mean parameters: 10
- running variance parameters: 10

that is, 10 * 4 = 40. I would appreciate it if you could explain this. Thank you!
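For what it's worth, you can see both counts side by side by summing the module's parameters and its buffers separately. A small sketch (the buffer total of 21 rather than 20 comes from the scalar `num_batches_tracked` buffer that recent PyTorch versions register alongside the running statistics):

```python
import torch

bn = torch.nn.BatchNorm2d(10)

# gradient-carrying parameters: weight (10) + bias (10)
n_params = sum(p.numel() for p in bn.parameters() if p.requires_grad)

# buffers: running_mean (10) + running_var (10) + num_batches_tracked (1)
n_buffers = sum(b.numel() for b in bn.buffers())

print(n_params)   # 20
print(n_buffers)  # 21
```

So the counter in ptflops reports 20 because it deliberately counts only the trainable tensors; the buffers do occupy storage but are excluded from the parameter count.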