sovrasov / flops-counter.pytorch

Flops counter for convolutional networks in pytorch framework
MIT License

The number of parameters on BatchNormalization module #89

Open hideakikuratsu opened 2 years ago

hideakikuratsu commented 2 years ago

The number of parameters of each module is calculated by the following code:
https://github.com/sovrasov/flops-counter.pytorch/blob/5f2a45f8ff117ce5ad34a466270f4774edd73379/ptflops/pytorch_engine.py#L110-L112

I used this code on torch.nn.BatchNorm2d like this:

```python
import torch

bn = torch.nn.BatchNorm2d(10)
sum(p.numel() for p in bn.parameters() if p.requires_grad)
```

The last line returns 20, but torch.nn.BatchNorm2d also has the running (moving) mean and variance as parameters, doesn't it? So I thought the correct number of parameters for torch.nn.BatchNorm2d(10) is:

number of weight parameters = 10
number of bias parameters = 10
number of running mean parameters = 10
number of running variance parameters = 10

that is, 10 * 4 = 40. I'd appreciate it if you could explain this. Thank you!

morkovka1337 commented 2 years ago

Hi @hello-friend1242954

Weight and bias are parameters of the BN layer (they are updated during back propagation). The running mean and variance are computed during the forward pass; that's why, I think, they are not counted as parameters (they do not require gradients). https://d2l.ai/chapter_convolutional-modern/batch-norm.html#training-deep-networks
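A minimal sketch illustrating this distinction (assuming a default torch.nn.BatchNorm2d with track_running_stats=True): PyTorch registers the running statistics as buffers, not parameters, so they never show up in `parameters()`.

```python
import torch

bn = torch.nn.BatchNorm2d(10)

# gamma (weight) and beta (bias) are the learnable parameters: 10 + 10 = 20
print(sum(p.numel() for p in bn.parameters() if p.requires_grad))  # 20

# the running statistics are registered as buffers, not parameters
for name, buf in bn.named_buffers():
    print(name, buf.numel())
# running_mean 10
# running_var 10
# num_batches_tracked 1
```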

hideakikuratsu commented 2 years ago

Thank you for the reply! I agree that whether they are counted as parameters should be judged by whether they require gradients, but they also undoubtedly take up some static memory/storage space, right? So I think the definition of 'parameters' is somewhat ambiguous. Thank you for the clear explanation!
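If one did want to account for the static memory the buffers occupy, a rough sketch could count parameters and buffers separately. This is not what ptflops does; `count_tensors` below is a hypothetical helper for illustration only.

```python
import torch

def count_tensors(module):
    # Hypothetical helper (not part of ptflops): counts trainable parameters
    # and registered buffers (e.g. running statistics) separately.
    n_params = sum(p.numel() for p in module.parameters() if p.requires_grad)
    n_buffers = sum(b.numel() for b in module.buffers())
    return n_params, n_buffers

bn = torch.nn.BatchNorm2d(10)
print(count_tensors(bn))  # (20, 21): 20 learnable values + 21 buffered values
```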