zhijian-liu / torchprofile

A general and accurate MACs / FLOPs profiler for PyTorch models
https://pypi.org/project/torchprofile/
MIT License

the result of macs is not accurate #11

Ironteen closed this issue 3 years ago

Ironteen commented 3 years ago

I tested a simple model as follows:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    from torchsummary import summary
    from torchprofile import profile_macs

    class Net(nn.Module):
        def __init__(self):
            super(Net, self).__init__()
            self.conv1 = nn.Conv2d(1, 10, kernel_size=5, padding=2, bias=False)
            self.bn1 = nn.BatchNorm2d(10)

            self.conv2 = nn.Conv2d(10, 10, kernel_size=3, padding=1, bias=False)
            self.bn2 = nn.BatchNorm2d(10)

        def forward(self, x):
            residual = x
            x = F.relu(self.bn1(self.conv1(x)))
            x = F.relu(self.bn2(self.conv2(x)))
            x += residual
            return x

    if __name__=="__main__":
        model = Net().cuda()
        # count parameters
        summary(model, (1, 28, 28)) 
        # count MACs (profile_macs returns multiply-accumulates; the print below labels them FLOPs)
        inputs = torch.randn(1, 1, 28, 28).cuda()
        flops = profile_macs(model, inputs)
        print(f"FLOPs : {flops}") 

The results are:

      ----------------------------------------------------------------
              Layer (type)               Output Shape         Param #
      ================================================================
                  Conv2d-1           [-1, 10, 28, 28]             250
             BatchNorm2d-2           [-1, 10, 28, 28]              20
                  Conv2d-3           [-1, 10, 28, 28]             900
             BatchNorm2d-4           [-1, 10, 28, 28]              20
                     Net-5           [-1, 10, 28, 28]               0
      ================================================================
      Total params: 1,190
      Trainable params: 1,190
      Non-trainable params: 0
      ----------------------------------------------------------------
      Input size (MB): 0.00
      Forward/backward pass size (MB): 0.30
      Params size (MB): 0.00
      Estimated Total Size (MB): 0.31
      ----------------------------------------------------------------

      FLOPs : 901600

The reported number counts only the MACs of the two convolution layers, 10x28x28x5x5x1 + 10x28x28x3x3x10 = 901600. It ignores the batch normalization layers and the shortcut addition.
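The arithmetic behind that figure can be checked with a short sketch. The helper name `conv2d_macs` is made up for illustration and is not part of torchprofile; it assumes dense convolutions (groups=1) with no bias, as in the model above.

```python
def conv2d_macs(out_channels, out_h, out_w, kernel_h, kernel_w, in_channels):
    """MACs of a dense 2D convolution: one multiply-accumulate per
    (output element, kernel tap, input channel). Hypothetical helper."""
    return out_channels * out_h * out_w * kernel_h * kernel_w * in_channels

# conv1: nn.Conv2d(1, 10, kernel_size=5, padding=2) on a 28x28 input
conv1 = conv2d_macs(10, 28, 28, 5, 5, 1)
# conv2: nn.Conv2d(10, 10, kernel_size=3, padding=1) on a 28x28 input
conv2 = conv2d_macs(10, 28, 28, 3, 3, 10)
total = conv1 + conv2

print(conv1, conv2, total)  # 196000 705600 901600
```

The total matches the 901600 printed by `profile_macs`, so only the two convolutions contribute to the count.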

zhijian-liu commented 3 years ago

Batch normalization can be merged into the preceding convolution at inference time, which is common practice, so it adds no extra MACs. I don't count the addition because an element-wise add is not a multiply-accumulate operation.
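A minimal sketch of that merge, in plain Python rather than PyTorch so that it stays self-contained. It assumes inference-mode batch normalization (fixed running mean and variance) and a bias-free convolution, and it evaluates the convolution at a single output position, where it reduces to one dot product per output channel. All function names and numbers here are made up for illustration.

```python
import math

def fold_bn(weights, gamma, beta, mean, var, eps=1e-5):
    """Fold per-channel BN parameters into a bias-free conv.

    weights[c] is the flattened kernel of output channel c.
    Returns (folded_weights, folded_bias).
    """
    folded_w, folded_b = [], []
    for w_c, g, b, m, v in zip(weights, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)          # per-channel BN scale
        folded_w.append([scale * w for w in w_c])
        folded_b.append(b - m * scale)          # BN shift becomes the conv bias
    return folded_w, folded_b

def conv_at(weights, bias, patch):
    """Conv output at one spatial position: a dot product per channel."""
    return [sum(w * x for w, x in zip(w_c, patch)) + b_c
            for w_c, b_c in zip(weights, bias)]

def batchnorm(values, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode batch norm applied per channel."""
    return [g * (y - m) / math.sqrt(v + eps) + b
            for y, g, b, m, v in zip(values, gamma, beta, mean, var)]

# Made-up example: 2 output channels, a 3-element input patch.
weights = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.75]]
gamma, beta = [1.2, 0.8], [0.1, -0.3]
mean, var = [0.4, -0.2], [2.0, 0.5]
patch = [1.0, 2.0, 3.0]

# Conv followed by BN, versus a single conv with folded weights and bias.
reference = batchnorm(conv_at(weights, [0.0, 0.0], patch), gamma, beta, mean, var)
folded_w, folded_b = fold_bn(weights, gamma, beta, mean, var)
fused = conv_at(folded_w, folded_b, patch)

assert all(math.isclose(a, b, rel_tol=1e-9) for a, b in zip(reference, fused))
```

The folded kernel has the same shape as the original one, so the fused layer performs the same number of multiplications as the convolution alone, which is consistent with leaving batch normalization out of the MAC count.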

Ironteen commented 3 years ago

Thank you for your reply.