kkahatapitiya / X3D-Multigrid

PyTorch implementation of X3D models with Multigrid training.

x3d.py #10

Open wcyy0123 opened 1 year ago

wcyy0123 commented 1 year ago

I added this code to x3d.py:

if __name__=='__main__':
    net = generate_model('S',).cuda()
    #print(net)    
    from torchsummary import summary
    inputs = torch.rand(8, 3, 10, 112, 112).cuda()
    output = net(inputs)
    print(output.shape)
    summary(net,input_size=(3,10,112,112),batch_size=8,device='cuda')

The code runs successfully except for the summary call, which reports the following error:

 File "x3d.py", line 382, in <module>
    summary(net,input_size=(3,10,112,112),batch_size=8,device='cuda')
  File "D:\software\program\Anaconda3\envs\pytorch1\lib\site-packages\torchsummary\torchsummary.py", line 72, in summary
    model(*x)
  File "D:\software\program\Anaconda3\envs\pytorch1\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "x3d.py", line 324, in forward
    x = self.bn1(x)
  File "D:\software\program\Anaconda3\envs\pytorch1\lib\site-packages\torch\nn\modules\module.py", line 1128, in _call_impl
    result = forward_call(*input, **kwargs)
  File "x3d.py", line 52, in forward
    x = x.view(n // self.num_splits, c * self.num_splits, t, h, w)
RuntimeError: shape '[0, 192, 10, 56, 56]' is invalid for input of size 1505280

I found that the shape of x inside forward was (2, 3, 10, 112, 112) rather than (8, 3, 10, 112, 112), and I don't know why. Do you know why?

kkahatapitiya commented 1 year ago

Sorry about the delay in response. X3D uses split batchnorm to compute batchnorm stats with a constant batch size, no matter the actual input batch size. To do so, the input batch size per gpu (BASE_BS_PER_GPU) should be a multiple of CONST_BN_SIZE. Please see here: https://github.com/kkahatapitiya/X3D-Multigrid/blob/d63d8fe6210d2b38aa26d71b0062b569687d6be2/train_x3d_kinetics_multigrid.py#L161C50-L161C95
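For reference, if the installed package is the sksq96 torchsummary, it builds its probe input with a fixed batch of 2 (its batch_size argument only changes the printed table), which would explain the (2, 3, 10, 112, 112) input seen in forward. A rough, untested workaround sketch along those lines: run the summary with the model in eval mode so the split-BN training branch (and its n // num_splits reshape) is never taken.

# Untested sketch: summarize in eval mode so the split batchnorm falls back to
# its plain BN path instead of the n // num_splits view.
import torch
from torchsummary import summary

net = generate_model('S').cuda()   # same model as in the snippet above
net.eval()                         # take the eval branch of the split BN
with torch.no_grad():
    summary(net, input_size=(3, 10, 112, 112), device='cuda')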

wcyy0123 commented 1 year ago

The original code looks like this:

    def forward(self, x):
        if self.training:
            n, c, t, h, w = x.shape
            x = x.view(n // self.num_splits, c * self.num_splits, t, h, w)
            x = self.split_bn(x)
            x = x.view(n, c, t, h, w)
        else:
            x = self.bn(x)
        if self.affine:
            x = x * self.weight.view((-1, 1, 1, 1))
            x = x + self.bias.view((-1, 1, 1, 1))
        return x

It is different from the one in x3d.py.
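The numbers in the error message line up with this view: an input of 1505280 elements is 2 × 24 × 10 × 56 × 56, and a target channel count of 192 with c = 24 implies num_splits = 8, so n // num_splits is 2 // 8 = 0. A standalone reproduction of just that failing reshape (the batch of 2 comes from torchsummary's internal probe input; num_splits = 8 is inferred from the message, not read from the code):

import torch

n, c, t, h, w = 2, 24, 10, 56, 56   # 2*24*10*56*56 = 1505280 elements
num_splits = 8                      # 192 // 24, inferred from the error message
x = torch.rand(n, c, t, h, w)
# n // num_splits == 0, so the target shape [0, 192, 10, 56, 56] cannot hold
# the 1505280 values in x -> same RuntimeError as in the traceback
x = x.view(n // num_splits, c * num_splits, t, h, w)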