Closed — wanghao14 closed this issue 3 years ago
You can try `BS_UPSCALE=4` and `GPUS=1`. If it still doesn't fit in memory, reduce `BS_UPSCALE` further, but in that case I think you have to reduce `CONST_BN_SIZE` by the same factor to use the pretrained weights for the batchnorm layers. Otherwise you'll have to initialize the batchnorm layers from scratch.
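To illustrate the coupling described above, here is a hedged arithmetic sketch (the function name and the concrete values are placeholders, not the repo's exact formula): if the BN split count is derived roughly as the batch size divided by `CONST_BN_SIZE`, then scaling the batch via `BS_UPSCALE` without scaling `CONST_BN_SIZE` multiplies the number of splits, and hence every split-BN buffer, by the same factor.

```python
# Hypothetical sketch of the BS_UPSCALE / CONST_BN_SIZE coupling.
# Assumption (not the repo's literal code): the number of BN splits
# grows with batch size and shrinks with the per-split batch size.
def num_bn_splits(batch_size: int, const_bn_size: int) -> int:
    # each split normalizes const_bn_size samples, so the split count
    # is how many such groups fit in the batch
    return batch_size // const_bn_size

base_batch, const_bn = 16, 4  # placeholder values for illustration

print(num_bn_splits(base_batch, const_bn))          # baseline split count
print(num_bn_splits(base_batch * 4, const_bn))      # BS_UPSCALE=4 alone: 4x more splits, 4x larger BN buffers
print(num_bn_splits(base_batch * 4, const_bn * 4))  # scaling CONST_BN_SIZE too restores the original count
```

This matches the shape mismatch reported below: a checkpoint saved with the baseline split count has BN buffers 4x smaller than a model built with `BS_UPSCALE=4` and an unchanged `CONST_BN_SIZE`.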
Yes, `x3d_multigrid_kinetics_fb_pretrained.pt` contains the weights ported from the FAIR implementation, which were trained with a longer schedule and give better pretrained accuracy.
@kkahatapitiya Thanks for your reply! I had tried `BS_UPSCALE=4` and `GPUS=1`, but got the following error:
```
RuntimeError: Error(s) in loading state_dict for ResNet:
size mismatch for bn1.split_bn.running_mean: copying a param with shape torch.Size([24]) from checkpoint, the shape in current model is torch.Size([96]).
size mismatch for bn1.split_bn.running_var: copying a param with shape torch.Size([24]) from checkpoint, the shape in current model is torch.Size([96]).
size mismatch for layer1.0.bn1.split_bn.running_mean: copying a param with shape torch.Size([54]) from checkpoint, the shape in current model is torch.Size([216]).
size mismatch for layer1.0.bn1.split_bn.running_var: copying a param with shape torch.Size([54]) from checkpoint, the shape in current model is torch.Size([216]).
size mismatch for layer1.0.bn2.split_bn.running_mean: copying a param with shape torch.Size([54]) from checkpoint, the shape in current model is torch.Size([216]).
size mismatch for layer1.0.bn2.split_bn.running_var: copying a param with shape torch.Size([54]) from checkpoint, the shape in current model is torch.Size([216]).
size mismatch for layer1.0.bn3.split_bn.running_mean: copying a param with shape torch.Size([24]) from checkpoint, the shape in current model is torch.Size([96]).
size mismatch for layer1.0.bn3.split_bn.running_var: copying a param with shape torch.Size([24]) from checkpoint, the shape in current model is torch.Size([96]).
size mismatch for layer1.0.downsample.1.split_bn.running_mean: copying a param with shape torch.Size([24]) from checkpoint, the shape in current model is torch.Size([96]).
size mismatch for layer1.0.downsample.1.split_bn.running_var: copying a param with shape torch.Size([24]) from checkpoint, the shape in current model is torch.Size([96]).
size mismatch for layer1.1.bn1.split_bn.running_mean: copying a param with shape torch.Size([54]) from checkpoint, the shape in current model is torch.Size([216]).
size mismatch for layer1.1.bn1.split_bn.running_var: copying a param with shape torch.Size([54]) from checkpoint, the shape in current model is torch.Size([216]).
......
```
So I set `BS_UPSCALE=1` and the error disappeared.
Hi, thanks a lot for sharing your implementation! I want to use your pretrained model for validation. If I only have one GPU, how should I modify the hyperparameters, especially `base_bn_splits` used in `generate_model`? I'd also like to know whether the model named "x3d_multigrid_kinetics_fb_pretrained.pt" is adapted from the model provided by Facebook. Looking forward to your reply.