hassony2 / kinetics_i3d_pytorch

Inflated I3D network with Inception backbone, weights transferred from TensorFlow
MIT License
523 stars, 114 forks

I think it should be ceil_mode=True #4

Closed: rimchang closed this issue 6 years ago

rimchang commented 6 years ago

https://github.com/hassony2/kinetics_i3d_pytorch/blob/c2b54db2368e136abe414d24aacd508c37b333a9/src/i3dpt.py#L115

TensorFlow's SAME padding rounds the output size up (ceil). From https://www.tensorflow.org/api_docs/python/tf/nn/pool:

    If padding = "SAME": output_spatial_shape[i] = ceil(input_spatial_shape[i] / strides[i])
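To make the rounding rule concrete, here is a minimal sketch in plain Python. The function names `same_output_size` and `floor_output_size` are hypothetical helpers written for this illustration, not part of TensorFlow or PyTorch. They only evaluate the formula quoted above and its rounded-down counterpart.

```python
def same_output_size(input_size: int, stride: int) -> int:
    """Output length along one dimension under the quoted SAME rule: ceil(input / stride)."""
    # Integer ceiling division, avoids floating point.
    return -(-input_size // stride)


def floor_output_size(input_size: int, stride: int) -> int:
    """Same computation rounded down, for comparison."""
    return input_size // stride


if __name__ == "__main__":
    # Even sizes agree, odd sizes differ by one.
    for size in (150, 75):
        print(size, same_output_size(size, 2), floor_output_size(size, 2))
```

For an odd size such as 75 with stride 2, the two roundings give 38 and 37, which is the kind of one-off difference discussed in this issue.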

With ceil_mode=False, I observed:

    a = torch.autograd.Variable(torch.ones((1,3,16,300,150)), requires_grad=False)
    out = i3d(a)

    torch.Size([1, 64, 8, 150, 75])
    torch.Size([1, 64, 8, 75, 37])
    torch.Size([1, 64, 8, 75, 37])
    torch.Size([1, 192, 8, 75, 37])
    torch.Size([1, 192, 8, 37, 18])
    torch.Size([1, 256, 8, 37, 18])
    torch.Size([1, 480, 8, 37, 18])
    torch.Size([1, 480, 4, 18, 9])
    torch.Size([1, 512, 4, 18, 9])
    torch.Size([1, 512, 4, 18, 9])
    torch.Size([1, 512, 4, 18, 9])
    torch.Size([1, 528, 4, 18, 9])
    torch.Size([1, 832, 4, 18, 9])
    torch.Size([1, 832, 4, 18, 9])
    torch.Size([1, 832, 4, 18, 9])
    torch.Size([1, 1024, 4, 18, 9])
    torch.Size([1, 1024, 4, 18, 9])

With ceil_mode=True:

    a = torch.autograd.Variable(torch.ones((1,3,16,300,150)), requires_grad=False)
    out = i3d(a)

    torch.Size([1, 64, 8, 150, 75])
    torch.Size([1, 64, 8, 75, 38])
    torch.Size([1, 64, 8, 75, 38])
    torch.Size([1, 192, 8, 75, 38])
    torch.Size([1, 192, 8, 38, 19])
    torch.Size([1, 256, 8, 38, 19])
    torch.Size([1, 480, 8, 38, 19])
    torch.Size([1, 480, 4, 19, 10])
    torch.Size([1, 512, 4, 19, 10])
    torch.Size([1, 512, 4, 19, 10])
    torch.Size([1, 512, 4, 19, 10])
    torch.Size([1, 528, 4, 19, 10])
    torch.Size([1, 832, 4, 19, 10])
    torch.Size([1, 832, 4, 19, 10])
    torch.Size([1, 832, 4, 19, 10])
    torch.Size([1, 1024, 4, 19, 10])
    torch.Size([1, 1024, 4, 19, 10])

I think ceil_mode=True is correct
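The spatial sizes in the two traces can be reproduced without running the network. The sketch below is plain Python and rests on assumptions: it takes the spatial pooling stages to use a kernel of 3, a stride of 2 and one cell of total padding along each spatial dimension (values assumed for illustration, not read from the linked source line), and it uses the standard pooling output-size formula. `pool_output_size` and `trace` are hypothetical helper names.

```python
import math


def pool_output_size(input_size, kernel, stride, total_pad, ceil_mode):
    """Pooling output length: round((input + pad - kernel) / stride) + 1.

    PyTorch additionally drops a final window that would start beyond the
    input and its leading padding; that case does not arise for the sizes
    used here, so it is not modelled.
    """
    span = input_size + total_pad - kernel
    rounded = math.ceil(span / stride) if ceil_mode else span // stride
    return rounded + 1


def trace(size, stages, ceil_mode):
    """Apply the assumed pooling stage repeatedly and record each size."""
    sizes = [size]
    for _ in range(stages):
        # Assumed stage: kernel 3, stride 2, total padding 1.
        size = pool_output_size(size, 3, 2, 1, ceil_mode)
        sizes.append(size)
    return sizes


if __name__ == "__main__":
    # Start from the first printed shape (150 x 75).
    print("floor:", trace(150, 3, False), trace(75, 3, False))
    print("ceil: ", trace(150, 3, True), trace(75, 3, True))
```

Under these assumptions the floor variant gives 150, 75, 37, 18 and 75, 37, 18, 9, while the ceil variant gives 150, 75, 38, 19 and 75, 38, 19, 10, matching the heights and widths printed above.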

hassony2 commented 6 years ago

Hi @rimchang !

Thank you for looking into this. Could you specify which module you pass the ceil_mode=True option to?

(I am aware that I am not reproducing TensorFlow's SAME padding exactly, as that would require keeping track of the input sizes during computation in order to adjust the padding sizes in ConstantPad3d.)
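As an illustration of why exact SAME padding depends on the input size, here is a minimal plain-Python sketch of the per-dimension padding computation as commonly described for TensorFlow. The helper name `same_padding` is hypothetical, and the kernel and stride values in the example are assumptions chosen for illustration.

```python
def same_padding(input_size, kernel, stride):
    """Return (pad_before, pad_after) for one dimension under SAME padding."""
    # SAME output size: ceil(input / stride), via integer ceiling division.
    output_size = -(-input_size // stride)
    # Total padding needed so that output_size windows fit.
    total = max((output_size - 1) * stride + kernel - input_size, 0)
    before = total // 2
    # Any odd remainder goes after the data.
    return before, total - before


if __name__ == "__main__":
    # Same kernel and stride, different input sizes, different padding.
    print(same_padding(150, 3, 2))
    print(same_padding(75, 3, 2))
```

With a kernel of 3 and stride of 2, an input of length 150 needs padding (0, 1) while an input of length 75 needs (1, 1), so a fixed padding chosen at construction time cannot be exact for every input size.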

Thank you :)

rimchang commented 6 years ago

In my MXNet version of I3D, the logits are slightly different when ceil_mode=True (pooling_convention='full' in MXNet):

https://github.com/rimchang/kinetics-i3d-Mxnet/blob/master/out/I3D_MX_TF_full/imagenet_rgb.txt

    RGB checkpoint restored
    RGB data loaded, shape= (1, 3, 79, 224, 224)
    Norm of logits: 85.477379

    Top classes and probabilities
    0.999999 26.7355 playing cricket
    4.39603e-07 12.0981 playing kickball
    3.59115e-07 11.8958 catching or throwing baseball
    1.45475e-07 10.9922 catching or throwing softball
    9.06659e-08 10.5194 shooting goal (soccer)
    6.67626e-08 10.2133 hitting baseball
    6.17041e-08 10.1345 golf putting
    3.10445e-08 9.44762 throwing discus
    2.93214e-08 9.39051 hurling (sport)
    2.63212e-08 9.28257 triple jump
    1.59309e-08 8.78046 jogging
    1.41347e-08 8.66083 javelin throw
    7.95743e-09 8.0863 hurdling
    6.84504e-09 7.93572 golf driving
    6.84073e-09 7.93509 skateboarding
    6.32415e-09 7.85657 headbutting
    6.00402e-09 7.80463 dodgeball
    5.8482e-09 7.77833 long jump
    5.72226e-09 7.75656 shot put
    5.62225e-09 7.73893 marching

https://github.com/rimchang/kinetics-i3d-Mxnet/blob/master/out/I3D_MX_TF_valid/imagenet_rgb.txt

    RGB checkpoint restored
    RGB data loaded, shape= (1, 3, 79, 224, 224)
    Norm of logits: 84.988197

    Top classes and probabilities
    0.999998 26.333 playing cricket
    7.04675e-07 12.1675 catching or throwing baseball
    6.08109e-07 12.0201 playing kickball
    2.57066e-07 11.1591 catching or throwing softball
    1.52612e-07 10.6376 hitting baseball
    1.43862e-07 10.5786 shooting goal (soccer)
    9.55542e-08 10.1694 golf putting
    4.25563e-08 9.36056 hurling (sport)
    4.23585e-08 9.3559 triple jump
    3.88498e-08 9.26944 throwing discus
    1.99232e-08 8.60162 javelin throw
    1.78306e-08 8.49065 hurdling
    1.32184e-08 8.19135 playing tennis
    1.28483e-08 8.16295 golf driving
    1.27824e-08 8.1578 jogging
    9.6263e-09 7.87423 skateboarding
    9.26079e-09 7.83552 long jump
    9.12024e-09 7.82023 headbutting
    8.12995e-09 7.70529 breakdancing
    7.07769e-09 7.56668 hopscotch

hassony2 commented 6 years ago

Thank you for pointing this out ! I merged your pull request #5 and updated the models and readme accordingly :)