iVMCL / AOGNet-v2

Official implementation of AOGNets and Attentive Normalization.

Running on 4 Tesla P40s reports: "didn't match because some of the keywords were incorrect: dim" #1

Closed robotzheng closed 4 years ago

robotzheng commented 4 years ago

I turned off fp16 and changed the batch size to 128.

The log is:

PyTorch VERSION: 1.0.0
CUDA VERSION: 9.0.176
CUDNN VERSION: 7401
GPU TYPE: Tesla P40
Warning: if --fp16 is not used, static_loss_scale will be ignored.
=> creating aognet
=> Params (double-check): 12.373355M
Warning: if --fp16 is not used, static_loss_scale will be ignored.
Warning: if --fp16 is not used, static_loss_scale will be ignored.
Warning: if --fp16 is not used, static_loss_scale will be ignored.
=> ! Weight decay applied to FeatNorm parameters
Traceback (most recent call last):
  File "/home/zzt/AOGNets/AOGNet-v2/scripts/../tools/main_fp16.py", line 774, in <module>
    main()
  File "/home/zzt/AOGNets/AOGNet-v2/scripts/../tools/main_fp16.py", line 340, in main
    cfg.dataaug.mixup_rate, cfg.dataaug.labelsmoothing_rate)
  File "/home/zzt/AOGNets/AOGNet-v2/scripts/../tools/main_fp16.py", line 475, in train
    output = model(input)
  File "/usr/local/python3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/python3/lib/python3.6/site-packages/apex/parallel/distributed.py", line 560, in forward
    result = self.module(*inputs, **kwargs)
  File "/usr/local/python3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zzt/AOGNets/AOGNet-v2/scripts/../tools/../models/aognet/aognet.py", line 643, in forward
    y = self.stage0(y)
  File "/usr/local/python3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/python3/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/usr/local/python3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zzt/AOGNets/AOGNet-v2/scripts/../tools/../models/aognet/aognet.py", line 223, in forward
    tnode_output = getattr(self, op_name)(tnode_tensor_op)
  File "/usr/local/python3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zzt/AOGNets/AOGNet-v2/scripts/../tools/../models/aognet/operator_singlescale.py", line 132, in forward
    y = self.conv_norm_ac_2(y)
  File "/usr/local/python3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zzt/AOGNets/AOGNet-v2/scripts/../tools/../models/aognet/operator_singlescale.py", line 91, in forward
    y = self.conv_norm(x)
  File "/usr/local/python3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zzt/AOGNets/AOGNet-v2/scripts/../tools/../models/aognet/operator_singlescale.py", line 73, in forward
    y = self.conv_norm(x)
  File "/usr/local/python3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/python3/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/usr/local/python3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zzt/AOGNets/AOGNet-v2/scripts/../tools/../models/aognet/operator_basic.py", line 176, in forward
    y = self.attention_weights(x) # bxk # or use output as attention input
  File "/usr/local/python3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zzt/AOGNets/AOGNet-v2/scripts/../tools/../models/aognet/operator_basic.py", line 147, in forward
    var = torch.var(x, dim=(2, 3)).view(b, c, 1, 1)
TypeError: var() received an invalid combination of arguments - got (Tensor, dim=tuple), but expected one of:

xilaili commented 4 years ago

Hi,

This might be caused by a change in the torch.var() function. I see you are using PyTorch 1.0.0. Could you check whether updating to 1.2.0 (the version I used for training) or the latest PyTorch resolves the error?
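If upgrading is not possible, one workaround might be to avoid passing a tuple to dim at all, since the traceback shows var() rejecting dim=tuple. The sketch below is an assumption, not code from this repository: spatial_var is a hypothetical helper that flattens the two spatial axes so that var() only needs a single integer dim, which should give the same per-channel variance as dim=(2, 3).

```python
import torch


def spatial_var(x):
    """Per-channel variance over H and W of a (B, C, H, W) tensor.

    Hypothetical helper (not part of AOGNet-v2): H and W are flattened
    into one axis so var() is called with a single integer dim.
    """
    b, c = x.shape[:2]
    # reshape (rather than view) also handles non-contiguous inputs
    return x.reshape(b, c, -1).var(dim=2).view(b, c, 1, 1)


x = torch.randn(4, 8, 5, 5)
v = spatial_var(x)
print(v.shape)  # torch.Size([4, 8, 1, 1])
```

This has not been tested against the training script; it only illustrates how the failing call in operator_basic.py could be written without a tuple dim.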

robotzheng commented 4 years ago

Thanks, updating to 1.1.0 fixed it. But training doesn't converge.