Closed by robotzheng 4 years ago
Hi,
This might be caused by a change in the torch.var() function. I see you are using PyTorch 1.0.0. Could you check whether the error goes away after updating to 1.2.0 (the version I used for training) or to the latest PyTorch?
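If upgrading is not an option, one possible workaround is to avoid passing a tuple to `dim` at all. The sketch below is only an illustration and has not been run against the AOGNet code: `spatial_var` is a hypothetical helper name, and it assumes the input is a 4-D `(b, c, h, w)` tensor, as the failing call `torch.var(x, dim=(2, 3)).view(b, c, 1, 1)` suggests. It flattens the two spatial axes into one so that `var()` only needs a single integer `dim`.

```python
import torch


def spatial_var(x: torch.Tensor) -> torch.Tensor:
    """Per-channel variance over H and W, returned with shape (b, c, 1, 1).

    Hypothetical helper: a stand-in for torch.var(x, dim=(2, 3)) that does
    not rely on tuple-valued `dim` support.
    """
    b, c = x.shape[:2]
    # Merge H and W into one axis, then reduce over that single axis.
    # The default (unbiased) estimator is kept, matching torch.var's default.
    return x.reshape(b, c, -1).var(dim=2).view(b, c, 1, 1)


x = torch.randn(4, 8, 5, 5)
print(spatial_var(x).shape)  # torch.Size([4, 8, 1, 1])
```

Because the reduction runs over the same `h * w` elements per channel, the result should match the tuple-`dim` call on versions that support it.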
Thanks, updating to 1.1.0 fixed it. But training doesn't converge.
I turned off fp16 and changed the batch size to 128.
The log is:
PyTorch VERSION: 1.0.0
CUDA VERSION: 9.0.176
CUDNN VERSION: 7401
GPU TYPE: Tesla P40
Warning: if --fp16 is not used, static_loss_scale will be ignored.
=> creating aognet
=> Params (double-check): 12.373355M
Warning: if --fp16 is not used, static_loss_scale will be ignored.
Warning: if --fp16 is not used, static_loss_scale will be ignored.
Warning: if --fp16 is not used, static_loss_scale will be ignored.
=> ! Weight decay applied to FeatNorm parameters
Traceback (most recent call last):
  File "/home/zzt/AOGNets/AOGNet-v2/scripts/../tools/main_fp16.py", line 774, in <module>
    main()
  File "/home/zzt/AOGNets/AOGNet-v2/scripts/../tools/main_fp16.py", line 340, in main
    cfg.dataaug.mixup_rate, cfg.dataaug.labelsmoothing_rate)
  File "/home/zzt/AOGNets/AOGNet-v2/scripts/../tools/main_fp16.py", line 475, in train
    output = model(input)
  File "/usr/local/python3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/python3/lib/python3.6/site-packages/apex/parallel/distributed.py", line 560, in forward
    result = self.module(*inputs, **kwargs)
  File "/usr/local/python3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zzt/AOGNets/AOGNet-v2/scripts/../tools/../models/aognet/aognet.py", line 643, in forward
    y = self.stage0(y)
  File "/usr/local/python3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/python3/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/usr/local/python3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zzt/AOGNets/AOGNet-v2/scripts/../tools/../models/aognet/aognet.py", line 223, in forward
    tnode_output = getattr(self, op_name)(tnode_tensor_op)
  File "/usr/local/python3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zzt/AOGNets/AOGNet-v2/scripts/../tools/../models/aognet/operator_singlescale.py", line 132, in forward
    y = self.conv_norm_ac_2(y)
  File "/usr/local/python3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zzt/AOGNets/AOGNet-v2/scripts/../tools/../models/aognet/operator_singlescale.py", line 91, in forward
    y = self.conv_norm(x)
  File "/usr/local/python3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zzt/AOGNets/AOGNet-v2/scripts/../tools/../models/aognet/operator_singlescale.py", line 73, in forward
    y = self.conv_norm(x)
  File "/usr/local/python3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/python3/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/usr/local/python3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zzt/AOGNets/AOGNet-v2/scripts/../tools/../models/aognet/operator_basic.py", line 176, in forward
    y = self.attention_weights(x) # bxk # or use output as attention input
  File "/usr/local/python3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zzt/AOGNets/AOGNet-v2/scripts/../tools/../models/aognet/operator_basic.py", line 147, in forward
    var = torch.var(x, dim=(2, 3)).view(b, c, 1, 1)
TypeError: var() received an invalid combination of arguments - got (Tensor, dim=tuple), but expected one of: