I met the same problem. The training log shows that the width values become extremely large.
@MrWangg1992 Hi, how did you solve this problem? Can you share it with me?
@violet17 I changed the BCELoss call to BCEWithLogitsLoss.
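Roughly, the change looks like this; the tensor names are only for illustration, not the exact ones used in yolomodel.py:

```python
import torch
import torch.nn as nn

pred_conf = torch.randn(8)                 # raw logits straight from the network
tconf = torch.randint(0, 2, (8,)).float()  # 0/1 confidence targets

# BCELoss expects probabilities in [0, 1]; feeding it raw logits is what
# trips the device-side assert shown in the traceback further down.
bce = nn.BCELoss()
loss_old = bce(torch.sigmoid(pred_conf), tconf)

# BCEWithLogitsLoss takes the raw logits directly and applies the sigmoid
# internally in a numerically stable way.
bce_logits = nn.BCEWithLogitsLoss()
loss_new = bce_logits(pred_conf, tconf)
```

If you keep BCELoss instead, make sure a sigmoid is applied before the loss; switching to BCEWithLogitsLoss means removing that explicit sigmoid.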
@MrWangg1992 Thanks. I changed it too, but it didn't help; the loss became NaN.
@violet17 Did you change the batch and subdivision sizes in the .cfg file as well?
@MrWangg1992 I changed them both to 1 because CUDA always runs out of memory.
@violet17 Check whether your .cfg file is set to test or train; setting batch and subdivisions to 1 has no effect for training. For out-of-memory errors, the only option is to reduce the batch size and related settings.
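For reference, the upstream yolov3.cfg header marks the two modes like this (these are the stock darknet defaults, not values specific to this repo):

```
[net]
# Testing
# batch=1
# subdivisions=1
# Training
batch=64
subdivisions=16
```

Only one pair should be uncommented, and for training it should be the Training pair; if memory is tight, lower batch (or, at least in upstream darknet, raise subdivisions) rather than setting both to 1.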
@MrWangg1992 I set both batch and subdivisions to 1. Is that not allowed? Why not?
```
Traceback (most recent call last):
  File "sparsity_train.py", line 154, in <module>
    train()
  File "sparsity_train.py", line 100, in train
    loss = model(imgs, targets)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/chensy/QW/yolov3-network-slimming/yolomodel.py", line 352, in forward
    x, losses = self.module_list[i][0](x, targets)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/chensy/QW/yolov3-network-slimming/yolomodel.py", line 133, in forward
    loss_conf = self.bce_loss(pred_conf[conf_mask_false], tconf[conf_mask_false]) + \
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/loss.py", line 512, in forward
    return F.binary_cross_entropy(input, target, weight=self.weight, reduction=self.reduction)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py", line 2113, in binary_cross_entropy
    input, target, weight, reduction_enum)
RuntimeError: reduce failed to synchronize: device-side assert triggered
```
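The "device-side assert triggered" message is just the generic CUDA-side report of the failure inside binary_cross_entropy; recent PyTorch builds make the underlying range check explicit on CPU, which is an easy way to confirm that out-of-range predictions (the very large values mentioned above) are the trigger. A minimal repro, independent of this repo:

```python
import torch
import torch.nn.functional as F

pred = torch.tensor([3.7, -1.2, 0.4])    # raw logits, not probabilities
target = torch.tensor([1.0, 0.0, 1.0])

# binary_cross_entropy requires inputs in [0, 1]; on CUDA the violation
# surfaces only as "device-side assert triggered", while recent CPU builds
# report the range problem directly.
try:
    F.binary_cross_entropy(pred, target)
except RuntimeError as err:
    print(err)

# The logits-aware variant accepts unbounded values without complaint.
print(F.binary_cross_entropy_with_logits(pred, target))
```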