longcw / yolo2-pytorch

YOLOv2 in PyTorch

multi GPUs support? #82

Open psu1 opened 6 years ago

psu1 commented 6 years ago

When I run with torch.nn.DataParallel(net).cuda(), I get "AttributeError: 'DataParallel' object has no attribute 'loss'".

After I change loss = net.loss to loss = net.module.loss, I get a different error, "TypeError: unsupported operand type(s) for +: 'NoneType' and 'NoneType'", at return self.bbox_loss + self.iou_loss + self.cls_loss

Do I need to rewrite the loss function outside "class Darknet19(nn.Module)"?

Is there a better way?

longcw commented 6 years ago

Yes, you need to rewrite the loss function outside the model. DataParallel replicates your model onto each GPU and runs forward on the replicas, so member variables set during forward (such as the loss terms) cannot be read back from the original model.
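
A minimal sketch of that pattern, not the actual yolo2-pytorch code: `TinyDetector` and `detection_loss` are hypothetical stand-ins for `Darknet19` and its loss, and the MSE terms are placeholders rather than the real YOLOv2 loss. The idea is that `forward` only returns predictions, which DataParallel gathers from the replicas, and the loss is then computed on the gathered output. It should also run on a CPU-only machine, where DataParallel simply calls the wrapped module.

```python
import torch
import torch.nn as nn


class TinyDetector(nn.Module):
    """Hypothetical stand-in for Darknet19: forward() only returns predictions."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 5, kernel_size=3, padding=1)

    def forward(self, images):
        # Return tensors instead of storing losses on self, so that
        # DataParallel can gather the outputs from every replica.
        return self.conv(images)


def detection_loss(pred, target):
    """Placeholder loss computed outside the model (not the real YOLOv2 loss)."""
    bbox_loss = nn.functional.mse_loss(pred[:, :4], target[:, :4])
    cls_loss = nn.functional.mse_loss(pred[:, 4:], target[:, 4:])
    return bbox_loss + cls_loss


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
net = nn.DataParallel(TinyDetector()).to(device)
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)

# Dummy batch; with several GPUs the batch dimension is split across them.
images = torch.randn(4, 3, 8, 8, device=device)
target = torch.randn(4, 5, 8, 8, device=device)

pred = net(images)                   # predictions gathered onto one device
loss = detection_loss(pred, target)  # loss computed outside the model
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

With this layout nothing needs to be read from `net.module` after the forward pass, which is what led to the `NoneType` error above.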

feiyuelankuang commented 6 years ago

Hi, I am trying to run the code on multiple GPUs and I have rewritten the loss function outside the model. Training looks normal, but when I test it I get a lot of negative APs. Do you have any idea what the reason might be? Thanks!