toandaominh1997 / EfficientDet.Pytorch

Implementation of EfficientDet: Scalable and Efficient Object Detection in PyTorch

Speedup by direct allocation in focal loss #109

Open rmcavoy opened 4 years ago

rmcavoy commented 4 years ago

The lines below, which use torch.ones and torch.zeros in the focal loss, allocate tensors on the CPU and then transfer them to the GPU with .cuda(). This is inefficient, especially when DataParallel spans a large number of GPUs, because the main process has to move the data to each GPU individually.

# current implementation: allocate on the CPU, then copy to the GPU
targets = torch.ones(classification.shape) * -1
targets = targets.cuda()
alpha_factor = torch.ones(targets.shape).cuda() * alpha
cls_loss = torch.where(torch.ne(targets, -1.0), cls_loss, torch.zeros(cls_loss.shape).cuda())

Instead, these tensors should be allocated directly on the GPU, e.g. with torch.ones_like, which copies all attributes of the source tensor, including its device. My tests with the version below show an approximately 4x speedup on 6 GPUs using DataParallel (though I am also using a custom parallel dataloader, so your performance may vary).

# proposed implementation: allocate directly on classification's device
targets = torch.ones_like(classification) * -1
alpha_factor = torch.ones_like(targets) * alpha
cls_loss = torch.where(torch.ne(targets, -1.0), cls_loss, torch.zeros_like(cls_loss))
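If you want to verify the effect locally, here is a minimal single-GPU timing sketch; the function names and tensor shape are illustrative (not from the repo), and the gap it measures mainly reflects the host-to-device copy that the .cuda() pattern incurs:

import time

import torch

def cpu_alloc_then_copy(classification):
    # original pattern: allocate on the CPU, then transfer with .cuda()
    targets = torch.ones(classification.shape) * -1
    return targets.cuda()

def direct_alloc(classification):
    # proposed pattern: allocate on classification's device via ones_like
    return torch.ones_like(classification) * -1

# illustrative shape, roughly batch x anchors x classes
classification = torch.randn(8, 50000, 90, device="cuda")

for fn in (cpu_alloc_then_copy, direct_alloc):
    torch.cuda.synchronize()  # make sure pending kernels don't skew the timing
    start = time.time()
    for _ in range(100):
        fn(classification)
    torch.cuda.synchronize()  # wait for async allocations/copies to finish
    print(f"{fn.__name__}: {time.time() - start:.3f}s")

As a side note, torch.full_like(classification, -1) expresses the same thing in a single allocation, without the extra multiply.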