mitmul / chainer-faster-rcnn

Object Detection with Faster R-CNN in Chainer
MIT License

Multiple losses #16

Closed paleckar closed 7 years ago

paleckar commented 7 years ago

Hi,

thanks for the code. I have a question regarding the loss functions. It seems that you have implemented your own ParallelUpdater class to deal with the multiple output losses of the Faster R-CNN network (two for the RPN and two for the final classification and bbox regression).

What I don't understand is the reason for updating each loss individually. Doesn't this mean that the gradients, especially those of the lower layers (trunk etc.), are re-computed four times for each example, each time with a different incoming upstream gradient depending on the current loss branch? To me this seems very inefficient. When I modify the training-mode output of the FasterRCNN class to be the sum of the four losses instead of a 4-tuple, and use Chainer's own ParallelUpdater, training speeds up by about 3x in my experiments on a GTX 970, and I also no longer get the exp overflows at the beginning of training.

Shouldn't there be a single combined loss defined as the sum of the four losses? This is certainly the case in the original(ish) TensorFlow implementation.
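To illustrate the question, here is a minimal sketch in plain Python (no Chainer). Everything in it is hypothetical: the scalar "trunk", the four toy heads, and all names and numbers are made up for illustration and are not taken from this repository. It assumes the per-loss gradients are simply accumulated before one parameter update. Under that assumption, backpropagating the summed loss gives the same trunk gradient as four separate backward passes, but runs the trunk backward only once.

```python
# Toy setup (all hypothetical): a scalar trunk h = w * x shared by four heads,
# each head i having the loss L_i = 0.5 * (a_i * h - t_i) ** 2.

calls = {"trunk_backward": 0}  # counts how often the trunk backward pass runs


def trunk_forward(w, x):
    """Shared lower layers: here just a single multiplication."""
    return w * x


def trunk_backward(x, grad_h):
    """Turn an upstream gradient dL/dh into the parameter gradient dL/dw."""
    calls["trunk_backward"] += 1
    return grad_h * x


def head_grad(a, t, h):
    """dL_i/dh for L_i = 0.5 * (a * h - t) ** 2."""
    return a * (a * h - t)


w, x = 0.5, 2.0
heads = [(1.0, 0.3), (2.0, -1.0), (0.5, 0.7), (1.5, 2.0)]  # (a_i, t_i) pairs
h = trunk_forward(w, x)

# Variant 1: backpropagate each loss separately and accumulate dL_i/dw.
calls["trunk_backward"] = 0
grad_separate = sum(trunk_backward(x, head_grad(a, t, h)) for a, t in heads)
calls_separate = calls["trunk_backward"]

# Variant 2: sum the upstream gradients first (equivalent to summing the
# losses), then run the trunk backward pass once.
calls["trunk_backward"] = 0
grad_combined = trunk_backward(x, sum(head_grad(a, t, h) for a, t in heads))
calls_combined = calls["trunk_backward"]

print("separate:", grad_separate, "trunk backward calls:", calls_separate)
print("combined:", grad_combined, "trunk backward calls:", calls_combined)
```

If the parameters are instead updated after each individual loss, the two variants are no longer equivalent, because the later gradients are then taken at already-modified weights.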

mitmul commented 7 years ago

@paleckar Hi, thank you for trying this code. We've released a cleaned-up version of the Faster R-CNN inference and training code in ChainerCV: https://github.com/pfnet/chainercv . So this repository is effectively deprecated. Could you try the ChainerCV repo instead of this code? And please open an issue in the ChainerCV repo if you run into a problem :) Thank you!