Multiple losses

Hi,

Thanks for the code. I have a question regarding the loss functions. It seems that you have implemented your own ParallelUpdater class to deal with the multiple output losses of the Faster R-CNN network (two for the RPN and two for the final classification and bbox regression).

What I don't understand is the reason for updating each loss individually. Doesn't this mean that the gradients, especially those of the lower layers (trunk etc.), are re-computed four times for each example, each time with a different incoming upstream gradient depending on the current loss branch? To me this seems very inefficient. When I modify the training-mode output of the FasterRCNN class to be the sum of the four losses instead of a 4-tuple, and use Chainer's own ParallelUpdater, training is about 3x faster in my experiments on a GTX 970, and I no longer get the exp overflows at the beginning of training.

Shouldn't there be a single combined loss defined as the sum of the four losses? This certainly is the case in the original(ish) TensorFlow implementation.
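
To make the argument concrete, here is a toy sketch in plain Python. It is not Chainer code and not the repository's model: the `Trunk` class, the `head_grad` helper, and the four quadratic heads are all made up for illustration. It only shows that, because differentiation is linear, one backward pass through the shared layers with the summed upstream gradient gives the same trunk gradient as four separate passes that are then accumulated. This assumes a single optimizer step per example; if the weights are updated after each individual loss, the two schemes would no longer be equivalent.

```python
class Trunk:
    """Shared lower layers, reduced to a single scalar weight: h = w * x."""

    def __init__(self, w):
        self.w = w
        self.backward_calls = 0  # counts how often the trunk is back-propagated

    def forward(self, x):
        return self.w * x

    def backward(self, x, grad_h):
        """Return dLoss/dw given the upstream gradient dLoss/dh."""
        self.backward_calls += 1
        return grad_h * x


def head_grad(h, scale, target):
    """dLoss/dh for a hypothetical head with loss 0.5 * (scale * h - target) ** 2."""
    return scale * (scale * h - target)


# Four made-up (scale, target) pairs standing in for the four loss branches.
HEADS = [(1.0, 0.5), (2.0, -1.0), (0.5, 2.0), (-1.5, 0.25)]
x = 3.0

# Scheme A: back-propagate each loss through the trunk separately, then accumulate.
separate = Trunk(w=0.7)
h = separate.forward(x)
grad_separate = sum(separate.backward(x, head_grad(h, s, t)) for s, t in HEADS)

# Scheme B: sum the upstream gradients first, then back-propagate through the trunk once.
combined = Trunk(w=0.7)
h = combined.forward(x)
grad_combined = combined.backward(x, sum(head_grad(h, s, t) for s, t in HEADS))

print(abs(grad_separate - grad_combined) < 1e-9)  # gradients agree up to rounding
print(separate.backward_calls, combined.backward_calls)
```

In this toy setting the trunk is back-propagated four times in scheme A and once in scheme B, which is the kind of redundant work I suspect is behind the slowdown.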

mitmul / chainer-faster-rcnn

Multiple losses #16