Background should have been considered in the loss function.

First, I want to thank the author for this implementation. I find the code clear overall and easier to follow than many others.

I think the background should be considered when calculating the focal loss. In fact, it is precisely the large number of anchors assigned to the background (the negative samples) that motivates the design of focal loss: the loss down-weights these many easy negatives rather than dropping them. Therefore, I believe the implementation below is not correct: https://github.com/kuangliu/pytorch-retinanet/blob/2d7c663350f330a34771a8fa6a4f37a2baa52a1d/loss.py#L29-L31
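To make concrete what I mean, here is a minimal pure-Python sketch of a per-anchor sigmoid focal loss in which background anchors still contribute as negatives for every class. It is not the repository's code. The function name `sigmoid_focal_loss` and the label convention (0 = background, 1..C = object classes, -1 = ignored anchor) are assumptions made for illustration; `alpha=0.25` and `gamma=2` are the defaults reported in the focal loss paper.

```python
import math


def sigmoid_focal_loss(logits, label, num_classes, alpha=0.25, gamma=2.0):
    """Focal loss for a single anchor (hypothetical helper, for illustration).

    logits: list of num_classes raw class scores for this anchor.
    label:  assumed convention: 0 = background, 1..num_classes = object class,
            -1 = anchor to ignore.
    """
    if label < 0:
        # Only ignored anchors are skipped; background anchors are kept.
        return 0.0

    loss = 0.0
    for c in range(num_classes):
        # A background anchor (label 0) gets target 0 for every class.
        target = 1.0 if label == c + 1 else 0.0
        p = 1.0 / (1.0 + math.exp(-logits[c]))
        p_t = p if target == 1.0 else 1.0 - p
        alpha_t = alpha if target == 1.0 else 1.0 - alpha
        # FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t)
        loss += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return loss


# A confidently rejected background anchor contributes very little,
# while a background anchor mistaken for an object contributes a lot.
easy_background = sigmoid_focal_loss([-4.0, -4.0], label=0, num_classes=2)
hard_background = sigmoid_focal_loss([2.0, -4.0], label=0, num_classes=2)
ignored_anchor = sigmoid_focal_loss([2.0, -4.0], label=-1, num_classes=2)
```

In this sketch the background anchors are never masked out; the `(1 - p_t)^gamma` factor is what keeps the many easy ones from dominating the total. The sketch does not guard against extreme logits, so a real implementation would need a numerically stable formulation.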

Please correct me if I'm wrong or have missed anything.

kuangliu / pytorch-retinanet