I want to know why the localization loss is divided by the number of negative samples (num_neg) rather than by the number of all samples (num_neg + num_pos). I understand that the localization loss is produced by the negative samples, but the implementation described in the paper divides by all samples (num_neg + num_pos).
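To make the question concrete, here is a minimal sketch of the two normalizations being compared. The function name, the summed loss value, and the sample counts are all made up for illustration; they are not taken from the paper or from any particular code base.

```python
def normalize_loc_loss(loc_loss_sum, num_pos, num_neg):
    """Return the summed localization loss under both normalizations.

    All names here are hypothetical and only illustrate the question.
    """
    # Option A: divide by the number of negative samples only.
    by_neg = loc_loss_sum / max(num_neg, 1)
    # Option B: divide by all samples (negatives plus positives).
    by_all = loc_loss_sum / max(num_pos + num_neg, 1)
    return by_neg, by_all


# Illustrative numbers only: a summed loss of 12.0 over 8 positives and 24 negatives.
by_neg, by_all = normalize_loc_loss(loc_loss_sum=12.0, num_pos=8, num_neg=24)
print(by_neg, by_all)  # 0.5 0.375
```

The two options differ only by a constant factor for a fixed batch, so the choice mainly changes how strongly the localization term is weighted relative to the other loss terms.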