After the first set of predicted bounding boxes is generated under the supervision of the first GIoU loss, these boxes are partially detached by the gradient multiplier term. In all experiments gradient_mul appears to be set to 0.1, meaning that only 10% of the gradient (relative to a gradient_mul of 1) propagates back through the predicted boxes once they are reformulated as deformable conv offsets.
Is the idea to let the Varifocal loss and the second (refinement) GIoU loss partially contribute to learning the offsets, in addition to the supervision from the first GIoU loss?
Thanks for your interest. Your understanding is correct. The star_dconv operation is inspired by RepPoints, and you can find more information about the benefit of the additional supervision in that paper.
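For readers following along, here is a minimal sketch of what "partially detached" means. It is not the repository's code: it assumes the blend has the form (1 - gradient_mul) * x.detach() + gradient_mul * x, and it uses a hypothetical `Dual` class (a value plus one derivative) in place of a real tensor so that it runs with the standard library only. The names `Dual` and `partial_detach` are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """Toy scalar: a value plus its derivative w.r.t. one upstream quantity."""
    val: float
    grad: float = 0.0

    def detach(self) -> "Dual":
        # Same value, but no longer connected to the upstream quantity.
        return Dual(self.val, 0.0)

    def scale(self, k: float) -> "Dual":
        # Multiplying by a constant scales both value and derivative.
        return Dual(k * self.val, k * self.grad)

    def __add__(self, other: "Dual") -> "Dual":
        # Derivatives add under addition.
        return Dual(self.val + other.val, self.grad + other.grad)


def partial_detach(x: Dual, gradient_mul: float) -> Dual:
    """Blend a detached copy and an attached copy of x (assumed form).

    The forward value is unchanged; only the derivative that flows
    back through x is scaled by gradient_mul.
    """
    return x.detach().scale(1.0 - gradient_mul) + x.scale(gradient_mul)


# Hypothetical predicted box distance, with derivative 1.0 w.r.t. itself.
bbox_pred = Dual(val=2.5, grad=1.0)

# What would be fed to the offset computation with gradient_mul = 0.1.
offset_input = partial_detach(bbox_pred, gradient_mul=0.1)
print(offset_input.val, offset_input.grad)
```

With gradient_mul = 1 the prediction stays fully attached, and with gradient_mul = 0 it is fully detached, so 0.1 lets the later losses nudge the first-stage boxes without dominating the supervision from the first GIoU loss.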
Fantastic work!
Thank you!