liguopeng0923 closed this issue 1 year ago.
`gt` seems to represent a smoothed distance: the larger the value, the closer that position is to the camera's location. But `gt_with_ori` is still hard to understand, especially the multiplication by `ratio`.
Hi Guopeng,
Yes. The `gt` is a Gaussian-smoothed localization heat map with its peak at the ground-truth location.
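A minimal sketch of such a heat map, assuming an N × N grid, an isotropic Gaussian with a hypothetical `sigma` parameter, and normalization so the map sums to 1. The helper name `gaussian_heatmap` is made up for illustration, and the repository's actual kernel width and normalization may differ.

```python
import math


def gaussian_heatmap(n, row, col, sigma=1.0):
    """Hypothetical helper: n x n heat map (list of lists) peaking at (row, col).

    sigma and the sum-to-1 normalization are assumptions, not taken
    from the repository.
    """
    heat = [[math.exp(-((r - row) ** 2 + (c - col) ** 2) / (2 * sigma ** 2))
             for c in range(n)]
            for r in range(n)]
    total = sum(sum(line) for line in heat)
    # Normalize so the map can be read as a probability distribution.
    return [[v / total for v in line] for line in heat]


gt = gaussian_heatmap(5, 2, 3, sigma=1.0)
# Locate the largest value: it sits at the ground-truth cell.
peak = max((v, r, c) for r, line in enumerate(gt) for c, v in enumerate(line))
print(peak[1:])  # (2, 3)
```

Values fall off with squared distance from the ground-truth cell, which is why a larger value means "closer to the camera location".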
Regarding `gt_with_ori`, it is used to generate the ground truth for contrastive learning, i.e. the InfoNCE loss. It generalizes `gt` to all the orientation bins we defined.
We discretize location and orientation into a grid of size N × N × R, where N is the spatial resolution and R is the number of orientation bins into which the 360-degree orientation space is divided. For contrastive learning, the objective is to pull the aerial descriptor at the correct location and orientation toward the ground descriptor while pushing aerial descriptors with the wrong location or the wrong orientation away from the ground descriptor.
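To illustrate this objective, the sketch below computes an InfoNCE-style cross-entropy over the flattened N × N × R grid. The similarity scores, the `temperature` default, and the name `info_nce` are assumptions for illustration, not the repository's code. With a one-hot target it reduces to the usual single-positive InfoNCE; passing a soft, sum-to-1 target is one plausible way a map like `gt_with_ori` could be used.

```python
import math


def info_nce(similarities, target, temperature=0.1):
    """InfoNCE-style loss over a flattened N x N x R location-orientation grid.

    similarities: one score per (location, orientation) cell, e.g. the cosine
        similarity between the ground descriptor and each aerial descriptor
        (assumed input layout).
    target: non-negative weights per cell that sum to 1; one-hot for a single
        positive, or a soft map over cells.
    temperature: hypothetical default, not taken from the repository.
    """
    logits = [s / temperature for s in similarities]
    m = max(logits)
    # log-sum-exp over all cells, shifted by the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    # Cross-entropy between the target weights and softmax(logits):
    # high similarity on target cells lowers the loss (pull), high
    # similarity on other cells raises log_z (push).
    return -sum(t * (x - log_z) for t, x in zip(target, logits))


# Toy grid: N = 2, R = 2, so 8 cells; cell 5 is the correct location+orientation.
target = [0.0] * 8
target[5] = 1.0
aligned = [0.0] * 8
aligned[5] = 0.9      # the aerial descriptor at the correct cell matches best
misaligned = [0.0] * 8
misaligned[2] = 0.9   # a wrong cell matches best
print(info_nce(aligned, target) < info_nce(misaligned, target))  # True
```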
`gt_with_ori` assigns a probability to each location and orientation combination. The probability is high when both the location and the orientation are correct.
The `ratio` is a smoothing factor in orientation space.
Suppose there are 20 orientation bins (0, 18, 36, ..., 342 degrees) and the ground-truth orientation is 30 degrees. We don't want to give the ground-truth location in the 36-degree orientation channel the full weight. Instead, we down-weight it according to its angular distance from the ground-truth orientation: 36 degrees is 6 degrees away from 30, so the factor is (18-6)/18 = 12/18. Similarly, the ground-truth location in the 18-degree channel, which is 12 degrees away, gets a weight of 6/18.
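The 30-degree example can be reproduced with the sketch below. It assumes the linear split between the two neighbouring bins described above, wrap-around at 360 degrees, and an R × N × N layout in which channel r of `gt_with_ori` is `ratio[r] * gt`. The helper names `orientation_weights` and `build_gt_with_ori` are hypothetical, and the repository may order the axes differently.

```python
import math


def orientation_weights(gt_deg, num_bins):
    """Hypothetical helper: per-bin orientation weights (the `ratio`).

    The weight 1 is split linearly between the two bins that bracket
    gt_deg: each gets (bin_width - angular_distance) / bin_width.
    Angles wrap around at 360 degrees.
    """
    width = 360.0 / num_bins
    pos = (gt_deg % 360.0) / width          # fractional bin index
    lower = int(math.floor(pos)) % num_bins
    upper = (lower + 1) % num_bins
    frac = pos - math.floor(pos)            # distance from the lower bin, in bins
    weights = [0.0] * num_bins
    weights[lower] += 1.0 - frac
    weights[upper] += frac
    return weights


def build_gt_with_ori(gt, weights):
    """Hypothetical R x N x N layout: channel r is weights[r] * gt."""
    return [[[w * v for v in row] for row in gt] for w in weights]


w = orientation_weights(30.0, 20)
print(round(w[1] * 18, 6), round(w[2] * 18, 6))  # 6.0 12.0 (18- and 36-degree bins)

toy_gt = [[0.0, 0.25], [0.25, 0.5]]   # toy 2 x 2 heat map summing to 1
gt_with_ori = build_gt_with_ori(toy_gt, w)
print(len(gt_with_ori))  # 20
```

Because the two nonzero weights always sum to 1, summing this `gt_with_ori` over the orientation channels gives back `gt`, which makes the linear split a natural choice.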
Does this answer your question?
Sorry for my late response.
This explanation perfectly answers my doubts!
Best wishes again.
Hi Zimin,
It's difficult to understand how `gt` and `gt_with_ori` are constructed. Could you explain the operation?
Best, Guopeng