Open Yamameeee opened 2 years ago
Yes, the number of sampling offsets for one reference point (i.e. one keypoint) is set to 1 for each head and feature level, so there are 32 sampling offsets in total for one reference point.
It can be visualized like this: green is the reference point and the other points are sampling points. (Sampling points with low attention weights are omitted.)
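To make the count concrete, here is a minimal sketch of the arithmetic, not the actual opera code. It assumes 8 heads and 4 feature levels (values chosen to match the 32 above) and 17 keypoints per query; `sampling_points_per_reference` is a hypothetical helper, and the batch size and query count are placeholders.

```python
def sampling_points_per_reference(shape):
    """Count sampling points per reference point (keypoint).

    shape is the layout discussed in this issue:
    (bs, num_query, num_heads, num_levels, num_keypoints, 2)
    """
    bs, num_query, num_heads, num_levels, num_keypoints, xy = shape
    assert xy == 2, "last dimension holds (x, y) coordinates"
    # One sampling offset per (head, level) pair for each keypoint.
    return num_heads * num_levels


# Assumed values: bs=2, 300 queries, 8 heads, 4 levels, 17 keypoints.
shape = (2, 300, 8, 4, 17, 2)
per_keypoint = sampling_points_per_reference(shape)
per_query = per_keypoint * shape[4]  # all keypoints of one pose query

print(per_keypoint)  # 32
print(per_query)     # 544
```

So with a single offset per head and level, each keypoint still gets several sampling points, because the heads and levels each contribute one.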
In Figure 3 of your PETR paper, each reference point has multiple sampling points. But the shape of the input `sampling_locations` for `MultiScaleDeformableAttnFunction` is `(bs, num_query, num_heads, num_levels, num_keypoints, 2)`. Did you set the number of sampling offsets for one reference point (i.e. one keypoint) to 1?
https://github.com/hikvision-research/opera/blob/a7cadf6ad3f60c6371d1659d1f9f08ba66ed06d2/opera/models/utils/transformer.py#L409