I have a question about sampling offset in the pose decoder.

hikvision-research / opera


I have a question about sampling offset in the pose decoder. #12

Open Yamameeee opened 2 years ago

Yamameeee commented 2 years ago

In Figure 3 of your PETR paper, each reference point has multiple sampling points. But the shape of the input sampling_locations for MultiScaleDeformableAttnFunction is (bs, num_query, num_heads, num_levels, num_keypoints, 2).

Did you set the number of sampling offsets for one reference point (i.e. one keypoint) to 1?

https://github.com/hikvision-research/opera/blob/a7cadf6ad3f60c6371d1659d1f9f08ba66ed06d2/opera/models/utils/transformer.py#L409

dae-sun commented 2 years ago

Yes, the number of sampling offsets for one reference point (i.e. one keypoint) is set to 1 for each head and feature level, so there are 32 sampling offsets in total for one reference point.
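To make the count concrete, here is a minimal sketch of the arithmetic. The values of 8 heads, 4 feature levels and 17 keypoints are assumptions (they are not stated in this thread; 8 x 4 is chosen only because it matches the 32 quoted above), and the helper names are hypothetical, not taken from the opera code.

```python
from itertools import product

# Assumed values, not stated in the thread: 8 heads and 4 feature levels
# (picked because 8 * 4 matches the quoted 32) and 17 keypoints.
NUM_HEADS = 8
NUM_LEVELS = 4
NUM_KEYPOINTS = 17


def sampling_locations_shape(bs, num_query):
    """Shape of sampling_locations as described in the question."""
    return (bs, num_query, NUM_HEADS, NUM_LEVELS, NUM_KEYPOINTS, 2)


def samples_per_reference_point(shape):
    """Count the (head, level) pairs; each pair contributes exactly one
    sampling offset to a single reference point (keypoint)."""
    _, _, num_heads, num_levels, _, _ = shape
    return sum(1 for _ in product(range(num_heads), range(num_levels)))


shape = sampling_locations_shape(bs=2, num_query=300)
per_keypoint = samples_per_reference_point(shape)  # one offset per head and level
per_query = per_keypoint * NUM_KEYPOINTS           # all keypoints of one pose query

print(shape, per_keypoint, per_query)
```

Under these assumptions, the multiple sampling points per reference point come from the head and level axes of the tensor, not from a separate axis for the number of points.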

dae-sun commented 2 years ago

It can be visualized like this: green is the reference point and the other points are sampling points (sampling points with low attention weights are omitted).
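A rough sketch of how such a filtered view could be produced is shown below. This is not the script used for the visualization above; the threshold, the offsets and the weights are made-up illustrative numbers, and `keep_strong_samples` is a hypothetical helper.

```python
def keep_strong_samples(points, weights, threshold=0.05):
    """Return the (x, y) sampling points whose attention weight is at least
    `threshold`. The default threshold is an arbitrary illustrative value."""
    if len(points) != len(weights):
        raise ValueError("points and weights must have the same length")
    return [p for p, w in zip(points, weights) if w >= threshold]


# Made-up example data: one reference point (normalized coordinates),
# four sampling offsets and their attention weights.
reference_point = (0.50, 0.40)
offsets = [(0.02, -0.01), (-0.03, 0.04), (0.10, 0.00), (0.00, -0.06)]
weights = [0.40, 0.01, 0.35, 0.24]

# Each sampling point is the reference point shifted by its offset.
sampling_points = [
    (reference_point[0] + dx, reference_point[1] + dy) for dx, dy in offsets
]

# Points below the threshold are left out of the plot.
visible = keep_strong_samples(sampling_points, weights)
print(visible)
```

The points in `visible` would then be drawn around the green reference point with any plotting library.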