hustvl / GKT

Efficient and Robust 2D-to-BEV Representation Learning via Geometry-guided Kernel Transformer
https://arxiv.org/abs/2206.04584
MIT License

The difference with BEVFormer #2

Open tangtaogo opened 2 years ago

tangtaogo commented 2 years ago

Hi, thanks for your open-source work! I have a question: what are the main differences between GKT and BEVFormer, apart from how each selects the region around the prior positions?

outsidercsy commented 2 years ago

Sorry for the late reply. GKT's kernel region uses fixed integer pixel coordinates, so it can be pre-computed. BEVFormer predicts offsets for its sampling points, similar to Deformable DETR, so its sampling positions are floating-point and dynamic.
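To make the distinction concrete, here is a minimal NumPy sketch, not code from either repository. The function names (`kernel_indices`, `gather_fixed`, `bilinear_sample`), the `(H, W, C)` feature layout, and the border clamping are assumptions for illustration only. The first path builds integer indices in a k x k window around a rounded prior position, which depend only on the geometry and can therefore be cached. The second path samples at arbitrary floating-point positions (prior plus offsets, which would be predicted by the network in a deformable-attention model) and needs bilinear interpolation at run time.

```python
import numpy as np


def kernel_indices(center_xy, kernel_size, height, width):
    """Integer pixel coordinates of a k x k window around a rounded prior position.

    These depend only on the projected prior position, so they can be
    computed once and reused (hypothetical helper, illustration only).
    """
    cx, cy = int(round(center_xy[0])), int(round(center_xy[1]))
    r = kernel_size // 2
    dy, dx = np.meshgrid(np.arange(-r, r + 1), np.arange(-r, r + 1), indexing="ij")
    # Clamp to the image border (an assumption; other padding schemes are possible).
    xs = np.clip(cx + dx.ravel(), 0, width - 1)
    ys = np.clip(cy + dy.ravel(), 0, height - 1)
    return ys, xs


def gather_fixed(feat, ys, xs):
    """Fixed-kernel path: plain integer indexing, no interpolation."""
    return feat[ys, xs]  # shape (k*k, C)


def bilinear_sample(feat, points_xy):
    """Dynamic path: bilinear interpolation at floating-point (x, y) positions."""
    h, w, _ = feat.shape
    x = np.clip(points_xy[:, 0], 0, w - 1)
    y = np.clip(points_xy[:, 1], 0, h - 1)
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = np.minimum(x0 + 1, w - 1), np.minimum(y0 + 1, h - 1)
    wx, wy = (x - x0)[:, None], (y - y0)[:, None]
    top = feat[y0, x0] * (1 - wx) + feat[y0, x1] * wx
    bottom = feat[y1, x0] * (1 - wx) + feat[y1, x1] * wx
    return top * (1 - wy) + bottom * wy  # shape (N, C)


if __name__ == "__main__":
    H, W, C = 6, 8, 2
    feat = np.arange(H * W * C, dtype=float).reshape(H, W, C)
    prior = (5.3, 2.7)  # hypothetical projected BEV grid position, in (x, y) pixels

    # Fixed kernel: indices can be cached for every BEV query ahead of time.
    ys, xs = kernel_indices(prior, kernel_size=3, height=H, width=W)
    fixed_feats = gather_fixed(feat, ys, xs)

    # Dynamic sampling: offsets stand in for values a network would predict.
    offsets = np.array([[0.4, -0.2], [-1.1, 0.6]])
    dynamic_feats = bilinear_sample(feat, np.asarray(prior) + offsets)
    print(fixed_feats.shape, dynamic_feats.shape)
```

In this sketch the fixed path reduces to a table lookup, while the dynamic path has to recompute interpolation weights whenever the offsets change.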