hustvl / GKT

Efficient and Robust 2D-to-BEV Representation Learning via Geometry-guided Kernel Transformer
https://arxiv.org/abs/2206.04584
MIT License
218 stars, 18 forks

Question about the composition of the decoder memory #3

Open hutao568 opened 2 years ago

hutao568 commented 2 years ago

Hi, thank you very much for your paper. I have a question about one of its details. GKT uses the BEV (H×W×D) reference points to gather vision tokens from the images, forming a V×S×C×K×K feature for each reference point. Is the decoder's memory then made up of H·W·D tokens, each of which is one of these V×S×C×K×K features?

hutao568 commented 2 years ago

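Hello, thank you very much for your paper. I am unsure about some of its details. GKT uses the BEV (H×W×D) reference points to fetch vision tokens from the images and assemble a V×S×C×K×K feature. Is the decoder's memory therefore composed of H·W·D tokens, each being one such V×S×C×K×K feature?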
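
To make the shapes in the question concrete, here is a minimal pure-Python sketch of one possible reading: for a single BEV reference point, a K×K window of C-dimensional features is gathered around its projected pixel in each of V views and S scales, giving V·S·K·K vectors of length C. This is not taken from the GKT code. The function names (`kernel_window`, `tokens_for_reference_point`), the zero padding at image borders, the odd kernel size, and the precomputed integer pixel locations are all assumptions made for illustration. Whether the decoder memory is actually laid out this way is exactly what the question asks.

```python
def kernel_window(feat, px, py, k):
    """Gather a k x k window of feature vectors centred on pixel (px, py).

    feat is a nested list indexed feat[row][col] -> list of C floats.
    Positions outside the map are zero-padded (an assumption of this sketch).
    k is assumed to be odd.
    """
    height, width = len(feat), len(feat[0])
    channels = len(feat[0][0])
    radius = k // 2
    window = []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = py + dy, px + dx
            if 0 <= y < height and 0 <= x < width:
                window.append(feat[y][x])
            else:
                window.append([0.0] * channels)
    return window  # k * k vectors, each of length C


def tokens_for_reference_point(feature_maps, pixels, k):
    """Collect the kernel windows for one BEV reference point.

    feature_maps[view][scale] is a feature map as accepted by kernel_window.
    pixels[view][scale] is the (px, py) location the reference point projects
    to in that view and scale (hypothetical, assumed precomputed).
    """
    tokens = []
    for view_maps, view_pixels in zip(feature_maps, pixels):
        for feat, (px, py) in zip(view_maps, view_pixels):
            tokens.extend(kernel_window(feat, px, py, k))
    return tokens  # V * S * k * k vectors, each of length C


# Toy example: V=2 views, S=1 scale, C=4 channels, K=3 kernel, 5x5 feature maps.
V, S, C, K = 2, 1, 4, 3
fmap = [[[float(y * 5 + x + 1)] * C for x in range(5)] for y in range(5)]
feature_maps = [[fmap] * S for _ in range(V)]
pixels = [[(2, 2)] * S, [(0, 0)] * S]  # second view projects onto a corner

tokens = tokens_for_reference_point(feature_maps, pixels, K)
print(len(tokens), len(tokens[0]))  # prints: 18 4
```

Under this reading, a BEV grid with H·W·D reference points would repeat the gathering H·W·D times, so each query attends to its own V·S·K·K vectors rather than to one shared memory.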