Thanks for your attention to our work. Since the experiments were done on two different BEVFormer codebases (an internal SenseTime codebase for the single-frame setting and the official codebase for the multi-frame setting), we plan to merge the code into the official BEVFormer repository before releasing it. Please expect the code after the ICCV deadline :)
Hello! While waiting for the source-code release, I have some questions about the modifications to the teacher model. In the paper, you mention that the DGCNN attention is replaced with a vanilla multi-head attention module and that pre-trained CenterPoint weights are used for initialization. Could you provide a bit more detail here? For example, did you simply replace "DGCNNAttn" with "MultiheadAttention" in the original voxel config (https://github.com/WangYueFt/detr3d/blob/main/projects/configs/obj_dgcnn/voxel.py#L88)? Also, which CenterPoint weights did you use?
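To be concrete, here is a minimal sketch of the kind of config change I mean, in mmcv-style registry config mirroring the decoder-layer structure in obj_dgcnn/voxel.py. The surrounding keys are abbreviated, and this is my guess at the edit, not the authors' confirmed setup:

```python
# Hypothetical edit to the decoder layer's attention configs in
# projects/configs/obj_dgcnn/voxel.py -- a guess at what "replacing
# DGCNNAttn with MultiheadAttention" would look like.
transformerlayers = dict(
    type='DetrTransformerDecoderLayer',
    attn_cfgs=[
        # before: dict(type='DGCNNAttn', embed_dims=256,
        #              num_heads=8, dropout=0.1)
        dict(type='MultiheadAttention',   # vanilla self-attention instead
             embed_dims=256,
             num_heads=8,
             dropout=0.1),
        dict(type='MultiheadAttention',   # cross-attention, unchanged
             embed_dims=256,
             num_heads=8,
             dropout=0.1),
    ],
    feedforward_channels=512,
    ffn_dropout=0.1,
    operation_order=('self_attn', 'norm', 'cross_attn', 'norm',
                     'ffn', 'norm'))
```

Is a one-line swap like this all that was needed, or were further changes (e.g., to the encoder or the positional encoding) required?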
Another question is about the "BEV feature" extracted from the teacher model. The original obj_dgcnn implementation feeds multi-scale features ([128x128], [64x64], [32x32], [16x16]) into the transformer encoder ("DetrTransformerEncoder", https://github.com/WangYueFt/detr3d/blob/main/projects/configs/obj_dgcnn/voxel.py#L71) as a flattened [21760x256] tensor, and the encoder outputs a memory of the same shape [21760x256]. How did you extract a [bev_h x bev_w x 256] BEV feature from this? This seems to be the most important part of the teacher model.
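For reference, here is a hypothetical sketch of one way the BEV map could be recovered from the flattened multi-scale memory, assuming the sequence is ordered level by level (largest scale first, as when the levels above are concatenated: 128*128 + 64*64 + 32*32 + 16*16 = 21760). This is my guess, not the paper's method, and `extract_bev_feature` is a name I made up:

```python
import torch

def extract_bev_feature(memory, spatial_shapes, level=0):
    """Slice one pyramid level out of a flattened multi-scale memory.

    memory: [sum(H_i * W_i), B, C] encoder output
    spatial_shapes: list of (H_i, W_i) per level, in flattening order
    """
    # start index of the requested level in the flattened sequence
    start = sum(h * w for h, w in spatial_shapes[:level])
    h, w = spatial_shapes[level]
    feat = memory[start:start + h * w]            # [H*W, B, C]
    feat = feat.permute(1, 0, 2)                  # [B, H*W, C]
    return feat.reshape(-1, h, w, feat.size(-1))  # [B, H, W, C]

# With the shapes above, level=0 recovers a [128, 128, 256] BEV map.
memory = torch.randn(21760, 2, 256)
shapes = [(128, 128), (64, 64), (32, 32), (16, 16)]
bev = extract_bev_feature(memory, shapes, level=0)
print(bev.shape)  # torch.Size([2, 128, 128, 256])
```

Did you take a single level like this, or fuse all four levels into one BEV feature before distillation?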
Has the code been released yet?
Hello! Will the source code be released? If so, is there a specific timeline?