zehuichen123 / BEVDistill

[ICLR 2023] BEVDistill: Cross-Modal BEV Distillation for Multi-View 3D Object Detection

source code question #1

Open lemonc1014 opened 2 years ago

lemonc1014 commented 2 years ago

Hello! Will the source code be released? If so, is there a specific timeline?

zehuichen123 commented 1 year ago

Thanks for your attention to our work. Since the experiments were done on two different BEVFormer codebases (an internal codebase at SenseTime for the single-frame setting and the official codebase for the multi-frame setting), we plan to merge the code into the official BEVFormer repository before releasing it. Please expect the code after the ICCV deadline :)

sujinjang commented 1 year ago

Hello! While waiting for the source code release, I have some questions about the modifications to the teacher model. In the paper, you mention that the DGCNN attention is replaced with a vanilla multi-scale attention module and that pre-trained CenterPoint weights are used for initialization. Could you provide a bit more detail here? For example, did you simply replace "DGCNNAttn" with "MultiheadAttention" in the original voxel config (https://github.com/WangYueFt/detr3d/blob/main/projects/configs/obj_dgcnn/voxel.py#L88)? Also, which CenterPoint weights did you use?
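
For concreteness, here is a minimal sketch of the swap I have in mind, assuming the usual mmcv-style registry config of the linked file; the exact keys and values below are my guesses, not your confirmed settings:

```python
# Hypothetical sketch of the attention swap in the decoder-layer config.
# Key names follow the common mmcv registry style; values are assumptions.
original_self_attn = dict(type='DGCNNAttn', embed_dims=256, num_heads=8)
replaced_self_attn = dict(type='MultiheadAttention', embed_dims=256, num_heads=8, dropout=0.1)
```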

Another question concerns the details of the "BEV feature" extracted from the teacher model. The original implementation of obj_dgcnn feeds multi-scale features ([128x128], [64x64], [32x32], [16x16]) as a single flattened feature of shape [21760x256] to the transformer encoder ("DetrTransformerEncoder", https://github.com/WangYueFt/detr3d/blob/main/projects/configs/obj_dgcnn/voxel.py#L71). The encoder then outputs a memory of shape [21760x256]. How did you extract the BEV feature of shape [bev_w x bev_w x 256] from this? This seems to be the most important part of the teacher model.
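
In case it helps clarify what I mean, here is one plausible way to un-flatten the memory back into per-level spatial maps and take the finest level as the BEV feature; this is purely my guess, and the function name and shapes are assumptions:

```python
import torch

# Hypothetical sketch: split the flattened multi-scale encoder memory back
# into spatial maps, then take the finest level as a candidate BEV feature.
def split_memory_to_levels(memory, spatial_shapes):
    # memory: [bs, 21760, 256] -- all levels flattened and concatenated
    # spatial_shapes: [(128, 128), (64, 64), (32, 32), (16, 16)]
    bs, _, c = memory.shape
    levels, start = [], 0
    for h, w in spatial_shapes:
        end = start + h * w
        levels.append(memory[:, start:end, :].transpose(1, 2).reshape(bs, c, h, w))
        start = end
    return levels

shapes = [(128, 128), (64, 64), (32, 32), (16, 16)]
mem = torch.randn(1, sum(h * w for h, w in shapes), 256)  # [1, 21760, 256]
bev = split_memory_to_levels(mem, shapes)[0]              # [1, 256, 128, 128]
```

Is this roughly what you did, or did you aggregate the levels differently?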

Li-Whasaka commented 1 year ago

Has the code been released yet?