how much improvement of tta for bevfusion-transfusion-head

mit-han-lab / bevfusion

[ICRA'23] BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation

https://bevfusion.mit.edu

Apache License 2.0

how much improvement of tta for bevfusion-transfusion-head #126

Closed AndyYuan96 closed 2 years ago

AndyYuan96 commented 2 years ago

Hi, for the BEVFusion ensemble model, did you use the TransFusion head? I'm wondering how much flip test-time augmentation (TTA) improves the TransFusion head, since it can't use CenterHead's flip TTA directly. In my experiments, flip TTA gave little improvement for the TransFusion head, compared with a large improvement for CenterHead.

kentang-mit commented 2 years ago

The improvement from flip TTA is around 1 mAP. Instead of averaging the feature maps (which is the approach taken by CenterPoint), we use NMS or weighted box fusion to combine the predictions from different inputs. Both will provide similar performance improvements.
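As a rough illustration of the prediction-level approach described above, the sketch below maps boxes predicted on a mirrored point cloud back to the original frame and pools them with the unflipped predictions, ready for NMS or weighted box fusion. It is not the BEVFusion implementation: the box layout (x, y, z, dx, dy, dz, yaw, vx, vy), the flag names `negate_x` / `negate_y`, and the helper names are all assumptions made for this example.

```python
import math


def wrap_angle(angle):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


def unflip_box(box, negate_x=False, negate_y=False):
    """Map a box predicted on a mirrored point cloud back to the original frame.

    box: (x, y, z, dx, dy, dz, yaw, vx, vy) -- an assumed layout.
    negate_y: the input was mirrored with y -> -y.
    negate_x: the input was mirrored with x -> -x.
    """
    x, y, z, dx, dy, dz, yaw, vx, vy = box
    if negate_y:
        # Heading (cos, sin) becomes (cos, -sin), so yaw -> -yaw.
        y, vy, yaw = -y, -vy, -yaw
    if negate_x:
        # Heading (cos, sin) becomes (-cos, sin), so yaw -> pi - yaw.
        x, vx, yaw = -x, -vx, math.pi - yaw
    return (x, y, z, dx, dy, dz, wrap_angle(yaw), vx, vy)


def pool_predictions(views):
    """Undo each view's flip and concatenate all boxes and scores.

    views: list of (boxes, scores, negate_x, negate_y) tuples, one per
    augmented forward pass. The pooled output still contains duplicates
    and is meant to be passed to NMS or weighted box fusion.
    """
    pooled_boxes, pooled_scores = [], []
    for boxes, scores, negate_x, negate_y in views:
        pooled_boxes.extend(unflip_box(b, negate_x, negate_y) for b in boxes)
        pooled_scores.extend(scores)
    return pooled_boxes, pooled_scores
```

Because mirroring is its own inverse, the same function serves for both applying and undoing a flip on boxes.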

AndyYuan96 commented 2 years ago

Thank you for the kind feedback. For NMS, did you use mmdet3d's function, for example merged_bboxes = merge_aug_bboxes_3d(aug_bboxes, scale_img_metas, self.pts_bbox_head.test_cfg)? And does NMS need parameter tuning, e.g. nms_thresh? The merge_aug_bboxes_3d function is here: https://github.com/open-mmlab/mmdetection3d/blob/9556958fe1c6fe432d55a9f98781b8fdd90f4e9c/mmdet3d/models/detectors/centerpoint.py#L187

I use the pillar backbone of TransFusion and merge_aug_bboxes_3d to combine the results. Since merge_aug_bboxes_3d converts the flipped output back to the original coordinates, I don't do any conversion myself. I don't see much improvement, only 0.1 points in mAP.

kentang-mit commented 2 years ago

I remember that the NMS threshold needs to be tuned. Also, you may try out circle NMS for nuScenes.
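For readers unfamiliar with circle NMS: it suppresses boxes by the distance between their BEV centers rather than by IoU. A minimal pure-Python sketch follows, assuming each detection is reduced to an (x, y) center and a score; the function name, the `min_dist` parameter, and the choice to compare Euclidean distance directly against it are assumptions of this example, not taken from any particular codebase.

```python
def circle_nms(centers, scores, min_dist):
    """Greedy center-distance NMS.

    centers: list of (x, y) BEV box centers.
    scores: list of confidence scores, same length as centers.
    min_dist: a box is suppressed if its center lies within this
        distance of an already kept, higher-scoring box.
    Returns the indices of kept boxes, highest score first.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        xi, yi = centers[i]
        # Keep the box only if it is far enough from every kept box.
        if all(
            (xi - centers[j][0]) ** 2 + (yi - centers[j][1]) ** 2 > min_dist ** 2
            for j in keep
        ):
            keep.append(i)
    return keep
```

When merging TTA outputs, `min_dist` plays the role of the threshold that needs tuning, and it would typically be set per class since object sizes differ widely.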

AndyYuan96 commented 2 years ago

May I ask whether other metrics such as mAOE and mAVE also improve substantially? After tuning the NMS parameters, mAP did improve, but NDS did not improve much.

To check whether my NMS function has a problem, I ran an experiment with plain CenterPoint, comparing feature-map averaging against merging boxes with NMS. Both give a similar improvement in mAP, but feature-map averaging gives larger improvements in mATE (-0.0097 vs 0.0050), mASE (-0.0070 vs -0.0014), mAOE (-0.0296 vs 0.0050), and mAVE (-0.0210 vs -0.0052).

kentang-mit commented 2 years ago

For the rest of the terms, I think it is expected that the improvements are small.

jamesgunnfiveai commented 1 year ago

Hi @kentang-mit, just wondering if you could give some more information on some of the above (would be really helpful for me!), though I know it's been a while since this was implemented.

You mentioned that the NMS threshold needs to be tuned and suggested trying circle NMS for nuScenes. Do you recall what NMS threshold you ended up using for nuScenes? Was it fixed, or task-specific? Did you use circle NMS?

Did you only use the double-flip and rotation augmentations for the TTA, or did you also scale the point cloud? Did you combine the flip and rotation augmentations with each other, as in CenterPoint? If so, I think that means you are taking NMS/WBF over 20 different test-time augmentations (4 for flip [no flip, x, y, xy], 5 for rotation [no rotation, +/-6.25, +/-12.5]), is that right?
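To make the count concrete, here is a small sketch that enumerates the full grid of flip and rotation combinations described in the question. The dictionary keys and the assumption that the rotation angles are in degrees are illustrative only; whether BEVFusion actually used this grid is exactly what is being asked.

```python
from itertools import product

# Four flip settings: no flip, x, y, and both (assumed flag order: flip_x, flip_y).
FLIPS = [(False, False), (True, False), (False, True), (True, True)]
# Five rotation settings, taken from the question; degrees are an assumption.
ROTATIONS = [0.0, -6.25, 6.25, -12.5, 12.5]


def tta_grid():
    """Return every flip/rotation combination as a list of config dicts."""
    return [
        {"flip_x": flip_x, "flip_y": flip_y, "rotation": rotation}
        for (flip_x, flip_y), rotation in product(FLIPS, ROTATIONS)
    ]


configs = tta_grid()
print(len(configs))  # 4 flips x 5 rotations = 20
```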

Many thanks for taking the time to respond to all the questions people post on your excellent work, it is really useful.