mit-han-lab / bevfusion

[ICRA'23] BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation
https://bevfusion.mit.edu
Apache License 2.0

Visualization fails with CustomDataset - Need help with predictions' coordinate frame #510

Closed: AlexIlis closed this issue 4 months ago

AlexIlis commented 11 months ago

Hello @kentang-mit,

I am using a custom dataset to train BEVFusion. Exactly as you mentioned previously, I implemented a custom data preprocessor class that generates a `custom_infos.pkl` in the NuScenes annotation style, and I'm able to train and test with it.
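
For context, each record in my `custom_infos.pkl` looks roughly like the sketch below. The field names follow the nuScenes-style convention used by mmdet3d-derived converters; the values are placeholders, and the exact layout should be checked against the converter that generated the pickle:

```python
import numpy as np

# One illustrative record of custom_infos.pkl in the nuScenes-style layout.
info = {
    "lidar_path": "data/custom/lidar/000001.bin",
    "token": "000001",
    "sweeps": [],                                # previous lidar sweeps, may be empty
    "cams": {
        "CAM_FRONT": {
            "data_path": "data/custom/images/CAM_FRONT/000001.jpg",
            "sensor2lidar_rotation": np.eye(3),  # 3x3, camera -> lidar
            "sensor2lidar_translation": np.zeros(3),
            "cam_intrinsic": np.eye(3),          # 3x3 pinhole intrinsics
        },
        # ... one entry per remaining camera
    },
    "lidar2ego_translation": [0.0, 0.0, 1.8],
    "lidar2ego_rotation": [1.0, 0.0, 0.0, 0.0],  # quaternion (w, x, y, z)
    "timestamp": 0,
    # (N, 7) boxes in the lidar frame; the column order (x, y, z, w, l, h, yaw
    # vs. x, y, z, l, w, h, yaw) differs between converters, so verify it.
    "gt_boxes": np.zeros((0, 7)),
    "gt_names": np.array([], dtype=str),         # (N,) class names
}
```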

However, I can't create a NuScenes object, since my dataset isn't compatible with it. I still want to render the results, both as BEV boxes on the lidar point cloud and as 3D boxes projected onto the camera frames. My current visualizations look wrong, and I'd like to better understand the following:

1) For the BEVFusion camera-only object detection model, what coordinate frame are the predictions in, and what frame is the ground truth in?

2) The outputs are size, translation, and rotation, but with reference to which frame? Are they expressed in bird's-eye view? Is there a script to visualize the predictions not in BEV but in the perspective view of each camera? (My current projection attempt is sketched below.)
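
For reference, this is roughly how I'm currently trying to project the predictions into each camera. It assumes the boxes are lidar-frame (center, size, yaw) tuples and that I have a 4x4 lidar-to-camera extrinsic plus a 3x3 intrinsic per camera; the helper names are my own, not from the repo:

```python
import numpy as np

def box_corners_lidar(center, size, yaw):
    """Return the 8 corners (3, 8) of a 3D box in the lidar frame.

    center: (x, y, z) of the box bottom center; size: (w, l, h);
    yaw: rotation about the lidar z-axis in radians.
    """
    w, l, h = size
    # Corner offsets in the box's own frame (x forward = length, y left = width).
    x = np.array([ l,  l, -l, -l,  l,  l, -l, -l]) / 2
    y = np.array([ w, -w, -w,  w,  w, -w, -w,  w]) / 2
    z = np.array([ 0,  0,  0,  0,  h,  h,  h,  h])
    rot = np.array([[np.cos(yaw), -np.sin(yaw), 0],
                    [np.sin(yaw),  np.cos(yaw), 0],
                    [0,           0,            1]])
    corners = rot @ np.vstack([x, y, z])                  # (3, 8)
    return corners + np.asarray(center).reshape(3, 1)

def project_to_image(corners_lidar, lidar2cam, intrinsic):
    """Project lidar-frame corners into pixel coordinates.

    lidar2cam: 4x4 extrinsic; intrinsic: 3x3 camera matrix.
    Returns (2, 8) pixel coords and a mask of corners in front of the camera.
    """
    pts = np.vstack([corners_lidar, np.ones((1, 8))])     # homogeneous (4, 8)
    cam = (lidar2cam @ pts)[:3]                           # (3, 8) in camera frame
    in_front = cam[2] > 0.1                               # drop corners behind camera
    pix = intrinsic @ cam
    pix = pix[:2] / np.clip(pix[2], 1e-6, None)           # perspective divide
    return pix, in_front
```

Drawing line segments between the projected corner pairs (e.g. with OpenCV) then gives the perspective-view boxes, while the BEV view only needs the bottom four corners in the lidar x-y plane. If the projected boxes come out wonky, is the likely culprit an extrinsic in the wrong direction (camera-to-lidar vs. lidar-to-camera) or a size/yaw convention mismatch?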

zhijian-liu commented 4 months ago

Adding support for a custom dataset is beyond the scope of this codebase, and unfortunately, we don't have the capacity to accommodate such customized requests.