mit-han-lab / bevfusion

[ICRA'23] BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation
https://bevfusion.mit.edu
Apache License 2.0

Regarding different lidar scan pattern #167

Closed VeeranjaneyuluToka closed 1 year ago

VeeranjaneyuluToka commented 1 year ago

Hi, the current pre-trained model is trained on the nuScenes dataset, which means the lidar scan pattern is the one produced by their sensor setup (https://www.nuscenes.org/nuscenes). Suppose I have my own lidar with a different scan pattern and want to apply transfer learning. Is that feasible? And if it is, should I expect any accuracy degradation?

Thanks, Veeru.

kentang-mit commented 1 year ago

Hi @VeeranjaneyuluToka,

I think if you have enough labeled data, it is unnecessary to do transfer learning and you can directly train from scratch on your new data. Direct transfer might not work well.

Best, Haotian

VeeranjaneyuluToka commented 1 year ago

Hi @kentang-mit ,

Thanks for your quick reply.

I was wondering whether the scan pattern really matters once the points are voxelized. If I understand correctly, each voxel aggregates the points (x, y, z) that fall inside it, and I am not sure whether the scanning pattern is retained after voxelization.

Thanks, Veeru.

kentang-mit commented 1 year ago

Hi Veeru,

I believe the scanning pattern still matters. The reason is that the voxel sizes we use are relatively small (if you visualize the point cloud, you will find that the results before and after voxelization look almost the same). In this case I think scanning patterns will be retained after voxelization, and domain gaps will exist.
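
As a rough illustration (not code from this repo), here is a minimal hard-voxelization sketch in numpy, assuming a small voxel size similar to the 0.075 m used in the nuScenes configs. Each point moves by at most half a voxel when snapped to its voxel center, so the sensor's beam layout is essentially preserved:

import numpy as np

def voxelize(points, voxel_size=0.075):
    """Snap each point (N, 3) to the center of its voxel."""
    voxel_idx = np.floor(points / voxel_size).astype(np.int64)
    return (voxel_idx + 0.5) * voxel_size

points = np.random.uniform(-54.0, 54.0, size=(1000, 3))  # fake lidar sweep
error = np.abs(voxelize(points) - points).max()
print(f"max displacement after voxelization: {error:.4f} m")  # always under half a voxel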

Best, Haotian

VeeranjaneyuluToka commented 1 year ago

@kentang-mit, thanks for your quick reply. I have one more question, this time about stereo vision. What about using a stereo camera rather than a single camera? Would that improve accuracy further?

I do not see stereo vision + lidar fusion in the automotive domain. Is that because of the difficulty of setting up stereo cameras, or something else? I wanted to get your view on this.

Thanks, Veeru.

kentang-mit commented 1 year ago

You can check out a recent paper called BEVStereo. From my perspective, I think stereo vision might improve the camera-only models significantly, but for fusion models the improvement might be limited because the LiDAR branch is very strong.

kkangshen commented 1 year ago

Hi @kentang-mit, how can I display the ground truth 3D boxes and predicted 3D boxes in the point cloud itself, instead of in a bird's-eye view of the point cloud?

kentang-mit commented 1 year ago

Hi @kkangshen,

We haven't tried this type of visualization in this codebase. The reason is that installing 3D visualization libraries (e.g. mayavi or pptk) can be challenging. You could have a look at the open-source repos of earlier papers, for example Frustum PointNets or PointRCNN (OpenPCDet); I believe they have released this kind of visualization.

Best, Haotian

kkangshen commented 1 year ago

@kentang-mit, thanks for your quick reply. I am using tools/visualize.py to save the predicted bbox and ground truth bbox results, and I use the method from OpenPCDet to display them, where blue is the ground truth bbox and green is the predicted bbox. The ground truth and predicted bboxes turn out to be misaligned along the z axis. The result is shown in Figure 1.

Figure 1 (image)

If I remove bboxes[..., 2] -= bboxes[..., 5] / 2 in tools/visualize.py, the result is shown in Figure 2. Two questions: 1) what does bboxes[..., 2] -= bboxes[..., 5] / 2 do in the code? 2) why does the misalignment in Figure 1 appear when bboxes[..., 2] -= bboxes[..., 5] / 2 is not removed?

elif args.mode == "pred" and "boxes_3d" in outputs[0]:
    bboxes = outputs[0]["boxes_3d"].tensor.numpy()
    scores = outputs[0]["scores_3d"].numpy()
    labels = outputs[0]["labels_3d"].numpy()

    if args.bbox_classes is not None:
        indices = np.isin(labels, args.bbox_classes)
        bboxes = bboxes[indices]
        scores = scores[indices]
        labels = labels[indices]

    if args.bbox_score is not None:
        indices = scores >= args.bbox_score
        bboxes = bboxes[indices]
        scores = scores[indices]
        labels = labels[indices]

    # bboxes[..., 2] -= bboxes[..., 5] / 2
    bboxes = LiDARInstance3DBoxes(bboxes, box_dim=9)

Figure 2 (image)

kentang-mit commented 1 year ago

@kkangshen,

The visualizations look really cool. For bboxes[..., 2] -= bboxes[..., 5] / 2, I think it is related to the bounding box coordinate system. Basically the origin of this coordinate system might sit on the center of the box, or the top surface, or the bottom surface, depending on the convention of different libraries (e.g. mmdetection3d and OpenPCDet might have different preferences). In this case, when you are using different libraries, you might need to adjust the z prediction of all bounding boxes to make the visualizations consistent.
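
As a hedged illustration (not quoting tools/visualize.py beyond the line above), assuming the mmdetection3d LiDAR box layout (x, y, z, dx, dy, dz, yaw, ...) in which index 5 is the box height, that line simply shifts the box origin by half the height, i.e. it converts between a center-origin and a bottom-origin convention:

import numpy as np

def center_to_bottom(boxes):
    """Move the z reference of (N, 7+) boxes from the geometric center to the bottom face."""
    boxes = boxes.copy()
    boxes[..., 2] -= boxes[..., 5] / 2.0  # z_bottom = z_center - height / 2
    return boxes

def bottom_to_center(boxes):
    """Inverse shift, e.g. for a visualizer that expects center-origin boxes."""
    boxes = boxes.copy()
    boxes[..., 2] += boxes[..., 5] / 2.0
    return boxes

Which direction you need depends on the convention of the library drawing the boxes, which is exactly the mmdetection3d vs. OpenPCDet difference mentioned above.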

Best, Haotian

kkangshen commented 1 year ago


@kentang-mit, thanks for your quick reply. The reference position of the 3D box center is indeed different between mmdetection3d and OpenPCDet.

VeeranjaneyuluToka commented 1 year ago

@kkangshen, I believe you might have used mayavi. If so, how did you manage to import it? When I try to import it, the notebook kernel dies and restarts. It would be a great help if you could describe the approach you followed. FYI, I have already looked into this issue as well (https://github.com/enthought/mayavi/issues/439). I have also tried open3d, but it hangs in that case. BTW, I am using an NVIDIA 3090 Ti GPU.

kentang-mit commented 1 year ago

@VeeranjaneyuluToka There's one approach that might help: generate the predictions and save the point clouds on your server, then install these visualization packages locally on a laptop / workstation with GUI support.
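
For the local step, here is a minimal open3d sketch, assuming the server-side script dumped the lidar points and boxes to .npy files (hypothetical file names) and that boxes use the (x, y, z, dx, dy, dz, yaw) layout:

import numpy as np
import open3d as o3d

def box_to_lineset(box, color):
    """Build a wireframe LineSet for one 3D box."""
    x, y, z, dx, dy, dz, yaw = box[:7]
    corners = np.array([[sx * dx / 2, sy * dy / 2, sz * dz / 2]
                        for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)])
    rot = np.array([[np.cos(yaw), -np.sin(yaw), 0],
                    [np.sin(yaw),  np.cos(yaw), 0],
                    [0, 0, 1]])
    corners = corners @ rot.T + np.array([x, y, z])
    edges = [[0, 1], [0, 2], [1, 3], [2, 3], [4, 5], [4, 6], [5, 7], [6, 7],
             [0, 4], [1, 5], [2, 6], [3, 7]]
    lines = o3d.geometry.LineSet(points=o3d.utility.Vector3dVector(corners),
                                 lines=o3d.utility.Vector2iVector(edges))
    lines.colors = o3d.utility.Vector3dVector([color] * len(edges))
    return lines

points = np.load("lidar_points.npy")[:, :3]  # hypothetical dumps from the server
gt_boxes = np.load("gt_boxes.npy")
pred_boxes = np.load("pred_boxes.npy")

pcd = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(points))
geoms = [pcd]
geoms += [box_to_lineset(b, (0, 0, 1)) for b in gt_boxes]    # blue = ground truth
geoms += [box_to_lineset(b, (0, 1, 0)) for b in pred_boxes]  # green = prediction
o3d.visualization.draw_geometries(geoms)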

VeeranjaneyuluToka commented 1 year ago

@kentang-mit, thanks for your reply. One more thing: I am trying to understand the data reading pipeline, i.e. the data conversion (creating the .pkl files in this case), how the data is then processed, and how the GT data is generated. Would you mind sharing a flow chart or a brief description of how the different modules interact? I get lost when build_dataloader() is called, which internally calls nuscenes_dataset.py somehow, and I am not sure how that in turn invokes the pipeline scripts. It looks like this depends heavily on the mmcv registry APIs. Thanks!
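
As background on the registry part of this question, here is a minimal, simplified sketch (not BEVFusion's actual code) of how mmcv's registry wires the type strings in the config to Python classes; the dataset and each pipeline stage are resolved this way when the dataloader is built:

from mmcv.utils import Registry, build_from_cfg

DATASETS = Registry("dataset")

@DATASETS.register_module()
class NuScenesDataset:
    """Toy stand-in for the real dataset class registered under the same name."""
    def __init__(self, dataset_root, ann_file, pipeline):
        self.dataset_root = dataset_root
        self.ann_file = ann_file  # the generated .pkl info file
        self.pipeline = pipeline  # transform configs, built through their own registry

cfg = dict(type="NuScenesDataset",
           dataset_root="data/nuscenes/",
           ann_file="data/nuscenes/nuscenes_infos_train.pkl",
           pipeline=[])
dataset = build_from_cfg(cfg, DATASETS)  # 'type' picks the class, the rest become kwargs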

kentang-mit commented 1 year ago

Hi @VeeranjaneyuluToka,

The high-level logic for data preparation is as follows:

1) You first generate the info files (the .pkl files you mentioned). The goal is to gather from the dataset all the information needed by the dataloader (everything present in the arguments of the Collect3D pipeline stage: https://github.com/mit-han-lab/bevfusion/blob/0e5b9edbc135bf297f6e3323249f7165b232c925/configs/nuscenes/default.yaml#L170). The purpose is to accelerate the loading of this information during training / testing.

2) You build the GT database. In this step we assume that the dataset class has been implemented and the info files have been created, just as you saw. What you do is load the 3D box annotations and point clouds for each scene, and extract the points that lie inside each ground truth 3D box. The collection of such (box, points_in_the_box, box_class) tuples forms the "GT database" (a minimal sketch of this step follows below).

I would say that (1) is necessary for any new dataset you're going to add, but you can skip (2) to get started. In that case, remove the GT-paste augmentation here in the configuration: https://github.com/mit-han-lab/bevfusion/blob/0e5b9edbc135bf297f6e3323249f7165b232c925/configs/nuscenes/default.yaml#L81. (2) is only useful when your dataset has class imbalance issues.
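
To make step (2) concrete, here is a minimal sketch, not the repo's actual GT-database generation code, of extracting the points inside each ground truth box and collecting (box, points_in_the_box, box_class) tuples; it assumes the (x, y, z, dx, dy, dz, yaw) box layout with a bottom-center origin:

import numpy as np

def points_in_box(points, box):
    """Return the lidar points (N, 3+) that fall inside one 3D box."""
    x, y, z, dx, dy, dz, yaw = box[:7]
    # Express points in the box frame: undo translation, then undo yaw.
    shifted = points[:, :3] - np.array([x, y, z + dz / 2.0])
    cos_y, sin_y = np.cos(-yaw), np.sin(-yaw)
    local = shifted @ np.array([[cos_y, -sin_y, 0],
                                [sin_y,  cos_y, 0],
                                [0, 0, 1]]).T
    mask = (np.abs(local[:, 0]) <= dx / 2) & \
           (np.abs(local[:, 1]) <= dy / 2) & \
           (np.abs(local[:, 2]) <= dz / 2)
    return points[mask]

def build_gt_database(scenes):
    """scenes: iterable of (points, gt_boxes, gt_labels) loaded via the info files."""
    database = []
    for points, gt_boxes, gt_labels in scenes:
        for box, label in zip(gt_boxes, gt_labels):
            database.append((box, points_in_box(points, box), label))
    return database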

Hope that explanation helps, and I'll be glad to elaborate further.

Best, Haotian

kentang-mit commented 1 year ago

Closed due to inactivity.