open-mmlab / mmdetection3d

OpenMMLab's next-generation platform for general 3D object detection.
https://mmdetection3d.readthedocs.io/en/latest/
Apache License 2.0

corresponding relationship between data and model architecture #108

Closed linchunmian closed 4 years ago

linchunmian commented 4 years ago

Hi, author. I am confused about how, when I use a multi-modal 3D detector, mmdetection3d allocates the data to the corresponding architecture, i.e. the RGB image to the 2D backbone and the points to the 3D backbone. What part of the code can I refer to? BTW, could you consider introducing the implementation of pseudo signals, i.e. pseudo-LiDAR? I think the idea of fusing multiple modalities would be great, and the format of the data modalities can also be varied. Thanks in advance for your help.

xavierwu95 commented 4 years ago

MVXNet is a multi-modal detector which contains a 2D detector and a 3D detector. As for pseudo-LiDAR, you can raise a proposal on our roadmap https://github.com/open-mmlab/mmdetection3d/issues/16, and it will be on our schedule some day.
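
In case it helps to see the routing in isolation, here is a toy sketch in plain PyTorch (not the actual MVXNet code; all names are illustrative): the batch carries both an image tensor and a point tensor, and the detector's forward simply hands each one to its own branch.

```python
import torch
import torch.nn as nn

class ToyMultiModalDetector(nn.Module):
    """Minimal sketch: each modality has its own backbone, and the
    forward pass routes each input to the matching branch."""

    def __init__(self):
        super().__init__()
        # 2D branch for RGB images (stand-in for a ResNet backbone).
        self.img_backbone = nn.Conv2d(3, 16, kernel_size=3, padding=1)
        # 3D branch for points (stand-in for a voxel/point encoder).
        self.pts_encoder = nn.Linear(4, 16)

    def forward(self, points, img):
        # The dataloader supplies both modalities in one batch;
        # each tensor goes to its own branch by keyword.
        img_feats = self.img_backbone(img)      # (B, 16, H, W)
        pts_feats = self.pts_encoder(points)    # (B, N, 16)
        return img_feats, pts_feats

model = ToyMultiModalDetector()
img = torch.rand(1, 3, 64, 64)        # RGB image
points = torch.rand(1, 1000, 4)       # LiDAR points: x, y, z, intensity
img_feats, pts_feats = model(points, img)
```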

linchunmian commented 4 years ago

Thanks. But I want to know how to find the corresponding relationship between the two modalities and their detectors. For instance, if I want to check the data pipeline for images, where could I find the whole process? Sorry for the odd question; I am new to mmdetection.

linchunmian commented 4 years ago

Please give me some instructions on how to find the corresponding relationship between the data and the architecture. For example, in MVXNet, the image-point fusion detector, how is it ensured that the image is fed into the ResNet backbone while the points simultaneously go into the voxel encoder? In the config file I only see the definitions of the layers and the data pipeline, so how are the data associated with their corresponding architecture? Please help me; any help would be appreciated a lot!

ZwwWayne commented 4 years ago

The relationship between the two modalities is defined by the KITTI dataset, i.e., the extrinsic parameters of the camera and the LiDAR sensor. There is usually a transformation matrix that defines the coordinate transformation from the camera to the LiDAR. This transformation can map the LiDAR points to image pixels. When training or running inference with a model, we usually process the two modalities with different augmentations and record their transformations. Therefore, we can always build a mapping from LiDAR points to image pixels.
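
As a minimal sketch of that point-to-pixel mapping, assuming a KITTI-style 4x4 LiDAR-to-image projection matrix (the `lidar2img` name and the helper below are illustrative, not the exact code in the repository):

```python
import numpy as np

def project_lidar_to_image(points_lidar, lidar2img):
    """Project LiDAR points (N, 3) to pixel coordinates (N, 2) using a
    4x4 LiDAR-to-image transform (camera intrinsics @ extrinsics)."""
    n = points_lidar.shape[0]
    # Homogeneous coordinates: (N, 4).
    pts_hom = np.concatenate([points_lidar, np.ones((n, 1))], axis=1)
    # Transform into the image frame: (N, 4) @ (4, 4)^T -> (N, 4).
    pts_img = pts_hom @ lidar2img.T
    # Perspective division by depth gives pixel coordinates (u, v).
    depth = pts_img[:, 2:3]
    uv = pts_img[:, :2] / np.clip(depth, 1e-6, None)
    # Keep only points in front of the camera.
    mask = depth[:, 0] > 0
    return uv, mask

# Example with a dummy projection matrix.
lidar2img = np.eye(4)
points = np.random.rand(5, 3) * 10
uv, valid = project_lidar_to_image(points, lidar2img)
```

Any augmentation applied to either modality (flipping, scaling, cropping) just has to be recorded so that the same transformation can be applied to the projected coordinates.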

This function provides an example of finding image features for LiDAR points.
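
Roughly speaking, such a function projects each point into the image and bilinearly samples the feature map at that location. A minimal sketch using `torch.nn.functional.grid_sample` (shapes and names are illustrative, not the repository's actual implementation):

```python
import torch
import torch.nn.functional as F

def sample_img_feats(img_feats, uv, img_shape):
    """Bilinearly sample image features at projected point locations.
    img_feats: (1, C, H, W) feature map; uv: (N, 2) pixel coordinates;
    img_shape: (height, width) of the original image."""
    h, w = img_shape
    # Normalize pixel coordinates to [-1, 1], as grid_sample expects.
    grid = uv.clone()
    grid[:, 0] = grid[:, 0] / (w - 1) * 2 - 1
    grid[:, 1] = grid[:, 1] / (h - 1) * 2 - 1
    grid = grid.view(1, 1, -1, 2)                      # (1, 1, N, 2)
    sampled = F.grid_sample(img_feats, grid, align_corners=True)
    return sampled.view(img_feats.shape[1], -1).t()    # (N, C)

feats = torch.rand(1, 64, 48, 156)                     # toy feature map
uv = torch.rand(100, 2) * torch.tensor([1240., 370.])  # projected points
point_feats = sample_img_feats(feats, uv, (370, 1240))
```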

linchunmian commented 4 years ago

Hi @ZwwWayne, thank you. Also, I have gradually come to understand the overall logic of mmdetection3d. But anyway, thanks a lot for your kindness.