Closed Ehangiun closed 1 year ago
I'm not sure whether I understand your questions correctly, but are you saying that you want to visualize these projected images in the RGB space?
Thank you for your reply. I mean visualizing these projected images in the bird's-eye-view space, like projecting the image into a bird's eye view.
I see, what we projected to the BEV space is actually camera features, not RGB pixels, so we do not have an intuitive visualization for that.
Because the accuracy of my camera-only test was low, I wanted to see what the camera input looks like in the bird's-eye view. I have tested the effect on single images from the dataset, but when I pass in the intrinsic and extrinsic parameters of my own camera, the accuracy drops dramatically.
I see, in this case I would suggest you visualize the predictions from camera-only models directly. We did that to verify whether we implemented the projections correctly on datasets other than nuScenes. We have a `tools/visualize.py` script to help you with that.
Thanks, this helps me solve the problem. By the way, I want to know how you handle the mapping between multiple cameras in the bird's-eye-view space (like in the picture). Is it a fixed-angle mapping, or a separate mapping relationship for each camera?
It is a separate mapping relationship for each camera. This function will be very helpful for you to understand the mapping between camera and LiDAR coordinate systems: https://github.com/mit-han-lab/bevfusion/blob/0e5b9edbc135bf297f6e3323249f7165b232c925/mmdet3d/models/vtransforms/base.py#L79.
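For intuition, the per-camera mapping in `get_geometry` can be sketched roughly as below. This is a simplified sketch modeled on the linked code, not a verbatim copy: tensor names, shapes, and the omission of image-augmentation terms (post-rotations/translations) are assumptions.

```python
# Sketch of an LSS-style get_geometry: lift a fixed frustum of
# (u, v, depth) samples into the shared LiDAR frame, with a separate
# rotation/translation per camera. Shapes and names are illustrative.
import torch

def get_geometry(frustum, camera2lidar_rots, camera2lidar_trans, intrins):
    """frustum: (D, H, W, 3) pixel/depth samples shared by all cameras.
    camera2lidar_rots: (B, N, 3, 3), camera2lidar_trans: (B, N, 3),
    intrins: (B, N, 3, 3). Returns (B, N, D, H, W, 3) LiDAR-frame points."""
    B, N = camera2lidar_trans.shape[:2]
    points = frustum.unsqueeze(0).unsqueeze(0)  # (1, 1, D, H, W, 3)
    # Un-project pixels: (u*d, v*d, d) so K^-1 maps them onto camera rays.
    points = torch.cat(
        [points[..., :2] * points[..., 2:3], points[..., 2:3]], dim=-1
    )
    # One combined transform per camera: rotation after inverse intrinsics.
    combine = camera2lidar_rots.matmul(torch.inverse(intrins))  # (B, N, 3, 3)
    points = combine.view(B, N, 1, 1, 1, 3, 3).matmul(
        points.unsqueeze(-1)
    ).squeeze(-1)
    # Each camera also gets its own translation into the LiDAR frame,
    # so this really is a separate mapping per camera.
    return points + camera2lidar_trans.view(B, N, 1, 1, 1, 3)
```

With identity intrinsics/rotations and zero translation, a frustum point at pixel (0, 0) with depth 1 stays at (0, 0, 1), which is a quick sanity check for the transform chain.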
The function `get_geometry` is where I get confused. Here is what I think: when I get a frustum that maps each pixel of the image, the RGB value of the original pixel is converted to the coordinate point. Is my understanding mistaken?
Sorry for the delayed response. We did not map RGB values to 3D; instead, we project high-dimensional features to the BEV space. Hope that makes sense to you.
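For intuition, the "lift" step that spreads a high-dimensional camera feature (not an RGB value) over depth bins can be sketched as below, in the spirit of Lift-Splat-Shoot. Function and variable names are illustrative assumptions; in the real model the depth logits come from a learned convolutional head.

```python
# Sketch: lift per-pixel camera features with a predicted depth
# distribution, producing a (B, N, D, H, W, C) tensor where each pixel's
# feature vector is weighted across D depth bins.
import torch

def lift_features(feats, depth_logits):
    """feats: (B, N, C, H, W) camera features.
    depth_logits: (B, N, D, H, W) raw depth scores per pixel.
    Returns (B, N, D, H, W, C)."""
    depth = depth_logits.softmax(dim=2)  # per-pixel depth distribution
    # Outer product over the depth and channel axes.
    return depth.unsqueeze(-1) * feats.permute(0, 1, 3, 4, 2).unsqueeze(2)
```

Since the depth distribution sums to one per pixel, summing the output over the depth axis recovers the original feature map, which is a handy sanity check.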
Thank you for your reply. I think I found my answer through the discussion!
Hi @Ehangiun @kentang-mit. As shown below, I find that the camera features in BEV have no significant object information (e.g., boundaries, contours) compared with the LiDAR features. So my question is: how can the camera branch learn object features and use them to complete downstream tasks (e.g., detection, segmentation)?
After the model converges, we do get significant features with object information in the camera branch. For example, the visualized feature below is from the decoder of the camera branch.
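If you want to make a similar visualization yourself, a common generic technique is to collapse a BEV feature map to a single-channel heatmap, e.g. via the channel-wise L2 norm. This is a sketch of that generic approach, not the repo's own plotting code.

```python
# Sketch: reduce a (C, H, W) BEV feature map to an (H, W) heatmap in
# [0, 1], suitable for plt.imshow. The L2-norm reduction is one common
# choice; a channel mean or max also works.
import numpy as np

def bev_heatmap(feat):
    """feat: (C, H, W) float array -> (H, W) array normalized to [0, 1]."""
    mag = np.linalg.norm(feat, axis=0)  # per-cell activation magnitude
    lo, hi = mag.min(), mag.max()
    return (mag - lo) / (hi - lo + 1e-8)
```

Before convergence this heatmap tends to look diffuse; after convergence, object regions usually stand out, which matches the observation above.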
Hello, thank you for your contributions on fusion. After reading your code, I found `BaseDepthTransform` in `bevfusion/mmdet3d/models/vtransforms/base.py`. `get_geometry` seems to create a simulated point cloud from the image, as your paper said, and the variable `depth` holds the depth information obtained by projecting the point cloud onto the image. `get_cam_feats` then combines the image features with the depth information and outputs a tensor of shape (B, N, D, H, W, C), which is then pooled according to the output of `get_geometry`. How can I get the images projected from multiple cameras into BEV?
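The missing step after lifting is the "splat": each lifted feature is assigned to the BEV grid cell its 3D point falls into, and features in the same cell are summed. The real repo uses an optimized `bev_pool` CUDA op for this; the sketch below is a simplified, illustrative version with assumed grid bounds and names.

```python
# Sketch of BEV pooling: quantize LiDAR-frame points into a 2D grid and
# sum the features that land in each cell. Bounds (+-51.2 m) and cell
# size (0.8 m) are illustrative defaults, not the repo's exact config.
import torch

def splat_to_bev(geom, feats, bev_size=128, bound=51.2, cell=0.8):
    """geom: (M, 3) LiDAR-frame points from get_geometry, flattened.
    feats: (M, C) lifted features. Returns (C, bev_size, bev_size)."""
    ix = ((geom[:, 0] + bound) / cell).long()
    iy = ((geom[:, 1] + bound) / cell).long()
    keep = (ix >= 0) & (ix < bev_size) & (iy >= 0) & (iy < bev_size)
    ix, iy, f = ix[keep], iy[keep], feats[keep]
    bev = torch.zeros(feats.shape[1], bev_size, bev_size)
    flat = bev.view(feats.shape[1], -1)         # shares storage with bev
    flat.index_add_(1, iy * bev_size + ix, f.t())  # scatter-sum per cell
    return bev
```

Because all cameras' points are expressed in the same LiDAR frame before splatting, features from multiple cameras land in one shared BEV map; points outside the grid bounds are simply dropped.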