nv-tlabs / lift-splat-shoot

Lift, Splat, Shoot: Encoding Images from Arbitrary Camera Rigs by Implicitly Unprojecting to 3D (ECCV 2020)

About the Splat module #1

Open Owen-Liuyuxuan opened 4 years ago

Owen-Liuyuxuan commented 4 years ago

Thanks for your great work. We have been eager to learn more about it. I am still confused about how the network converts the (C, D, H, W) image features into the BEV image.

I see that you mention both the PointPillars and OFT papers. I find it difficult to see how these two methods are merged here (both conceptually and in code). Your video shows some code for the splat part, but I still find it confusing.

Could you give a more detailed explanation of this part, or share some code snippets?

Thank you.

maciej-autobon commented 3 years ago

@Owen-Liuyuxuan the secret lies in the binimgs label used in training, which is a variable of shape (4, 1, 200, 200), where the "4" is the batch size.

This is what it looks like:

[Image: Screenshot 2021-04-10 at 14 27 52]

From the looks of it, it's a binary mask saying where the vehicles are? I've just started going through the code, so don't take my word for it; try running the code yourself with the "mini" dataset.
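To make the shape concrete, here is a minimal sketch of how a label like that could be built. It is not the repo's code: `make_bev_mask`, the axis-aligned boxes, and the grid bounds of -50 m to 50 m at 0.5 m per cell are all assumptions chosen so that the grid comes out at 200 x 200.

```python
import numpy as np

def make_bev_mask(boxes_m, bound=(-50.0, 50.0, 0.5)):
    """Rasterize axis-aligned boxes (x_min, y_min, x_max, y_max), given in
    metres in the ego frame, into a binary top-down grid.
    Hypothetical helper; the bounds are an assumed configuration."""
    lo, hi, res = bound
    n = int(round((hi - lo) / res))  # 200 cells per side for the assumed bounds
    mask = np.zeros((n, n), dtype=np.float32)
    for x0, y0, x1, y1 in boxes_m:
        # Convert metres to cell indices and clip them to the grid.
        i0, i1 = np.clip(np.floor((np.array([x0, x1]) - lo) / res).astype(int), 0, n)
        j0, j1 = np.clip(np.floor((np.array([y0, y1]) - lo) / res).astype(int), 0, n)
        mask[i0:i1, j0:j1] = 1.0  # 1 where a vehicle is, 0 elsewhere
    return mask

# One made-up vehicle box per sample, batch size 4, one channel.
batch = np.stack([make_bev_mask([(2.0, -1.0, 6.5, 1.0)])[None] for _ in range(4)])
print(batch.shape)  # (4, 1, 200, 200)
```

If that reading is right, the network output would be compared against this mask cell by cell, which is why the label has a single channel.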

maciej-autobon commented 3 years ago

Oh, right, and it's a top-down view, which for me explains why "Lift, Splat, Shoot" gets such good results (at least in my opinion).
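For the original question about going from (C, D, H, W) features to a top-down grid, here is a rough sketch of the idea as I understand it, not the repo's implementation. The names `lift` and `splat`, the grid bounds, and the random 3D points are assumptions. It ignores the height axis and uses a plain scatter-add rather than whatever pooling trick the actual code uses.

```python
import numpy as np

def lift(depth_probs, context):
    """depth_probs: (D, H, W), a distribution over depth bins for each pixel.
    context: (C, H, W) image features.
    Returns (C, D, H, W): each pixel's feature weighted by each depth bin."""
    return context[:, None] * depth_probs[None]

def splat(feats, points, bound=(-50.0, 50.0, 0.5)):
    """feats: (C, D, H, W). points: (D, H, W, 3), the ego-frame xyz in metres
    of every frustum point. Sums features that land in the same BEV cell.
    Hypothetical helper; the bounds are an assumed configuration."""
    lo, hi, res = bound
    n = int(round((hi - lo) / res))
    C = feats.shape[0]
    f = feats.reshape(C, -1)                       # (C, D*H*W)
    ij = np.floor((points.reshape(-1, 3)[:, :2] - lo) / res).astype(int)
    keep = ((ij >= 0) & (ij < n)).all(axis=1)      # drop points outside the grid
    flat = ij[keep, 0] * n + ij[keep, 1]           # flattened cell index
    bev = np.zeros((C, n * n), dtype=feats.dtype)
    for c in range(C):
        np.add.at(bev[c], flat, f[c, keep])        # scatter-add with repeats
    return bev.reshape(C, n, n)

rng = np.random.default_rng(0)
C, D, H, W = 5, 4, 2, 3
logits = rng.normal(size=(D, H, W))
depth_probs = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)
context = rng.normal(size=(C, H, W))
points = rng.uniform(-40.0, 40.0, size=(D, H, W, 3))  # stand-in for real geometry
bev = splat(lift(depth_probs, context), points)
print(bev.shape)  # (5, 200, 200)
```

In a real setup the points would come from unprojecting each pixel at each depth bin through the camera intrinsics and extrinsics. That gives a frustum-shaped point cloud per camera rather than a cube.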

EE102-JN commented 2 years ago

Hi, have you figured out the code for projecting 2D images into the 3D BEV? I have the same question. How do you get a frustum-shaped reconstruction rather than a cubic one?

manueldiaz96 commented 2 years ago

> Hi, have you figured out the code for projecting 2D images into the 3D BEV? I have the same question. How do you get a frustum-shaped reconstruction rather than a cubic one?

Do you mean 3D shape reconstruction?