Open Owen-Liuyuxuan opened 4 years ago
@Owen-Liuyuxuan the secret lies in the binimgs label used in training, which is a variable of shape (4, 1, 200, 200), where the "4" is the batch size.
This is what it looks like:
From the looks of it, it's a binary mask saying where the vehicles are? I've just started going through the code, so don't take my word for it. Try running the code yourself with the "mini" dataset.
Oh, right, and it's a top-down view which for me explains how "Lift, Splat, Shoot" has such good results (at least in my opinion).
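To make the shape concrete, here is a minimal sketch of how a top-down binary vehicle mask like binimgs could be built. It is not the repository's code: the function name rasterize_boxes, the axis-aligned box format, and the grid of 100 m x 100 m at 0.5 m per cell (which gives 200 x 200) are all assumptions for illustration.

```python
import numpy as np

def rasterize_boxes(boxes, bound=(-50.0, 50.0, 0.5)):
    """Hypothetical helper: paint axis-aligned vehicle footprints into a BEV mask.

    boxes: iterable of (x_min, y_min, x_max, y_max) in metres, ego frame.
    bound: (low, high, cell size) shared by both axes (assumed values).
    Returns a float mask of shape (1, n, n) with 1 inside boxes, 0 elsewhere.
    """
    lo, hi, res = bound
    n = int(round((hi - lo) / res))  # 200 cells per axis for the assumed bound
    mask = np.zeros((1, n, n), dtype=np.float32)
    for x0, y0, x1, y1 in boxes:
        # Convert metres to cell indices and clip to the grid.
        i0 = int(np.clip(np.floor((x0 - lo) / res), 0, n))
        i1 = int(np.clip(np.ceil((x1 - lo) / res), 0, n))
        j0 = int(np.clip(np.floor((y0 - lo) / res), 0, n))
        j1 = int(np.clip(np.ceil((y1 - lo) / res), 0, n))
        mask[0, i0:i1, j0:j1] = 1.0
    return mask

# A batch of 4 samples, each with one 4 m x 2 m vehicle ahead of the ego car.
binimgs = np.stack([rasterize_boxes([(2.0, -1.0, 6.0, 1.0)]) for _ in range(4)])
print(binimgs.shape)  # (4, 1, 200, 200)
```

Real vehicles are rotated, so an actual label generator would fill rotated polygons rather than axis-aligned rectangles. The point here is only the layout: one channel, one cell per patch of ground.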
Hi, have you understood the code for projecting 2D images into the 3D BEV? I have the same question. How do you get a frustum-shaped reconstruction rather than a cubic one?
Do you mean 3D shape reconstruction?
Thanks for your great work. We have been quite eager to know more about it. For me, it is still rather confusing how the network converts the (C, D, H, W) image features into the BEV image.
I see that you mention both the PointPillars and OFT papers. It is rather difficult for me to see how these two methods are merged here (both conceptually and in code). Your video also presents some code for the splat part, which I still find quite confusing.
Could you give a more detailed introduction or share some code snippets for this?
Thank you.
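For readers with the same question, below is a minimal NumPy sketch of the general idea as described in the paper. It is not the repository's implementation. Each pixel's context vector is weighted by a softmax over D depth bins ("lift"), which gives a (C, D, H, W) volume, and each of the D*H*W frustum points is then sum-pooled into the BEV cell it falls in ("splat", in the spirit of PointPillars). The function names, the grid bounds, and the random points_xy standing in for the camera geometry are assumptions. In the real pipeline the point locations come from the camera intrinsics and extrinsics, with every pixel ray sampled at discrete depths, which is what makes the volume a frustum rather than a cube.

```python
import numpy as np

def lift(context, depth_logits):
    """context: (C, H, W) features; depth_logits: (D, H, W). Returns (C, D, H, W)."""
    # Softmax over the depth axis gives a per-pixel distribution over depth bins.
    e = np.exp(depth_logits - depth_logits.max(axis=0, keepdims=True))
    depth_prob = e / e.sum(axis=0, keepdims=True)
    # Outer product per pixel: (C, 1, H, W) * (1, D, H, W) -> (C, D, H, W).
    return context[:, None] * depth_prob[None]

def splat(frustum_feats, points_xy, bound=(-50.0, 50.0, 0.5)):
    """Sum-pool frustum features into a BEV grid.

    frustum_feats: (C, D, H, W); points_xy: (D, H, W, 2) ego-frame metres.
    bound: (low, high, cell size) for both axes (assumed values).
    Returns a BEV feature map of shape (C, n, n).
    """
    lo, hi, res = bound
    n = int(round((hi - lo) / res))
    C = frustum_feats.shape[0]
    feats = frustum_feats.reshape(C, -1).T              # (N, C), N = D*H*W
    idx = np.floor((points_xy.reshape(-1, 2) - lo) / res).astype(int)  # (N, 2)
    keep = ((idx >= 0) & (idx < n)).all(axis=1)         # drop points off the grid
    bev = np.zeros((n, n, C), dtype=frustum_feats.dtype)
    # Unbuffered add so points landing in the same cell accumulate.
    np.add.at(bev, (idx[keep, 0], idx[keep, 1]), feats[keep])
    return bev.transpose(2, 0, 1)                       # (C, n, n)

rng = np.random.default_rng(0)
C, D, H, W = 8, 4, 3, 5
context = rng.normal(size=(C, H, W))
depth_logits = rng.normal(size=(D, H, W))
# Stand-in for the geometry computed from camera calibration.
points_xy = rng.uniform(-40.0, 40.0, size=(D, H, W, 2))
bev = splat(lift(context, depth_logits), points_xy)
print(bev.shape)  # (8, 200, 200)
```

Because the depth weights for each pixel sum to one, the splat step only redistributes each pixel's feature along its ray. The total feature mass in the BEV grid equals that of the image features as long as no point falls outside the grid.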