Input format for EMD and Chamfer Loss

pclucas14 / lidar_generation

Code for "Deep Generative Models for LiDAR Data"


Issue #2

Closed: michaelsaxon closed this issue 5 years ago

michaelsaxon commented 5 years ago

Hey, thanks for releasing this code! I just had a couple of questions about the expected input format for the EMD module. It seems to check for a shape of (batch, 2, a, b) for a polar or (batch, 3, a, b) for an xyz-based distance representation. Could you help me understand what a and b represent in these formats? In other words, given a set of polar coordinates, which variable represents the angle and which the distance? And are you encoding anything positionally in the vector? Thanks!

pclucas14 commented 5 years ago

Hi Michael,

Basically, instead of using a (num_points, 3) point cloud, I order the points into an (H, W, 3) tensor, where H x W = num_points. We do this ordering so that points that are close to one another in space are also close to one another on the grid, which lets us apply convolutions to it. If you are just looking for a PyTorch EMD implementation, I would have a look at this one. I have not tried it, but it seems to be PyTorch 1.0 compatible, whereas I'm not sure the one I shared is.
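As a rough illustration of this kind of ordering, here is a minimal sketch that scatters an unordered (num_points, 3) cloud onto a grid using a generic spherical projection: rows are elevation bins, columns are azimuth bins, and the channels hold x, y and z. The helper name `points_to_grid` and the binning scheme are assumptions for illustration, not the repo's actual preprocessing. The grid is stored channel-first as (3, H, W), the usual PyTorch convolution layout, so adding a batch axis gives the (batch, 3, a, b) shape from the question.

```python
import numpy as np

def points_to_grid(points: np.ndarray, H: int, W: int) -> np.ndarray:
    """Scatter an (N, 3) point cloud onto a (3, H, W) grid by elevation/azimuth bins.

    Hypothetical helper: a generic spherical projection, not the repo's preprocessing.
    Empty cells stay 0; if several points fall into one cell, only one of them is
    kept (NumPy does not specify which).
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    d = np.sqrt(x ** 2 + y ** 2)        # horizontal distance from the sensor
    azimuth = np.arctan2(y, x)          # heading angle in [-pi, pi]
    elevation = np.arctan2(z, d)        # angle above the xy-plane

    # Azimuth -> column index in [0, W); +pi wraps back to column 0
    cols = ((azimuth + np.pi) / (2 * np.pi) * W).astype(int) % W
    # Elevation -> row index in [0, H), highest elevation in row 0
    lo, hi = elevation.min(), elevation.max()
    span = max(hi - lo, 1e-9)
    rows = np.round((hi - elevation) / span * (H - 1)).astype(int)

    grid = np.zeros((3, H, W), dtype=points.dtype)
    grid[:, rows, cols] = points.T      # channels store x, y, z
    return grid

cloud = np.random.default_rng(0).normal(size=(1000, 3))
batch = points_to_grid(cloud, H=16, W=64)[None]   # add a batch axis
print(batch.shape)  # (1, 3, 16, 64)
```

Real LiDAR projections usually assign rows by laser beam rather than by evenly spaced elevation bins; the even binning here just keeps the sketch self-contained.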

Hope this helps! -Lucas

michaelsaxon commented 5 years ago

OK, so in that case, if you are ordering as H x W x 3, why does the third dimension need to be of size 3? Don't you only need H x W values to represent a full point cloud spherically? Are those three values the (distance byte 1, distance byte 2, reflectivity) bytes returned in each packet by the LiDAR, or something else? Thank you for the quick response, and for pointing me toward that other repo!

pclucas14 commented 5 years ago

You raise an interesting point. For KITTI LiDAR, you actually need two scalars (and not three) for every (x, y, z) point, since the point's position on the H x W grid carries the remaining angular information. If you think of the right triangle formed between the origin (0, 0, 0) and (x, y, z), you could store its horizontal leg d = sqrt(x² + y²) and its vertical leg z. If all points lay on a plane (e.g. z == 0 for all points), then a single scalar would suffice.
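A minimal sketch of the (d, z) encoding, assuming each grid cell's azimuth angle θ is known from its column (here, hypothetically, columns are evenly spaced over 360°). Under that assumption x and y are recovered as d·cos θ and d·sin θ, so the round trip is lossless. The function names `xyz_to_dz` and `dz_to_xyz` are illustrative only, not part of the repo.

```python
import numpy as np

def xyz_to_dz(grid_xyz: np.ndarray) -> np.ndarray:
    """(3, H, W) grid of x, y, z -> (2, H, W) grid of d = sqrt(x^2 + y^2) and z."""
    x, y, z = grid_xyz
    return np.stack([np.sqrt(x ** 2 + y ** 2), z])

def dz_to_xyz(grid_dz: np.ndarray, azimuth: np.ndarray) -> np.ndarray:
    """Invert xyz_to_dz, given each cell's azimuth angle (broadcastable to (H, W))."""
    d, z = grid_dz
    return np.stack([d * np.cos(azimuth), d * np.sin(azimuth), z])

# Hypothetical layout: column w is centred on azimuth -pi + (w + 0.5) * 2*pi / W.
H, W = 4, 16
azimuth = -np.pi + (np.arange(W) + 0.5) * 2 * np.pi / W   # shape (W,), broadcasts over rows
rng = np.random.default_rng(0)
d = rng.uniform(1.0, 50.0, size=(H, W))
z = rng.uniform(-2.0, 2.0, size=(H, W))
grid_xyz = np.stack([d * np.cos(azimuth), d * np.sin(azimuth), z])

grid_dz = xyz_to_dz(grid_xyz)            # 2 channels instead of 3
restored = dz_to_xyz(grid_dz, azimuth)   # angle comes from the grid position
print(np.allclose(restored, grid_xyz))  # True
```

The hypotenuse of the same triangle, sqrt(d² + z²), is the range the sensor actually measures.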

The reason I have H x W x 3 is that I'm storing the x, y, and z coordinates in the channels. The motivation behind that design choice is that EMD operates on (x, y, z) coordinates, so it made sense (at least to me) to require the input to be (x, y, z) coordinates.
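Because EMD and Chamfer distance compare point sets, a grid has to be flattened back into an (H*W, 3) set of x, y, z coordinates before either loss is computed. The toy sketch below shows that step plus textbook definitions of both distances in plain NumPy; it is not the repo's EMD module or any particular library's API, and the helper names are hypothetical. The brute-force EMD tries every one-to-one matching, so it only works for a handful of points; practical implementations use far more efficient solvers.

```python
import itertools
import numpy as np

def grid_to_points(grid: np.ndarray) -> np.ndarray:
    """(3, H, W) grid of x, y, z -> (H*W, 3) point set (cell (h, w) -> row h*W + w)."""
    return grid.reshape(3, -1).T

def chamfer(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer distance: mean squared distance from each point to its
    nearest neighbour in the other set, summed over both directions."""
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)   # (N, M) squared distances
    return float(sq.min(axis=1).mean() + sq.min(axis=0).mean())

def emd_bruteforce(a: np.ndarray, b: np.ndarray) -> float:
    """Exact EMD for two equal-size sets: mean Euclidean distance under the best
    one-to-one matching. Tries all N! matchings, so toy inputs only."""
    assert len(a) == len(b), "this sketch assumes equal-size sets"
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)   # (N, N)
    n = len(a)
    return min(float(dist[np.arange(n), list(perm)].mean())
               for perm in itertools.permutations(range(n)))

grid_a = np.random.default_rng(0).normal(size=(3, 2, 3))       # H=2, W=3 -> 6 points
grid_b = grid_a + np.array([0.0, 0.0, 1.0])[:, None, None]     # shift every point up by 1 in z
a, b = grid_to_points(grid_a), grid_to_points(grid_b)
print(chamfer(a, a))         # 0.0
print(emd_bruteforce(a, b))  # close to 1.0: every point must move 1 unit
```

Chamfer only matches each point to its nearest neighbour, so several points may share one partner; EMD enforces a one-to-one matching, which is what makes it expensive to compute exactly.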

Spoiler alert: even though (x, y, z) and (d, z) contain the same amount of information (when placed on an H x W grid), we show in the paper that using (x, y, z) is more robust to noise added to the input.

Let me know if you have any other questions! -Lucas