yukitsuji / 3D_CNN_tensorflow

KITTI data processing and 3D CNN for Vehicle Detection
MIT License

raw_to_voxel function #29

Closed OneManArmy93 closed 5 years ago

OneManArmy93 commented 5 years ago

I was trying to understand the code, but I couldn't wrap my mind around the values of x, y, z and resolution (resolution=0.50, x=(0, 90), y=(-50, 50), z=(-4.5, 5.5)). Can anyone explain what they mean and what they are for? Thank you.

ansabsheikh9 commented 5 years ago

x, y, z are the ranges of the scene in which you want to detect vehicles, and resolution is the size of a voxel.

OneManArmy93 commented 5 years ago

Thank you for your reply. So you are telling me that x, y, z have nothing to do with the data in the .bin files?

ansabsheikh9 commented 5 years ago

@fouratoueslati The x, y, z values in the .bin file are the coordinates of each point in 3D space. The x, y, z arguments of the function are used to filter the Lidar scan down to the range you are interested in. X is the forward direction, so (0, 90) means you are interested in points from 0 meters to 90 meters in front of the vehicle. All other points are filtered out. Using x=(0, 90), y=(-50, 50), z=(-4.5, 5.5) and resolution=0.50, you end up with a voxel grid of 180x200x20 voxels (the extent of each range divided by the resolution). You can reduce the range to fit the voxel grid into your GPU memory.
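The range filtering described here can be sketched as follows. This is a minimal illustration, not the repository's exact code: the name logic_x comes from this thread, while filter_by_range, the half-open bounds, and the sample points are assumptions made for the example.

```python
import numpy as np

def filter_by_range(points, x=(0, 90), y=(-50, 50), z=(-4.5, 5.5)):
    """Keep only the points whose coordinates fall inside the given ranges."""
    # One boolean mask per axis; the upper bound is excluded (an assumption).
    logic_x = np.logical_and(points[:, 0] >= x[0], points[:, 0] < x[1])
    logic_y = np.logical_and(points[:, 1] >= y[0], points[:, 1] < y[1])
    logic_z = np.logical_and(points[:, 2] >= z[0], points[:, 2] < z[1])
    # A point is kept only if it is inside the range on all three axes.
    keep = logic_x & logic_y & logic_z
    return points[keep]

# Hypothetical points (x, y, z) in meters, standing in for a KITTI .bin scan.
points = np.array([
    [10.0,  2.0, -1.0],   # inside all three ranges
    [-5.0,  0.0,  0.0],   # behind the vehicle: x < 0
    [30.0, 60.0,  0.0],   # too far to the side: y >= 50
    [95.0,  0.0,  0.0],   # too far ahead: x >= 90
])
kept = filter_by_range(points)
print(kept)
```

Only the first sample point survives the filter, because the other three each violate one of the ranges.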

OneManArmy93 commented 5 years ago

@ansabsheikh9 Thank you for your explanation, I can see a bit more clearly now. But can you elaborate a little on the rest of the function, for example the use of logic_x and of these two lines: voxel[velo[:, 0], velo[:, 1], velo[:, 2]] = 1 and velo = ((velo - np.array([x[0], y[0], z[0]])) / resolution).astype(np.int32)? Much appreciated.

ansabsheikh9 commented 5 years ago

@fouratoueslati This deep learning architecture is designed to take a voxel grid as input, so the first step is to convert the raw point cloud into a voxel grid representation. That is what this function does. The line velo = ((velo - np.array([x[0], y[0], z[0]])) / resolution).astype(np.int32) makes all point coordinates non-negative (in other words, it shifts the reference of the point cloud to the corner of the voxel grid) and divides them by the resolution. This is needed because some points in the point cloud have negative coordinates, and voxel grid indices must be non-negative.
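To make the shift concrete, here is a small sketch that applies the quoted line to a few made-up points. The sample coordinates are assumptions, and the points are assumed to lie inside the chosen ranges already.

```python
import numpy as np

x, y, z = (0, 90), (-50, 50), (-4.5, 5.5)
resolution = 0.50

# Hypothetical points that are already inside the ranges above.
velo = np.array([
    [10.0, -50.0, -4.5],   # sits exactly on the lower y and z bounds
    [10.0,   0.0,  0.0],
    [89.9,  49.9,  5.4],   # close to the upper bounds
])

# Subtract the lower bounds so every coordinate is >= 0, then divide by the
# voxel size. astype(np.int32) truncates, which equals floor for values >= 0.
indices = ((velo - np.array([x[0], y[0], z[0]])) / resolution).astype(np.int32)
print(indices)
# [[ 20   0   0]
#  [ 20 100   9]
#  [179 199  19]]
```

Note that the negative y and z coordinates of the first point map to index 0, so every index is valid for array indexing.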

OneManArmy93 commented 5 years ago

@ansabsheikh9 You said that choosing those values of x, y, z fixes the size of the voxel grid. Does this mean that each point in the point cloud (with its x, y, z coordinates) is transformed into a single voxel?

ansabsheikh9 commented 5 years ago

@fouratoueslati If at least one point falls inside a 0.5 meter cell (the resolution), that cell becomes a single occupied voxel; otherwise the voxel stays empty. Many points can fall inside the same 0.5 meter cell.
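A tiny sketch of this many-points-to-one-voxel behaviour, using a deliberately small grid. The grid shape, origin, and sample points are assumptions chosen for readability, not values from the repository.

```python
import numpy as np

resolution = 0.5
origin = np.array([0.0, 0.0, 0.0])            # lower bounds of the toy range
voxel = np.zeros((4, 4, 2), dtype=np.int32)   # toy grid: 2 m x 2 m x 1 m

# Three hypothetical points; the first two fall into the same 0.5 m cell.
velo = np.array([
    [0.10, 0.10, 0.10],
    [0.40, 0.30, 0.20],
    [1.20, 0.60, 0.70],
])

idx = ((velo - origin) / resolution).astype(np.int32)
# Integer array indexing sets each addressed voxel to 1; writing the same
# voxel twice has no extra effect, so occupancy stays binary.
voxel[idx[:, 0], idx[:, 1], idx[:, 2]] = 1

occupied = int(voxel.sum())
print(idx.tolist())   # [[0, 0, 0], [0, 0, 0], [2, 1, 1]]
print(occupied)       # 2
```

Three points produce only two occupied voxels, so the grid records presence, not point counts.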

OneManArmy93 commented 5 years ago

@ansabsheikh9 So basically, if a voxel is occupied, a binary indicator marks it (1 for presence, 0 for absence)? And is that what this line of code { voxel[velo[:, 0], velo[:, 1], velo[:, 2]] = 1 } does? Thank you.

ansabsheikh9 commented 5 years ago

@fouratoueslati yes

OneManArmy93 commented 5 years ago

@ansabsheikh9 thank you for your help

OneManArmy93 commented 5 years ago

@ansabsheikh9 Hi, can I ask about the relationship between the coordinates (x, y, z) of a point in the point cloud and its position in the 3D voxel grid? I just want to make sure that a particular point has been voxelized accurately. Thank you.

ansabsheikh9 commented 5 years ago

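@fouratoueslati You can calculate it with this formula: velo = ((velo - np.array([x[0], y[0], z[0]])) / resolution).astype(np.int32). This gives you the index of the point in the voxel grid. In this equation x[0], y[0], z[0] are the lower bounds of the chosen range, which become the origin of the voxel grid; the Lidar sensor itself sits at (0, 0, 0) in point cloud coordinates.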
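One way to check a single point is to compute its index with the formula and then map the index back to the box of space that voxel covers. The helpers point_to_index and voxel_bounds and the sample point are hypothetical, written only for this check; lower holds x[0], y[0], z[0].

```python
import numpy as np

def point_to_index(point, lower, resolution):
    """Voxel index of a point, using the formula quoted in this thread."""
    return ((np.asarray(point) - lower) / resolution).astype(np.int32)

def voxel_bounds(index, lower, resolution):
    """Lower and upper corner (in meters) of the voxel at the given index."""
    lo = lower + np.asarray(index) * resolution
    return lo, lo + resolution

lower = np.array([0.0, -50.0, -4.5])    # x[0], y[0], z[0]
resolution = 0.50

point = np.array([12.3, -7.8, 0.4])     # hypothetical Lidar point in meters
index = point_to_index(point, lower, resolution)
lo, hi = voxel_bounds(index, lower, resolution)

# The point must lie inside the box covered by its voxel.
inside = bool(np.all((lo <= point) & (point < hi)))
print(index.tolist(), lo.tolist(), hi.tolist(), inside)
```

If inside is True, the point was assigned to the voxel that actually contains it.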

OneManArmy93 commented 5 years ago

@ansabsheikh9 Thank you, that was very helpful. But I'm also curious about the size of the voxel edge: it is always equal to 1, no matter what the resolution is. Can you explain why, and how it could be changed based on the resolution?

OneManArmy93 commented 5 years ago

@ansabsheikh9 Hello again. Any info you can provide would really help me get unstuck. Much appreciated :)

ansabsheikh9 commented 5 years ago

@OneManArmy93 A voxel is the equivalent of a 3D pixel. The size of each voxel is determined by the resolution you are using. If you are interested in a point cloud range of (x, y, z) = (10, 10, 10) meters, then with a resolution of 0.5 the 3D voxel grid will have a size of (20, 20, 20) voxels. Each voxel is a 3D pixel, but rather than ranging from 0 to 255, in this experiment it can only be 0 or 1 (binary).
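The link between resolution and grid size can be sketched like this: the number of voxels per axis is the extent of the range divided by the resolution. The helper grid_shape, the rounding, and the float32 dtype are assumptions made for illustration.

```python
import numpy as np

def grid_shape(x, y, z, resolution):
    """Number of voxels per axis: extent of each range divided by resolution."""
    return tuple(int(round((hi - lo) / resolution)) for lo, hi in (x, y, z))

# A 10 m x 10 m x 10 m region at two different resolutions.
cube = ((0, 10), (0, 10), (0, 10))
shape_coarse = grid_shape(*cube, resolution=0.5)
shape_fine = grid_shape(*cube, resolution=0.25)

# Memory needed if each voxel is stored as a float32 (4 bytes).
bytes_coarse = np.zeros(shape_coarse, dtype=np.float32).nbytes
bytes_fine = np.zeros(shape_fine, dtype=np.float32).nbytes

print(shape_coarse, bytes_coarse)
print(shape_fine, bytes_fine)
```

Halving the resolution doubles the voxel count along every axis, so the grid needs eight times the memory. The stored value of each voxel stays 0 or 1 regardless of the resolution.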
Here is a sample voxelized representation of a point cloud. I hope it helps clear things up.

[Image: Capture]

OneManArmy93 commented 5 years ago

@ansabsheikh9 So the voxel edge, which is equal to 1 (in this picture), cannot be modified because it is a binary indicator of whether the space is occupied or not?

[Image: résolution10]

ansabsheikh9 commented 5 years ago

@OneManArmy93 yes

OneManArmy93 commented 5 years ago

@ansabsheikh9 thank you! much appreciated