data format (Good Intro for Beginner)

TianhaoFu commented 2 years ago

Hi, thanks for your code.

When I was using your repo, I found that the batch_data format is as follows:

dict_keys(['metadata', 'points', 'voxels', 'shape', 'num_points', 'num_voxels', 'coordinates', 'gt_boxes_and_cls', 'hm', 'anno_box', 'ind', 'mask', 'cat'])

Can you explain what each item means? Also, can you tell me how the voxel data is generated, and where the corresponding code is?

Also, when I was training my CenterPoint model based on PointPillars, I found that the input data are:

data["features"], data["num_voxels"], data["coors"]

An example of the input data is:

features: torch.Size([74222, 20, 5])
num_voxels: torch.Size([74222])
coors: torch.Size([74222, 4])

Can you tell me how the features are generated and where the corresponding code is? And what does num_voxels mean?

Thanks! :)

tianweiy commented 2 years ago

Can you explain what each item means? Also, can you tell me how the voxel data is generated, and where the corresponding code is?

metadata: information about this frame (e.g. lidar path, frame id, etc.)

points: lidar points, a list of N x 4+x data

voxels: the voxels, N x 4+x (the first channel indicates which frame this voxel is in across the whole batch)

shape: spatial shape of the voxel data

num_points: how many voxels are in each frame

num_voxels: how many points are inside each voxel

coordinates: integer xyz coordinates of each voxel

gt_boxes_and_cls: the training target (x, y, z, l, w, h, theta, and object label)

hm (heatmap): the center heatmap described in the paper

anno_box: the box map for training (it contains the target box info at each center location). It is generated here: https://github.com/tianweiy/CenterPoint/blob/928dbb69485f63efc37aa14fcd8a2322cc802357/det3d/datasets/pipelines/preprocess.py#L419

ind: indices of the centers. These are used to extract the box parameter predictions from the BEV map (and to compute the loss against anno_box during training).

mask: we zero pad the boxes (e.g. if example 1 has 10 real boxes and example 2 has 20, both frames are padded to 20 boxes for efficient batching). The mask indicates whether an entry is a zero-padded value or a real one.

cat: the category label of this anno_box
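To make the roles of ind and mask concrete, here is a minimal, dependency-free sketch. It is not the repo's code: the helper name gather_centers, the toy map, and the row-major flat index convention (ind = y * W + x) are assumptions made for illustration.

```python
def gather_centers(pred_map, ind, mask):
    """Pick per-object predictions out of a flattened BEV map.

    pred_map: list of H*W rows, one prediction vector per BEV cell (row-major).
    ind:      flat cell index of each object's center, zero padded.
    mask:     1 for a real object, 0 for a padded slot.
    """
    gathered = [pred_map[i] for i in ind]
    # Padded slots still gather cell 0, so the mask is needed to drop them.
    return [row for row, m in zip(gathered, mask) if m]


H, W = 4, 4
# Toy "box regression" map: each cell stores its own (x, y) for easy checking.
pred_map = [[x, y] for y in range(H) for x in range(W)]

# Two real objects centered at (x=1, y=2) and (x=3, y=0), padded to 4 slots.
centers = [(1, 2), (3, 0)]
max_objs = 4
pad = max_objs - len(centers)
ind = [y * W + x for x, y in centers] + [0] * pad   # assumed flat index: y * W + x
mask = [1] * len(centers) + [0] * pad

print(ind)
print(gather_centers(pred_map, ind, mask))
```

In training code the same idea is usually expressed as a tensor gather followed by a masked loss, so padded slots contribute nothing.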

Also, can you tell me how the voxel data is generated?

It is implemented as dynamic voxelization: https://github.com/tianweiy/CenterPoint/blob/928dbb69485f63efc37aa14fcd8a2322cc802357/det3d/models/readers/dynamic_voxel_encoder.py#L8

Also, when I was training my CenterPoint model based on PointPillars, I found that the input data are

The features are generated here: https://github.com/tianweiy/CenterPoint/blob/928dbb69485f63efc37aa14fcd8a2322cc802357/det3d/datasets/pipelines/preprocess.py#L195

Basically, there are 74222 pillars, each pillar holds up to 20 lidar points (some are zero padded), and each point has 5 features (x, y, z, r, timestamp).

num_voxels, as explained above, is the number of valid points per voxel (or pillar). coors is the coordinate of each pillar.
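The [pillars, 20, 5] layout described above can be reproduced on a toy scale. The sketch below is a simplified illustration, not the code linked above: the function name pillarize, the toy range, and the (z, y, x) ordering of the returned coordinates are assumptions.

```python
import math


def pillarize(points, pc_range, voxel_size, max_points=20):
    """Group points into x-y pillars and zero pad each pillar to max_points.

    points:     list of [x, y, z, r, timestamp]
    pc_range:   [x_min, y_min, z_min, x_max, y_max, z_max]
    voxel_size: [dx, dy] (a pillar spans the whole z range)
    Returns (features, num_points, coors).
    """
    x_min, y_min, z_min, x_max, y_max, z_max = pc_range
    pillars = {}  # (iy, ix) -> list of points, in first-seen order
    for p in points:
        x, y, z = p[:3]
        if not (x_min <= x < x_max and y_min <= y < y_max and z_min <= z < z_max):
            continue  # outside the detection range
        ix = math.floor((x - x_min) / voxel_size[0])
        iy = math.floor((y - y_min) / voxel_size[1])
        bucket = pillars.setdefault((iy, ix), [])
        if len(bucket) < max_points:  # points beyond the cap are dropped
            bucket.append(list(p))

    num_feats = len(points[0]) if points else 0
    features, num_points, coors = [], [], []
    for (iy, ix), pts in pillars.items():
        pad = [[0.0] * num_feats for _ in range(max_points - len(pts))]
        features.append(pts + pad)   # max_points rows per pillar
        num_points.append(len(pts))  # valid (non-padded) rows
        coors.append([0, iy, ix])    # assumed (z, y, x); z is 0 for pillars
    return features, num_points, coors


points = [
    [0.2, 0.3, 0.0, 10.0, 0.0],
    [0.7, 0.1, 0.1, 12.0, 0.0],
    [2.5, 3.5, -0.2, 7.0, 0.1],
    [9.0, 9.0, 0.0, 1.0, 0.0],   # out of range, ignored
]
features, num_points, coors = pillarize(
    points, [0, 0, -1, 4, 4, 1], [1.0, 1.0], max_points=3
)
print(len(features), len(features[0]), len(features[0][0]))  # pillars, max_points, feats
print(num_points, coors)
```

Here num_points plays the role of data["num_voxels"] in the thread: it counts the valid points in each pillar so that the zero-padded rows can be ignored later.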

Hopefully, this helps.

TianhaoFu commented 2 years ago

Thanks for your reply. I carefully read the voxelization function:

def voxelization(points, pc_range, voxel_size):

This function does not look like dynamic voxelization to me, because its implementation simply traverses the point cloud step by step, generates the voxel index, and fills the point cloud's feature data into the new voxel matrix according to the index. @tianweiy

tianweiy commented 2 years ago

Because its implementation simply traverses the point cloud step by step, generates the voxel index, and fills the point cloud's feature data into the new voxel matrix according to the index.

Sorry, I don't get this. I think what you describe (and what I implemented) is dynamic voxelization. Do I misunderstand the concept?

TianhaoFu commented 2 years ago

I printed the input point cloud coordinate data and found it to be in this form:

tensor([[  0,   0, 277, 241],
        [  0,   0, 310, 283],
        [  0,   0, 260, 241],
        ...,
        [  3,   0, 193, 361],
        [  3,   0, 152, 112],
        [  3,   0, 383, 193]], device='cuda:0', dtype=torch.int32)

data = dict(
    features=voxels,
    num_voxels=num_points_in_voxel,
    coors=coordinates,
    batch_size=batch_size,
    input_shape=example["shape"][0],
)

Why do the coordinates have 4 dimensions, and what does each dimension mean?

Also, I printed the input point cloud voxel data and found it to be in this form:

tensor([[[ -2.8630,   4.2008,  -1.9721,  23.0000,   0.1499],
         [ -2.9000,   4.2069,  -1.8522,  33.0000,   0.1001],
         [ -2.9082,   4.3142,  -1.8927,  28.0000,   0.0000],
         ...,
         [ -2.9082,   4.3697,  -1.9095,  27.0000,   0.0000],
         [ -2.9958,   4.3997,  -1.7918,  12.0000,   0.1001],
         [ -2.8851,   4.2570,  -1.9805,  24.0000,   0.3003]],

        [[  5.5198,  10.9735,  -2.3002,   3.0000,   0.2500],
         [  5.4946,  10.9447,  -2.2990,   3.0000,   0.1499],
         [  5.5790,  10.8439,  -2.2948,   3.0000,   0.0498],
         ...,
         [  5.4573,  10.9144,  -2.2963,   3.0000,   0.0498],
         [  5.4343,  10.9814,  -2.3001,   3.0000,   0.1499],
         [  5.5965,  10.8180,  -2.2944,   3.0000,   0.0000]],

        [[ -2.9384,   0.9971,  -1.6486,  20.0000,   0.0498],
         [ -2.9434,   0.8410,  -1.8094,   9.0000,   0.2500],
         [ -2.9163,   0.9642,  -1.8186,   5.0000,   0.1001],
         ...,
         [ -2.9446,   0.8919,  -1.6328,  13.0000,   0.1499],
         [ -2.9377,   0.8296,  -1.7143,  10.0000,   0.0498],
         [ -2.9425,   0.9046,  -1.8151,   7.0000,   0.4004]],

        ...,

        [[ 21.1160, -12.5320,   0.5717,  15.0000,   0.1001],
         [  0.0000,   0.0000,   0.0000,   0.0000,   0.0000],
         [  0.0000,   0.0000,   0.0000,   0.0000,   0.0000],
         ...,
         [  0.0000,   0.0000,   0.0000,   0.0000,   0.0000],
         [  0.0000,   0.0000,   0.0000,   0.0000,   0.0000],
         [  0.0000,   0.0000,   0.0000,   0.0000,   0.0000]],

        [[-28.6991, -20.6613,  -0.8174,  22.0000,   0.1499],
         [  0.0000,   0.0000,   0.0000,   0.0000,   0.0000],
         [  0.0000,   0.0000,   0.0000,   0.0000,   0.0000],
         ...,
         [  0.0000,   0.0000,   0.0000,   0.0000,   0.0000],
         [  0.0000,   0.0000,   0.0000,   0.0000,   0.0000],
         [  0.0000,   0.0000,   0.0000,   0.0000,   0.0000]],

        [[-12.4745,  25.5616,  -3.3252,   4.0000,   0.2500],
         [  0.0000,   0.0000,   0.0000,   0.0000,   0.0000],
         [  0.0000,   0.0000,   0.0000,   0.0000,   0.0000],
         ...,
         [  0.0000,   0.0000,   0.0000,   0.0000,   0.0000],
         [  0.0000,   0.0000,   0.0000,   0.0000,   0.0000],
         [  0.0000,   0.0000,   0.0000,   0.0000,   0.0000]]], device='cuda:0')

data = dict(
    features=voxels,
    num_voxels=num_points_in_voxel,
    coors=coordinates,
    batch_size=batch_size,
    input_shape=example["shape"][0],
)

By definition, the voxel data is just a rearrangement of the point cloud data, so why are there negative numbers, and what does each dimension mean? @tianweiy
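A small numeric check may help with both questions. It is only a sketch under assumptions that the thread does not state: a point cloud range of [-51.2, 51.2] m in x and y, [-5, 3] m in z, a 0.2 m pillar size, and that the printed coordinate rows line up with the printed pillars. Under those assumptions, the first point of each printed pillar maps onto the printed coordinate rows when the four columns are read as (batch index, z, y, x). The feature values are metric positions relative to the sensor, so they can be negative, while the coordinates are grid indices offset by the range minimum and are never negative.

```python
import math

# Assumed settings, not taken from the thread.
PC_RANGE = [-51.2, -51.2, -5.0, 51.2, 51.2, 3.0]   # x_min, y_min, z_min, x_max, y_max, z_max
VOXEL_SIZE = [0.2, 0.2, 8.0]                       # one voxel along z, i.e. a pillar


def grid_coordinate(point, batch_idx):
    """Map a metric point (x, y, z, ...) to [batch index, z, y, x] grid indices."""
    x, y, z = point[:3]
    ix = math.floor((x - PC_RANGE[0]) / VOXEL_SIZE[0])
    iy = math.floor((y - PC_RANGE[1]) / VOXEL_SIZE[1])
    iz = math.floor((z - PC_RANGE[2]) / VOXEL_SIZE[2])
    return [batch_idx, iz, iy, ix]


# First point of several pillars printed in the thread, paired with the batch
# index of the coordinate row assumed to correspond to it.
samples = [
    ([-2.8630, 4.2008, -1.9721, 23.0, 0.1499], 0),
    ([5.5198, 10.9735, -2.3002, 3.0, 0.2500], 0),
    ([-2.9384, 0.9971, -1.6486, 20.0, 0.0498], 0),
    ([21.1160, -12.5320, 0.5717, 15.0, 0.1001], 3),
]
rows = [grid_coordinate(p, b) for p, b in samples]
for row in rows:
    print(row)
```

The five feature columns follow the (x, y, z, r, timestamp) layout given earlier in the thread.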

TianhaoFu commented 2 years ago

Because its implementation simply traverses the point cloud step by step, generates the voxel index, and fills the point cloud's feature data into the new voxel matrix according to the index.

Sorry, I don't get this. I think what you describe (and what I implemented) is dynamic voxelization. Do I misunderstand the concept?

Sorry, I misunderstood; you are right. Your implementation is dynamic voxelization.
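For readers following the exchange, the procedure described above (visit every point, compute its voxel index, aggregate features by index) can be sketched as follows. This is an illustrative version, not the repo's DynamicVoxelEncoder: the function name dynamic_voxelize and the choice of mean aggregation are assumptions.

```python
import math


def dynamic_voxelize(points, pc_range, voxel_size):
    """Assign every in-range point to a voxel and average features per voxel.

    No per-voxel point cap and no zero padding: each voxel keeps a running
    sum and a count, so every point contributes.
    Returns {(ix, iy, iz): mean feature vector}.
    """
    grid = [round((pc_range[d + 3] - pc_range[d]) / voxel_size[d]) for d in range(3)]
    sums, counts = {}, {}
    for p in points:
        idx = tuple(
            math.floor((p[d] - pc_range[d]) / voxel_size[d]) for d in range(3)
        )
        if any(i < 0 or i >= g for i, g in zip(idx, grid)):
            continue  # point lies outside the grid
        if idx not in sums:
            sums[idx] = [0.0] * len(p)
            counts[idx] = 0
        sums[idx] = [s + v for s, v in zip(sums[idx], p)]
        counts[idx] += 1
    return {idx: [s / counts[idx] for s in total] for idx, total in sums.items()}


points = [
    [0.25, 0.25, 0.0, 1.0],
    [0.75, 0.25, 0.0, 3.0],
    [1.5, 0.5, 0.0, 5.0],
]
voxels = dynamic_voxelize(points, [0, 0, -1, 2, 2, 1], [1.0, 1.0, 2.0])
print(voxels)
```

The contrast with the fixed-size layout is that the number of points per voxel is not bounded in advance, so nothing is dropped or padded.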

yuedi-hhh commented 5 months ago

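Hello, I would like to ask: are the DynamicVoxelEncoder class in det3d/models/readers/dynamic_voxel_encoder.py and the Voxelization class in det3d/datasets/pipelines/preprocess.py both used in the same data flow? One is a reader and the other is a data preprocessing pipeline step, yet both perform voxelization. Is it correct to understand that when VoxelFeatureExtractorV3 is used as the reader, Voxelization is needed for data preprocessing, whereas when DynamicVoxelEncoder is used as the reader, the Voxelization preprocessing step is not needed? I ask because I noticed that V3 does not voxelize the point cloud.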

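I still do not understand how this framework operates and hope to get your help. Thank you! My understanding of how det3d works is this: decorators such as @READERS.register_module are used, so each file is executed from top to bottom when it is imported, which registers each class in the registry, and the classes can then be built from keywords in the config. Is this understanding correct? Besides reading the source code, where else can I learn about how it runs? There is too much source code, and reading it file by file is too complicated for me to follow.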
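The decorator-and-registry pattern described in this question can be illustrated in a few lines. This is a minimal sketch of the general pattern, not det3d's actual implementation: the Registry class, build_from_cfg, and ToyVoxelReader below are hypothetical stand-ins written for this example.

```python
class Registry:
    """Maps class names to classes so they can be looked up from a config."""

    def __init__(self, name):
        self.name = name
        self._modules = {}

    def register_module(self, cls):
        # Runs at import time, when Python executes the decorated class body.
        self._modules[cls.__name__] = cls
        return cls

    def get(self, key):
        return self._modules[key]


READERS = Registry("reader")


@READERS.register_module
class ToyVoxelReader:
    def __init__(self, num_input_features=5):
        self.num_input_features = num_input_features


def build_from_cfg(cfg, registry):
    """Instantiate the class named by cfg["type"] with the remaining keys."""
    args = dict(cfg)                       # copy so the caller's config is untouched
    cls = registry.get(args.pop("type"))
    return cls(**args)


reader = build_from_cfg(dict(type="ToyVoxelReader", num_input_features=4), READERS)
print(type(reader).__name__, reader.num_input_features)
```

In this pattern, a class is only available to the builder if the module that defines it has been imported, which is why such frameworks typically import all of their model modules in a package __init__ file.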

yuedi-hhh commented 5 months ago

@TianhaoFu Could you help answer my questions?

tianweiy / CenterPoint

data format (Good Intro for Beginner) #244