xinge008 / Cylinder3D

Rank 1st in the leaderboard of SemanticKITTI semantic segmentation (both single-scan and multi-scan) (Nov. 2020) (CVPR2021 Oral)
Apache License 2.0

The gap between the grid representation and the point representation. #8

Closed jialeli1 closed 3 years ago

jialeli1 commented 3 years ago

The return of this function (Asymm_3d_spconv) is still a dense representation of the SparseConvTensor, but how do we restore it to a point-wise representation?

https://github.com/xinge008/Cylinder3D/blob/a1cc7a75fc8b99cb0886d83a4901ef4bf58ace3f/network/segmentator_3d_asymm_spconv.py#L249

However, I found in this line that the return value of the model is used directly as the predicted label. Shouldn't the predicted label be point-wise? The gap between the grid representation and the point representation confuses me.

https://github.com/xinge008/Cylinder3D/blob/a1cc7a75fc8b99cb0886d83a4901ef4bf58ace3f/train_cylinder_asym.py#L96

Could you give me some hints? Thanks.

xinge008 commented 3 years ago
  1. For the voxel-to-point mapping, you can use the `train_grid` to perform the transformation; it stores the point-to-voxel index mapping (see the sketch after this list).

  2. The voxel-wise output can also be used as the final output, by taking the prediction of a voxel as the prediction of every point inside that voxel.
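
A minimal sketch of that voxel-to-point lookup, assuming hypothetical names `voxel_logits` (the dense network output) and `point_grid` (the per-point voxel indices saved by the dataloader); the actual variable names in the repo differ:

```python
import numpy as np
import torch

def voxel_to_point_labels(voxel_logits: torch.Tensor,
                          point_grid: np.ndarray,
                          batch_idx: int = 0) -> np.ndarray:
    """Give every point the predicted class of the voxel it falls into.

    voxel_logits: dense output, shape (batch, n_class, dim_rho, dim_phi, dim_z)
    point_grid:   per-point voxel indices, shape (n_points, 3) = (rho_idx, phi_idx, z_idx)
    """
    # argmax over the class dimension gives one label per voxel
    voxel_labels = torch.argmax(voxel_logits, dim=1).cpu().numpy()
    # fancy indexing with the per-point voxel indices scatters the labels back to points
    return voxel_labels[batch_idx,
                        point_grid[:, 0],
                        point_grid[:, 1],
                        point_grid[:, 2]]
```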

jialeli1 commented 3 years ago

Thanks for your reply. I should learn more about the label format of SemanticKITTI to understand the second point in your answer.

jialeli1 commented 3 years ago

Hi. In my understanding, the final segmentation prediction is the tensor y (as below) with shape (dim_rho, dim_phi, dim_z) = (480, 360, 32), representing a point cloud over the range (0~50 m, -pi~pi, -4~2 m)?

https://github.com/xinge008/Cylinder3D/blob/a1cc7a75fc8b99cb0886d83a4901ef4bf58ace3f/network/segmentator_3d_asymm_spconv.py#L306

Please tell me if I have understood this correctly. Thank you.
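
For concreteness, a rough sketch of how such a cylindrical partition could be computed, assuming the ranges quoted above (rho in [0, 50] m, phi in [-pi, pi], z in [-4, 2] m) and the (480, 360, 32) grid; the constants and function name here are illustrative, not the repo's exact configuration:

```python
import numpy as np

GRID_SIZE = np.array([480, 360, 32])
MIN_BOUND = np.array([0.0, -np.pi, -4.0])   # rho, phi, z lower bounds
MAX_BOUND = np.array([50.0, np.pi, 2.0])    # rho, phi, z upper bounds

def cart_to_cyl_grid(points_xyz: np.ndarray) -> np.ndarray:
    """Map (x, y, z) points to integer (rho, phi, z) voxel indices."""
    rho = np.sqrt(points_xyz[:, 0] ** 2 + points_xyz[:, 1] ** 2)
    phi = np.arctan2(points_xyz[:, 1], points_xyz[:, 0])
    cyl = np.stack([rho, phi, points_xyz[:, 2]], axis=1)
    # clamp to the covered range, then quantize into fixed-size cylindrical cells
    cyl = np.clip(cyl, MIN_BOUND, MAX_BOUND)
    cell = (MAX_BOUND - MIN_BOUND) / GRID_SIZE
    idx = np.floor((cyl - MIN_BOUND) / cell).astype(np.int64)
    return np.clip(idx, 0, GRID_SIZE - 1)
```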

yanx27 commented 3 years ago

@xinge008 Hi, I also have a question here. In your paper there are voxel-wise and point-wise losses; however, in this code it seems both losses take voxel-wise predictions as inputs.

xinge008 commented 3 years ago

@jialeli1 Yes, the voxel output can be transformed into the point-wise representation, which gives the prediction for the whole point cloud. You can check the validation code; more details are shown in the paper.

xinge008 commented 3 years ago

@yanx27 The current preliminary codebase does not include the point refinement. Hence, only the voxel-wise loss is involved, which consists of weighted cross-entropy (wce) and Lovasz-softmax.
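
A minimal sketch of that voxel-wise loss, assuming a `lovasz_softmax` helper like the one the repo bundles (the import path, function name, and arguments below are assumptions for illustration):

```python
import torch.nn.functional as F
# Import path assumed; the repo ships its own Lovasz-softmax implementation.
from utils.lovasz_losses import lovasz_softmax

def voxel_wise_loss(voxel_logits, voxel_labels, class_weights=None, ignore_label=0):
    """Weighted cross-entropy + Lovasz-softmax on the dense voxel grid."""
    wce = F.cross_entropy(voxel_logits, voxel_labels,
                          weight=class_weights, ignore_index=ignore_label)
    # Lovasz-softmax operates on class probabilities rather than raw logits.
    lovasz = lovasz_softmax(F.softmax(voxel_logits, dim=1),
                            voxel_labels, ignore=ignore_label)
    return wce + lovasz
```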

jialeli1 commented 3 years ago

> @jialeli1 Yes, the voxel output can be transformed into the point-wise representation, which gives the prediction for the whole point cloud. You can check the validation code; more details are shown in the paper.

Thanks a lot. I found this in section 3.6 of the paper. However, I have another question: is the grid resolution enough to finely represent such a large-scale point-cloud space? In cubic-voxel methods such as SPVNAS, the voxel size is set to 0.05 m, which results in a grid dimension of 100/0.05 = 2000 for a 100 m range.
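
For comparison, a back-of-the-envelope computation of the cell sizes implied by the (480, 360, 32) grid and the ranges quoted earlier in this thread (these numbers come from the discussion, not from the repo's config files):

```python
import numpy as np

grid   = np.array([480, 360, 32])
extent = np.array([50.0, 2 * np.pi, 6.0])  # rho: 0-50 m, phi: full circle, z: -4 to 2 m
print(extent / grid)
# -> roughly 0.104 m radial, 0.0175 rad (1 degree) angular, 0.1875 m in z;
#    an angular cell therefore spans about 0.87 m of arc at a 50 m radius.
```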

xinge008 commented 3 years ago

@jialeli1 In our experiments, a smaller grid does not necessarily yield better performance; it also makes optimization harder and increases the memory cost.

yanx27 commented 3 years ago

> @yanx27 The current preliminary codebase does not include the point refinement. Hence, only the voxel-wise loss is involved, which consists of weighted cross-entropy (wce) and Lovasz-softmax.

Thanks for your reply! Looking forward to the full codebase.