ldkong1205 / LaserMix

[CVPR 2023 Highlight] LaserMix for Semi-Supervised LiDAR Semantic Segmentation
https://ldkong.com/LaserMix
Apache License 2.0
272 stars · 17 forks

Still works when using pillar-based algorithm? #14

Closed · hyalvin closed this 8 months ago

hyalvin commented 8 months ago

Hi, thanks for the nice work! I notice that when we mix two frames, the mixed frame might contain points whose labels are not the same, whereas pillar-based segmentation algorithms always need to predict just one class per pillar, which I think might confuse the model. Therefore, I would like to ask whether the framework still works when using pillar-based segmentation algorithms. Hope for your kind reply, thanks!

ldkong1205 commented 8 months ago

Hi @hyalvin, thanks for asking this interesting question!

For voxel-based methods like Cylinder3D, which are similar to the pillar-based methods you mentioned, the voxelization process naturally introduces precision errors, especially when the voxel size is not small enough. This is because they assign the same semantic label to all the points inside a voxel (after majority voting).
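For concreteness, here is a minimal, illustrative sketch of such per-voxel majority voting (not the voxelization code used in this repository; the function name and the NumPy-only layout are assumptions for illustration):

```python
# Illustrative sketch only (not the voxelization code used in this repo):
# every point in a voxel inherits the voxel's most frequent label.
import numpy as np

def majority_vote_labels(voxel_idx: np.ndarray, point_labels: np.ndarray) -> dict:
    """voxel_idx: (N, 3) integer voxel coordinates, e.g. np.floor(xyz / voxel_size).astype(int).
    point_labels: (N,) integer semantic labels.
    Returns a mapping from voxel coordinate to its majority label."""
    per_voxel = {}
    for key, label in zip(map(tuple, voxel_idx), point_labels):
        per_voxel.setdefault(key, []).append(label)
    # Minority-class points inside a voxel are overwritten by the majority label,
    # which is exactly where the precision error mentioned above comes from.
    return {key: int(np.bincount(labels).argmax()) for key, labels in per_voxel.items()}
```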

In our case, yes, the mixing might lead to situations where points with different labels are aggregated in close vicinity and assigned to the same voxel. However, as shown empirically by the results in the paper, such situations might not be that frequent, and the overall segmentation accuracy with mixing is much higher than the baseline.

We conjecture that this is because mixing along the inclination, i.e., LaserMix, can (to a certain extent) preserve the scene consistency, so the mixed point clouds are not affected much by the problem mentioned above. Besides, using a smaller voxel size might help mitigate such confusion.
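To make this concrete, below is a minimal sketch of mixing two scans along the inclination direction (illustrative only and simplified from the actual pipeline; the number of areas, the pitch range, and the even/odd interleaving are placeholder choices, not the repository's exact settings):

```python
# Illustrative sketch only (simplified, not the exact code in this repository).
import numpy as np

def laser_mix(points_a, labels_a, points_b, labels_b,
              num_areas=6, pitch_min=-25.0, pitch_max=3.0):
    """Interleave two LiDAR scans along the inclination direction.
    points_*: (N, 3+) arrays with x, y, z in the first columns; labels_*: (N,) arrays
    (ground truth for the labeled scan, pseudo-labels for the unlabeled one).
    num_areas / pitch_min / pitch_max are placeholder values for the area split."""
    def inclination_deg(pts):
        # phi = arctan(z / sqrt(x^2 + y^2)), in degrees
        return np.degrees(np.arctan2(pts[:, 2], np.linalg.norm(pts[:, :2], axis=1)))

    edges = np.linspace(pitch_min, pitch_max, num_areas + 1)
    area_a = np.clip(np.digitize(inclination_deg(points_a), edges) - 1, 0, num_areas - 1)
    area_b = np.clip(np.digitize(inclination_deg(points_b), edges) - 1, 0, num_areas - 1)

    # One simple interleaving: even-indexed areas from scan A, odd-indexed from scan B.
    keep_a, keep_b = area_a % 2 == 0, area_b % 2 == 1
    mixed_points = np.concatenate([points_a[keep_a], points_b[keep_b]], axis=0)
    mixed_labels = np.concatenate([labels_a[keep_a], labels_b[keep_b]], axis=0)
    return mixed_points, mixed_labels
```

The even/odd assignment is just one simple way to interleave the two scans; the key point is that each area keeps a spatially coherent slice of its source scan, which is what preserves the scene consistency mentioned above.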

Hope the above answers your question. Please let me know if you have any insights on this!

hyalvin commented 8 months ago

Thanks for the quick and informative reply! However, I think pillar-based algorithms (such as PointPillars) might run into this situation frequently. There are two primary questions in the pillar-based case (in which the voxelization process does not split voxels along the z-axis):

  1. When should we apply the LaserMix augmentation: at the beginning of the augmentation pipeline (like LaserMix -> RandomSceneFlip -> ..., i.e., before other random scene augmentations and the voxelization process), or at the beginning of the training process (i.e., after the voxelization process)? If the former, I am afraid that many points with different semantic labels will be assigned to the same pillar in the mixed point cloud, whereas our model makes only one semantic prediction per pillar. If the latter, I am wondering how to mix two pillars (from two point clouds) along the inclination.
  2. Based on the answer to the first question, how should we mix the GT of the labeled data and the pseudo-labels of the unlabeled data?

Hope for your reply, thanks!

ldkong1205 commented 8 months ago

Hi @hyalvin, glad to hear from you!

Regarding your questions:

  • I agree with you that pillar-based methods tend to encounter the situation you mentioned more frequently.
  • In your case, it is better to perform voxelization before the LaserMix operation, so that fewer foreign points with different semantics are included. By doing this, at least each pillar itself will not suffer from having too many semantics inside.
  • You can mix two pillars along the inclination direction as follows (see the sketch below):

    • For each pillar, calculate the average point coordinates (x̂, ŷ, ẑ) from all the points inside it;
    • In a similar manner to the current LaserMix, split the pillars into different areas based on (x̂, ŷ, ẑ);
    • Mix the corresponding pillars from the two LiDAR scans.
  • For mixing the GT and the pseudo-labels, I believe you can follow the same procedure by simply recording the voxel IDs when you mix the LiDAR points.
  • Additional suggestion: if you are keen to maintain the consistency of instances before and after mixing, you can first split the points of a LiDAR point cloud into ground and non-ground points using RANSAC, and then perform mixing only on the ground points.

Hope the above resolves your concern. Let me know if you have any follow-ups on this aspect!
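As a concrete reference for the pillar-level steps above, here is a minimal sketch (illustrative only, not code from this repository; it assumes your voxelizer returns, for each scan, a list of point-index arrays, one per pillar, and the area count and pitch range are placeholder values):

```python
# Illustrative sketch only (not code from this repository). Assumes the voxelizer
# returns, for each scan, a list of point-index arrays (one array per pillar).
import numpy as np

def mix_pillars(points_a, labels_a, pillars_a, points_b, labels_b, pillars_b,
                num_areas=6, pitch_min=-25.0, pitch_max=3.0):
    """pillars_*: list of 1-D index arrays into points_*; labels_* hold the ground
    truth (labeled scan) or pseudo-labels (unlabeled scan) per point."""
    def pillar_areas(points, pillars):
        # Step 1: average point coordinates per pillar.
        centers = np.stack([points[idx, :3].mean(axis=0) for idx in pillars])
        # Step 2: assign each pillar to an inclination area based on its center.
        phi = np.degrees(np.arctan2(centers[:, 2], np.linalg.norm(centers[:, :2], axis=1)))
        edges = np.linspace(pitch_min, pitch_max, num_areas + 1)
        return np.clip(np.digitize(phi, edges) - 1, 0, num_areas - 1)

    areas_a = pillar_areas(points_a, pillars_a)
    areas_b = pillar_areas(points_b, pillars_b)

    # Step 3: keep even-indexed areas from scan A and odd-indexed areas from scan B.
    # The recorded point indices travel with each kept pillar, so the GT / pseudo-labels
    # are gathered with exactly the same indices as the points.
    idx_a = np.concatenate([idx for idx, a in zip(pillars_a, areas_a) if a % 2 == 0])
    idx_b = np.concatenate([idx for idx, a in zip(pillars_b, areas_b) if a % 2 == 1])
    mixed_points = np.concatenate([points_a[idx_a], points_b[idx_b]], axis=0)
    mixed_labels = np.concatenate([labels_a[idx_a], labels_b[idx_b]], axis=0)
    return mixed_points, mixed_labels
```

Since each kept pillar carries its recorded point indices, the GT and pseudo-labels are gathered with the same indices as the points, which also covers your second question.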

hyalvin commented 8 months ago

Make sense! Thanks a lot for your kind and timely reply!

ldkong1205 commented 8 months ago

Very welcome! Feel free to reach out if you have other questions.