ldkong1205 / LaserMix

[CVPR 2023 Highlight] LaserMix for Semi-Supervised LiDAR Semantic Segmentation
https://ldkong.com/LaserMix
Apache License 2.0
272 stars, 17 forks

About the baseline cylinder3D #25

Closed: ybc-ybc closed this issue 6 months ago

ybc-ybc commented 6 months ago

Hi @ldkong1205!

Great work!

I have a few questions. Why does the Cylinder3D baseline perform differently in MMDetection3D and PCSeg, even though both exceed the scores reported in the original paper? Is there any difference between the two implementations, or is the gap due to data augmentation?

Which one should I use when choosing a backbone?

Thanks, looking forward to your reply!

ldkong1205 commented 6 months ago

Hi @ybc-ybc, thanks for your interest in this work.

As mentioned in our paper, we used a larger voxel size for the Cylinder3D option in our framework. This reduces the computational cost, since our framework requires 2 times more memory during training.
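To illustrate why a larger voxel size lowers the cost, here is a minimal sketch that counts occupied voxels for the same points at two resolutions. It is not the Cylinder3D partition: it assumes a plain Cartesian grid, uniformly random points, and hypothetical sizes (10 cm and 20 cm), and actual memory use in a sparse convolution network depends on more than the voxel count.

```python
import random


def count_occupied_voxels(points_mm, voxel_mm):
    """Count distinct voxels hit by integer (x, y, z) points given in millimetres."""
    return len({(x // voxel_mm, y // voxel_mm, z // voxel_mm) for x, y, z in points_mm})


rng = random.Random(0)
# Hypothetical point set: 50,000 points in a 10 m x 10 m x 2 m box (coordinates in mm).
points = [
    (rng.randrange(10_000), rng.randrange(10_000), rng.randrange(2_000))
    for _ in range(50_000)
]

fine = count_occupied_voxels(points, 100)    # 10 cm voxels (illustrative value)
coarse = count_occupied_voxels(points, 200)  # 20 cm voxels (illustrative value)
print(f"occupied voxels: fine={fine}, coarse={coarse}")
```

Because every 20 cm voxel is a union of eight 10 cm voxels, the coarse count can never exceed the fine count, so fewer sparse features need to be stored and processed.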

The implementation of Cylinder3D in either the MMDetection3D or the PCSeg codebases follows the original Cylinder3D implementation in setting up the voxel size. There are several differences of course, such as the use of spconv 2.0 (instead of spconv 1.2.1), LaserMix and PolarMix data augmentations, and so on. Therefore, the reproduced results are slightly higher than the original reported scores.

I would suggest trying MinkUNet as the backbone, following the MMDetection3D implementation. You can find the config files at https://github.com/open-mmlab/mmdetection3d/tree/main/mmdet3d/configs/minkunet. In our recent experiments, the MinkUNet model performs on par with Cylinder3D while consuming much less memory during training.

Additionally, if you want to pursue even better semi-supervised learning performance, I would recommend trying FRNet, a range-view LiDAR segmentation network that achieves a promising balance between accuracy and efficiency (e.g., FRNet achieves an FPS of 29.1 on SemanticKITTI, compared to 6.2 for Cylinder3D).
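For a rough sense of what these FPS figures mean per frame, the small sketch below converts them to latency and a relative speedup. Only the two FPS values come from the comment above; the helper name `latency_ms` is made up for illustration.

```python
def latency_ms(fps: float) -> float:
    """Convert frames per second to average milliseconds per frame."""
    return 1000.0 / fps


frnet_fps = 29.1       # FPS quoted above for FRNet on SemanticKITTI
cylinder3d_fps = 6.2   # FPS quoted above for Cylinder3D

speedup = frnet_fps / cylinder3d_fps
print(f"FRNet:      {latency_ms(frnet_fps):.1f} ms per frame")
print(f"Cylinder3D: {latency_ms(cylinder3d_fps):.1f} ms per frame")
print(f"Speedup:    {speedup:.2f}x")
```

Under these numbers, FRNet takes roughly 34 ms per frame against roughly 161 ms for Cylinder3D, which is a bit under a 5x difference.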

ybc-ybc commented 6 months ago

Thank you very much for your reply.

Previously, I focused mainly on indoor point clouds, and I have only recently started reading about LiDAR point clouds. I've noticed that these backbone networks are larger and take longer to train. Efficiency and precision are both important, so the information you provided is very useful.

Keep in touch, thanks!

ldkong1205 commented 6 months ago

Glad to hear! Feel free to open a new issue if you have further questions.

ybc-ybc commented 5 months ago

@ldkong1205,

Hi, sorry to bother you. I have two small questions:

  1. With PCSeg and MMDetection3D, when experimenting with backbone networks (built on spconv, MinkowskiEngine, or torchsparse), does setting a fixed seed guarantee reproducible training results?
  2. I've found that the results fluctuate a lot with the torchsparse library. Which sparse convolution library would you choose?

ldkong1205 commented 5 months ago

Hi @ybc-ybc, thanks for your interest in this work!

For LiDAR segmentation on SemanticKITTI and nuScenes, it is normal to see some fluctuation in evaluation scores between epochs during training.

Setting a fixed random seed will likely make the results reproducible, but you might need to try several seeds to find the best one.

For Cylinder3D, spconv is the default backend. For MinkUNet and SPVCNN, using the torchsparse library tends to yield the best possible results.
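As a minimal sketch of what fixing the seed usually involves in a PyTorch-based pipeline: the helper name `seed_everything` is hypothetical, and NumPy and PyTorch are seeded only if they are installed. Note that this does not guarantee bit-identical runs, since custom CUDA kernels (such as those in sparse convolution libraries) may still contain non-deterministic operations.

```python
import os
import random


def seed_everything(seed: int) -> None:
    """Seed the common sources of randomness; optional libraries are skipped if absent."""
    # Only affects Python processes started after this point (e.g. spawned workers).
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # safe to call without a GPU
        # Ask cuDNN for deterministic algorithms and disable autotuning.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass


seed_everything(0)
first = [random.random() for _ in range(3)]
seed_everything(0)
second = [random.random() for _ in range(3)]
print(first == second)
```

Both codebases expose their own seed options, so in practice a helper like this would only be needed in custom training scripts.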

ybc-ybc commented 5 months ago

Hi @ldkong1205, thanks for sharing! What I mean is that a stable, reproducible result helps us tell whether an improvement comes from the idea itself or from randomness, but none of these sparse convolution libraries seems to guarantee reproducibility.
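When exact reproduction is not available, one common workaround is to repeat each experiment with several seeds and compare the gain against the run-to-run spread. The sketch below assumes that setup; the mIoU values are hypothetical placeholders, not real results, and the "gain larger than the spread" check is only a rough heuristic, not a statistical test.

```python
from statistics import mean, stdev


def summarize(scores):
    """Return (mean, sample standard deviation) of scores from repeated runs."""
    return mean(scores), stdev(scores)


def gain_exceeds_noise(baseline, variant):
    """Rough heuristic: is the mean gain larger than the larger of the two spreads?"""
    b_mean, b_std = summarize(baseline)
    v_mean, v_std = summarize(variant)
    return (v_mean - b_mean) > max(b_std, v_std)


# Hypothetical placeholder mIoU values from three seeds each (NOT real results).
baseline_runs = [60.0, 61.0, 59.0]
variant_runs = [61.5, 62.5, 60.5]

b_mean, b_std = summarize(baseline_runs)
v_mean, v_std = summarize(variant_runs)
print(f"baseline: {b_mean:.2f} +/- {b_std:.2f}")
print(f"variant:  {v_mean:.2f} +/- {v_std:.2f}")
print("gain exceeds noise:", gain_exceeds_noise(baseline_runs, variant_runs))
```

Reporting the mean and standard deviation over seeds makes the comparison meaningful even when a single run cannot be reproduced exactly.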

ldkong1205 commented 5 months ago

Hi @ybc-ybc, you might want to try TorchSparse++, the newest sparse convolution backend.

As mentioned by the authors: "TorchSparse++ achieves 2.9x, 3.3x, 2.2x and 1.7x measured end-to-end speedup on an NVIDIA A100 GPU over state-of-the-art MinkowskiEngine, SpConv 1.2, TorchSparse and SpConv v2 in inference; and is 1.2-1.3x faster than SpConv v2 in mixed precision training."

We are currently integrating this library into the MMDetection3D codebase, but it may take some time to finish the experiments. I will keep you updated once we have results.

ybc-ybc commented 5 months ago

@ldkong1205, thanks.

I also tested MinkUNet with torchsparse v2.1 on PCSeg, and it really is the fastest sparse convolution backend.

There are still some minor bugs. Looking forward to your detailed test results!