mit-han-lab / spvnas

[ECCV 2020] Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution
http://spvnas.mit.edu/
MIT License
587 stars, 109 forks

Dataloader for the Paris lille 3D dataset #53

Closed by chrise96 4 months ago

chrise96 commented 3 years ago

Prerequisites

Note: Not all GPUs can handle the original point cloud tile size of paris_lille_3d, so I used pdal to split the tiles. The command for one tile is:

pdal split --capacity 1000000 Lille1_2.ply output/Lille1_2.ply

The smaller pdal output tiles of Lille1_1.ply and Paris.ply go in the train folder. The pdal outputs of Lille1_2.ply go in the validation folder.
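The same split can be scripted for all three tiles. The following is a minimal sketch, not part of the submitted dataloader: it only wraps the pdal split command quoted above, and the folder names "train" and "val", the helper names, and the source/destination directories are assumptions chosen for illustration.

```python
import subprocess
from pathlib import Path

# Tile-to-folder assignment as described in the note above.
# The folder names "train" and "val" are assumptions.
TILE_SPLITS = {
    "Lille1_1.ply": "train",
    "Paris.ply": "train",
    "Lille1_2.ply": "val",
}


def build_split_command(tile, out_dir, capacity=1_000_000):
    """Return the `pdal split` argument list for a single tile."""
    tile = Path(tile)
    return [
        "pdal", "split",
        "--capacity", str(capacity),
        str(tile),
        str(Path(out_dir) / tile.name),
    ]


def split_all(src_dir, dst_dir, dry_run=True):
    """Build (and optionally run) one `pdal split` command per tile."""
    commands = []
    for name, split in TILE_SPLITS.items():
        out_dir = Path(dst_dir) / split
        cmd = build_split_command(Path(src_dir) / name, out_dir)
        commands.append(cmd)
        if not dry_run:
            out_dir.mkdir(parents=True, exist_ok=True)
            subprocess.run(cmd, check=True)  # requires pdal on PATH
    return commands
```

Calling split_all("raw", "data") with the default dry_run=True only returns the commands, which makes it easy to check the paths before anything is written.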

Todo

zhijian-liu commented 3 years ago

Hi @chrise96, thanks for the contribution! I'm wondering if you could share the performance of MinkUNet and SPVCNN with us. This would help us better understand whether data augmentation or further investigation on small objects is necessary. Thanks!

chrise96 commented 3 years ago

GPU used: NVIDIA Tesla V100

Config file configs/paris_lille_3d/minkunet/cr0p64.yaml

Result

| | ground | building | pole-road_sign-traffic_light | bollard-small_pole | trash_can | barrier | pedestrian | car | natural-vegetation |
|---|---|---|---|---|---|---|---|---|---|
| total_seen | 12042099.0 | 7193259.0 | 109906.0 | 7298.0 | 115885.0 | 54818.0 | 11233.0 | 734084.0 | 917631.0 |
| total_correct | 11981472.0 | 6841477.0 | 0.0 | 0.0 | 0.0 | 22926.0 | 0.0 | 734084.0 | 712023.0 |
| total_positive | 12119651.0 | 6936499.0 | 0.0 | 0.0 | 0.0 | 289726.0 | 0.0 | 734084.0 | 965661.0 |

Config file configs/paris_lille_3d/spvcnn/cr0p64.yaml

Result

| | ground | building | pole-road_sign-traffic_light | bollard-small_pole | trash_can | barrier | pedestrian | car | natural-vegetation |
|---|---|---|---|---|---|---|---|---|---|
| total_seen | 12042099.0 | 7193259.0 | 109906.0 | 7298.0 | 115885.0 | 54818.0 | 11233.0 | 734084.0 | 917631.0 |
| total_correct | 11952613.0 | 6354523.0 | 0.0 | 0.0 | 0.0 | 29705.0 | 0.0 | 720214.0 | 694746.0 |
| total_positive | 12085977.0 | 6423716.0 | 0.0 | 0.0 | 0.0 | 738765.0 | 0.0 | 897529.0 | 1076593.0 |

zhijian-liu commented 3 years ago

Thanks for the results! I'm wondering if you could also train MinkUNet with a batch size of 1. I would like to see an apples-to-apples comparison between MinkUNet and SPVCNN.
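For a like-for-like comparison, the three counters reported above can be reduced to per-class IoU and mIoU. This is a minimal sketch, assuming total_seen is the number of ground-truth points per class, total_positive the number of predicted points per class, and total_correct their intersection, so that IoU = correct / (seen + positive - correct). The dictionary and function names are illustrative; the numbers are copied from the MinkUNet and SPVCNN results above.

```python
CLASSES = [
    "ground", "building", "pole-road_sign-traffic_light",
    "bollard-small_pole", "trash_can", "barrier",
    "pedestrian", "car", "natural-vegetation",
]

# Counters copied from the two result tables above.
TOTAL_SEEN = [12042099, 7193259, 109906, 7298, 115885, 54818, 11233, 734084, 917631]

MINKUNET = {
    "total_seen": TOTAL_SEEN,
    "total_correct": [11981472, 6841477, 0, 0, 0, 22926, 0, 734084, 712023],
    "total_positive": [12119651, 6936499, 0, 0, 0, 289726, 0, 734084, 965661],
}
SPVCNN = {
    "total_seen": TOTAL_SEEN,
    "total_correct": [11952613, 6354523, 0, 0, 0, 29705, 0, 720214, 694746],
    "total_positive": [12085977, 6423716, 0, 0, 0, 738765, 0, 897529, 1076593],
}


def per_class_iou(total_seen, total_correct, total_positive):
    """IoU per class: intersection / (ground truth + predicted - intersection)."""
    ious = []
    for seen, correct, positive in zip(total_seen, total_correct, total_positive):
        union = seen + positive - correct
        # A class that is neither present nor predicted has no defined IoU.
        ious.append(correct / union if union > 0 else float("nan"))
    return ious


def mean_iou(ious):
    """Average over the classes with a defined IoU (NaN entries are skipped)."""
    valid = [v for v in ious if v == v]
    return sum(valid) / len(valid)


if __name__ == "__main__":
    for name, counters in (("MinkUNet", MINKUNET), ("SPVCNN", SPVCNN)):
        ious = per_class_iou(**counters)
        print(name, "mIoU = %.4f" % mean_iou(ious))
        for cls, iou in zip(CLASSES, ious):
            print("  %-30s %.4f" % (cls, iou))
```

Classes with zero correct points contribute an IoU of 0 to the mean, so the small-object classes pull mIoU down sharply even when ground and building are segmented well.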

chrise96 commented 3 years ago

GPU used: NVIDIA Tesla V100

Config file configs/paris_lille_3d/minkunet/cr0p64.yaml

Result

| | ground | building | pole-road_sign-traffic_light | bollard-small_pole | trash_can | barrier | pedestrian | car | natural-vegetation |
|---|---|---|---|---|---|---|---|---|---|
| total_seen | 12042099.0 | 7193259.0 | 109906.0 | 7298.0 | 115885.0 | 54818.0 | 11233.0 | 734084.0 | 917631.0 |
| total_correct | 11909182.0 | 6663769.0 | 9.0 | 0.0 | 0.0 | 25349.0 | 0.0 | 747316.0 | 675152.0 |
| total_positive | 12004485.0 | 6776544.0 | 218.0 | 0.0 | 0.0 | 506831.0 | 0.0 | 1018298.0 | 916204.0 |

zhijian-liu commented 3 years ago

Thanks! It seems that SPVCNN is worse than MinkUNet on this dataset, which is different from our observation on other datasets. I suggest that you tune the hyperparameters (e.g., learning rate, weight decay, voxel size) further to see whether the performance can be improved.
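Of the hyperparameters mentioned, voxel size has the most direct effect on small objects, since every point that falls into the same voxel is merged into one input location. The sketch below illustrates the effect under stated assumptions: it is plain Python rather than the repository's quantization code, the function name is hypothetical, and the point cloud is random synthetic data, not Paris-Lille-3D.

```python
import math
import random


def count_occupied_voxels(points, voxel_size):
    """Count distinct voxels hit by `points` (iterable of (x, y, z) in metres)."""
    voxels = {
        tuple(math.floor(c / voxel_size) for c in point)
        for point in points
    }
    return len(voxels)


if __name__ == "__main__":
    random.seed(0)
    # Synthetic stand-in for a small object: 2000 points in a 0.5 m cube.
    obj = [tuple(random.uniform(0.0, 0.5) for _ in range(3)) for _ in range(2000)]
    for voxel_size in (0.02, 0.05, 0.1, 0.2):
        print(voxel_size, count_occupied_voxels(obj, voxel_size))
```

A coarser voxel size leaves a small object with only a handful of voxels, while a finer one keeps more detail at the cost of more memory, which is worth keeping in mind when sweeping this parameter.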

chrise96 commented 3 years ago

I did some runs with different hyperparameters. Note that the mIoU results listed on https://npm3d.fr/paris-lille-3d are measured on the test set (not publicly available), whereas I use the validation set.

All results below are from the MinkUNet model.

Test 1

Result

| | ground | building | pole-road_sign-traffic_light | bollard-small_pole | trash_can | barrier | pedestrian | car | natural-vegetation |
|---|---|---|---|---|---|---|---|---|---|
| total_correct | 11965248.0 | 5656188.0 | 0.0 | 0.0 | 0.0 | 25072.0 | 0.0 | 737846.0 | 614532.0 |
| total_seen | 12042099.0 | 7193259.0 | 109906.0 | 7298.0 | 115885.0 | 54818.0 | 11233.0 | 734084.0 | 917631.0 |
| total_positive | 12122087.0 | 5828340.0 | 0.0 | 0.0 | 0.0 | 1678808.0 | 0.0 | 858198.0 | 735147.0 |

Test 2

Result

| | ground | building | pole-road_sign-traffic_light | bollard-small_pole | trash_can | barrier | pedestrian | car | natural-vegetation |
|---|---|---|---|---|---|---|---|---|---|
| total_correct | 11982199.0 | 6551127.0 | 33359.0 | 0.0 | 0.0 | 28680.0 | 0.0 | 757937.0 | 659963.0 |
| total_seen | 12042099.0 | 7193259.0 | 109906.0 | 7298.0 | 115885.0 | 54818.0 | 11233.0 | 734084.0 | 917631.0 |
| total_positive | 12088832.0 | 6678264.0 | 85057.0 | 0.0 | 0.0 | 592066.0 | 0.0 | 911864.0 | 866497.0 |

Test 3

Result

| | ground | building | pole-road_sign-traffic_light | bollard-small_pole | trash_can | barrier | pedestrian | car | natural-vegetation |
|---|---|---|---|---|---|---|---|---|---|
| total_correct | 11924425.0 | 5962722.0 | 42254.0 | 0.0 | 11.0 | 38827.0 | 6.0 | 711293.0 | 635028.0 |
| total_seen | 12042099.0 | 7193259.0 | 109906.0 | 7298.0 | 115885.0 | 54818.0 | 11233.0 | 734084.0 | 917631.0 |
| total_positive | 12031033.0 | 6076242.0 | 98039.0 | 0.0 | 52.0 | 1517622.0 | 97.0 | 806125.0 | 693388.0 |

Test 4

Result

No table

zhijian-liu commented 3 years ago

Could you also do the same for SPVCNN? Thanks!

chrise96 commented 3 years ago

What do you think about the following questions and possible future work:

Test SPVCNN with Nvidia Tesla V100

Result: Unfortunately, I got a CUDA out of memory error with the above settings...

| | ground | building | pole-road_sign-traffic_light | bollard-small_pole | trash_can | barrier | pedestrian | car | natural-vegetation |
|---|---|---|---|---|---|---|---|---|---|
| total_seen | 12042099.0 | 7193259.0 | 109906.0 | 7298.0 | 115885.0 | 54818.0 | 11233.0 | 734084.0 | 917631.0 |
| total_correct | 11977888.0 | 4392665.0 | 36969.0 | 0.0 | 0.0 | 39113.0 | 150.0 | 733574.0 | 594135.0 |
| total_positive | 12139342.0 | 4554124.0 | 49463.0 | 0.0 | 0.0 | 2964529.0 | 249.0 | 892533.0 | 622340.0 |

zhijian-liu commented 3 years ago

I think if the scale of the scene is very large, a possible solution is to use sliding windows to split the scenes into smaller portions (as most S3DIS papers do).
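A sliding-window split of this kind can be sketched as follows. This is only an illustration of the idea, not code from this repository or from any S3DIS paper: the function name, the 20 m window, and the 10 m stride are assumptions, and the windows are taken in the XY plane with the full height kept in each one.

```python
import numpy as np


def sliding_windows(points, window=20.0, stride=10.0):
    """Yield index arrays of points falling in overlapping XY windows.

    points: (N, 3+) float array with x, y in the first two columns.
    window: side length of each square window (same unit as the points).
    stride: step between window origins; must not exceed `window`,
            otherwise points between windows would be dropped.
    """
    if stride > window:
        raise ValueError("stride must be <= window to cover every point")
    xy = points[:, :2]
    xy_min = xy.min(axis=0)
    extent = xy.max(axis=0) - xy_min
    # One origin per stride step along each axis, so every point lies
    # within `stride` (and hence within `window`) of some origin.
    n_x = int(np.floor(extent[0] / stride)) + 1
    n_y = int(np.floor(extent[1] / stride)) + 1
    for i in range(n_x):
        x0 = xy_min[0] + i * stride
        in_x = (xy[:, 0] >= x0) & (xy[:, 0] <= x0 + window)
        for j in range(n_y):
            y0 = xy_min[1] + j * stride
            mask = in_x & (xy[:, 1] >= y0) & (xy[:, 1] <= y0 + window)
            idx = np.flatnonzero(mask)
            if idx.size > 0:  # skip empty windows
                yield idx
```

Because the windows overlap, a point can appear in several of them. At inference time the per-window predictions would need to be merged back per point, for example by averaging logits, which is left out of this sketch.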