Dataloader for the Paris lille 3D dataset

chrise96 commented 3 years ago

Prerequisites

pip install plyfile
Download the dataset “training_10_classes” from https://npm3d.fr/paris-lille-3d

Note: Not all GPUs can handle the original point cloud tile size of Paris-Lille-3D, so I used PDAL to split the tiles. The command for one tile is: pdal split --capacity 1000000 Lille1_2.ply output/Lille1_2.ply. The smaller PDAL output tiles of Lille1_1.ply and Paris.ply go in the train folder; those of Lille1_2.ply go in the validation folder.
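For more than one tile, the split can be scripted. This is a minimal sketch that only wraps the pdal split --capacity command quoted above. The helper names and the output folder are hypothetical, and it assumes the pdal executable is on the PATH.

```python
import subprocess
from pathlib import Path


def build_split_command(tile, out_dir, capacity=1_000_000):
    """Build the `pdal split` command for one tile, in the same form as above."""
    tile = Path(tile)
    output = (Path(out_dir) / tile.name).as_posix()
    return ["pdal", "split", "--capacity", str(capacity), str(tile), output]


def split_tiles(tiles, out_dir, capacity=1_000_000):
    """Run `pdal split` on every tile; requires PDAL to be installed."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for tile in tiles:
        # check=True raises CalledProcessError if pdal exits with an error.
        subprocess.run(build_split_command(tile, out_dir, capacity), check=True)


# Show the command that would be run for one tile (nothing is executed here).
print(" ".join(build_split_command("Lille1_2.ply", "output")))
```

Calling split_tiles(["Lille1_1.ply", "Paris.ply"], "train") would then produce the smaller training tiles, under the assumptions above.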

Todo

Is the data augmentation part necessary for this dataset?
When training MinkUNet or SPVCNN on this dataset, the models seem unable to learn small objects, as also questioned in this issue (https://github.com/mit-han-lab/spvnas/issues/40). I trained for 20 epochs, and some small classes are learned only after around 15 epochs. Could this be due to the small number of training samples, or do we need to reconfigure the training settings for this dataset?

zhijian-liu commented 3 years ago

Hi @chrise96, thanks for the contribution! I'm wondering if you could share the performance of MinkUNet and SPVCNN with us. This would help us better understand whether data augmentation or further investigation on small objects is necessary. Thanks!

chrise96 commented 3 years ago

GPU used: NVIDIA Tesla V100

Config file: configs/paris_lille3d/minkunet/cr0p64.yaml

Epochs 20
Workers 8
Batch size 2

Result

[iou/test] = 38.393
[iou/test/max] = 38.393
[loss] = 0.20219
Epoch finished in 10 minutes 35 seconds. 20 epochs of training finished in 3 hours 32 minutes 13 seconds.

	ground	building	pole-road_sign-traffic_light	bollard-small_pole	trash_can	barrier	pedestrian	car	natural-vegetation
total_seen	12042099.0	7193259.0	109906.0	7298.0	115885.0	54818.0	11233.0	734084.0	917631.0
total_correct	11981472.0	6841477.0	0.0	0.0	0.0	22926.0	0.0	734084.0	712023.0
total_positive	12119651.0	6936499.0	0.0	0.0	0.0	289726.0	0.0	734084.0	965661.0
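As a reading aid for the table: assuming total_seen is the number of ground-truth points per class, total_positive the number of points predicted as that class, and total_correct their overlap, the per-class IoU is total_correct / (total_seen + total_positive - total_correct). A minimal sketch, with a function name of my own choosing and the ground and pole columns from the table above:

```python
def per_class_iou(total_seen, total_correct, total_positive):
    """IoU per class = correct / (seen + positive - correct)."""
    ious = []
    for seen, correct, positive in zip(total_seen, total_correct, total_positive):
        union = seen + positive - correct
        # A class with no ground-truth and no predicted points has no defined IoU.
        ious.append(correct / union if union > 0 else float("nan"))
    return ious


# Columns "ground" and "pole-road_sign-traffic_light" from the table above.
seen = [12042099.0, 109906.0]
correct = [11981472.0, 0.0]
positive = [12119651.0, 0.0]

ious = per_class_iou(seen, correct, positive)
print([round(v, 4) for v in ious])
```

Under this reading, ground comes out at roughly 0.98 and the pole class at 0. How the reported [iou/test] averages over classes is not shown in this thread, so the sketch is not meant to reproduce that number.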

Config file: configs/paris_lille3d/spvcnn/cr0p64.yaml

Epochs 20
Workers 8
Batch size 1 (batch size 2 gave CUDA out-of-memory errors)

Result

[iou/test] = 35.446
[iou/test/max] = 36.483
[loss] = 0.14018
Epoch finished in 9 minutes 41 seconds. 20 epochs of training finished in 3 hours 24 minutes 2 seconds.

	ground	building	pole-road_sign-traffic_light	bollard-small_pole	trash_can	barrier	pedestrian	car	natural-vegetation
total_seen	12042099.0	7193259.0	109906.0	7298.0	115885.0	54818.0	11233.0	734084.0	917631.0
total_correct	11952613.0	6354523.0	0.0	0.0	0.0	29705.0	0.0	720214.0	694746.0
total_positive	12085977.0	6423716.0	0.0	0.0	0.0	738765.0	0.0	897529.0	1076593.0
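The total_seen row also gives a rough picture of how unbalanced the validation points are, which may be part of why the small classes are hard to learn. A small sketch using only the counts from the tables above (the variable and function names are mine):

```python
CLASSES = [
    "ground", "building", "pole-road_sign-traffic_light", "bollard-small_pole",
    "trash_can", "barrier", "pedestrian", "car", "natural-vegetation",
]
# total_seen row from the tables above (identical in both runs).
TOTAL_SEEN = [
    12042099.0, 7193259.0, 109906.0, 7298.0, 115885.0,
    54818.0, 11233.0, 734084.0, 917631.0,
]


def class_shares(counts):
    """Return each class count as a fraction of all counted points."""
    total = sum(counts)
    return [count / total for count in counts]


shares = dict(zip(CLASSES, class_shares(TOTAL_SEEN)))
for name, share in shares.items():
    print(f"{name:30s} {100 * share:6.2f}%")
```

Ground and building together account for about 91% of the counted points, while the five small classes (pole, bollard, trash can, barrier, pedestrian) together make up about 1.4%.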

zhijian-liu commented 3 years ago

Thanks for the results! I'm wondering if you could also train MinkUNet with a batch size of 1. I would like to see an apples-to-apples comparison between MinkUNet and SPVCNN.

chrise96 commented 3 years ago

GPU used: NVIDIA Tesla V100

Config file: configs/paris_lille_3d/minkunet/cr0p64.yaml

Epochs 20
Workers 8
Batch size 1

Result

[iou/test] = 36.01
[iou/test/max] = 36.757
[loss] = 0.18344
Epoch finished in 10 minutes 7 seconds. 20 epochs of training finished in 3 hours 21 minutes 51 seconds.

	ground	building	pole-road_sign-traffic_light	bollard-small_pole	trash_can	barrier	pedestrian	car	natural-vegetation
total_seen	12042099.0	7193259.0	109906.0	7298.0	115885.0	54818.0	11233.0	734084.0	917631.0
total_correct	11909182.0	6663769.0	9.0	0.0	0.0	25349.0	0.0	747316.0	675152.0
total_positive	12004485.0	6776544.0	218.0	0.0	0.0	506831.0	0.0	1018298.0	916204.0
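Assuming total_positive counts the points predicted as a class and total_seen the ground-truth points, precision (total_correct / total_positive) and recall (total_correct / total_seen) separate over-prediction from missed points. A minimal sketch for the barrier column of this run, with a hypothetical helper name:

```python
def precision_recall(seen, correct, positive):
    """Precision = correct / predicted, recall = correct / ground truth."""
    precision = correct / positive if positive > 0 else float("nan")
    recall = correct / seen if seen > 0 else float("nan")
    return precision, recall


# "barrier" column from the table above.
precision, recall = precision_recall(seen=54818.0, correct=25349.0, positive=506831.0)
print(f"barrier: precision={precision:.3f}, recall={recall:.3f}")
```

Under this reading, barrier has about 5% precision and 46% recall: roughly nine times as many points are predicted as barrier as exist in the ground truth.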

zhijian-liu commented 3 years ago

Thanks! It seems that SPVCNN is worse than MinkUNet on this dataset, which is different from our observation on other datasets. I suggest that you tune the hyperparameters (e.g., learning rate, weight decay, voxel size) further to see whether the performance can be improved.

chrise96 commented 3 years ago

I did some runs with different hyperparameters. The mIoU results listed on https://npm3d.fr/paris-lille-3d are measured on the test set (not publicly available), so I use the validation set instead.

All results below are from the MinkUNet model.

Test 1

batch_size: 1
workers: 4
voxel_size: 0.03
lr: default
weight_decay: 1.0e-3

Result

[iou/test] = 35.374
[iou/test/max] = 36.426
[loss] = 0.20783
Epoch finished in 10 minutes 42 seconds. 35 epochs of training finished in 6 hours 11 minutes 49 seconds.

	ground	building	pole-road_sign-traffic_light	bollard-small_pole	trash_can	barrier	pedestrian	car	natural-vegetation
total_correct	11965248.0	5656188.0	0.0	0.0	0.0	25072.0	0.0	737846.0	614532.0
total_seen	12042099.0	7193259.0	109906.0	7298.0	115885.0	54818.0	11233.0	734084.0	917631.0
total_positive	12122087.0	5828340.0	0.0	0.0	0.0	1678808.0	0.0	858198.0	735147.0

Test 2

batch_size: 1
workers: 4
voxel_size: 0.03
lr: 0.15
weight_decay: 1.0e-3

Result

[iou/test] = 39.345
[iou/test/max] = 41.484
[loss] = 0.067995
Epoch finished in 10 minutes 44 seconds. 35 epochs of training finished in 6 hours 7 minutes 37 seconds.

	ground	building	pole-road_sign-traffic_light	bollard-small_pole	trash_can	barrier	pedestrian	car	natural-vegetation
total_correct	11982199.0	6551127.0	33359.0	0.0	0.0	28680.0	0.0	757937.0	659963.0
total_seen	12042099.0	7193259.0	109906.0	7298.0	115885.0	54818.0	11233.0	734084.0	917631.0
total_positive	12088832.0	6678264.0	85057.0	0.0	0.0	592066.0	0.0	911864.0	866497.0

Test 3

batch_size: 1
workers: 4
voxel_size: 0.03
lr: 0.1
weight_decay: default

Result

[iou/test] = 39.458
[iou/test/max] = 40.645
[loss] = 0.045596
Epoch finished in 11 minutes 37 seconds. 35 epochs of training finished in 6 hours 34 minutes 5 seconds.

	ground	building	pole-road_sign-traffic_light	bollard-small_pole	trash_can	barrier	pedestrian	car	natural-vegetation
total_correct	11924425.0	5962722.0	42254.0	0.0	11.0	38827.0	6.0	711293.0	635028.0
total_seen	12042099.0	7193259.0	109906.0	7298.0	115885.0	54818.0	11233.0	734084.0	917631.0
total_positive	12031033.0	6076242.0	98039.0	0.0	52.0	1517622.0	97.0	806125.0	693388.0

Test 4

batch_size: default
workers: default
voxel_size: default
lr: 0.001
weight_decay: default

Result

[iou/test] = 35.305
[iou/test/max] = 35.305
[loss] = 0.11198

No per-class table for this run.

zhijian-liu commented 3 years ago

Could you also do the same for SPVCNN? Thanks!

chrise96 commented 3 years ago

What do you think about the following questions and possible future work:

Is the tile size too small to properly learn?
Do we need to create a tile structure where the tiles partly overlap?
I'm not able to train the full model on the V100 16GB with a voxel size of 0.5...

Test: SPVCNN on an NVIDIA Tesla V100

batch_size: 1
workers: 4
voxel_size: 0.03
lr: 0.15
weight_decay: 1.0e-3

Result

Unfortunately, I got a CUDA out-of-memory error with the above settings. The numbers below are from the last log output, when epoch 21/35 had just started.

[iou/test] = 36.94
[iou/test/max] = 40.717
[loss] = 0.36434
Estimated time left: 2 hours 49 minutes 11 seconds. Epoch finished in 11 minutes 6 seconds. Epoch 21/35 started

	ground	building	pole-road_sign-traffic_light	bollard-small_pole	trash_can	barrier	pedestrian	car	natural-vegetation
total_seen	12042099.0	7193259.0	109906.0	7298.0	115885.0	54818.0	11233.0	734084.0	917631.0
total_correct	11977888.0	4392665.0	36969.0	0.0	0.0	39113.0	150.0	733574.0	594135.0
total_positive	12139342.0	4554124.0	49463.0	0.0	0.0	2964529.0	249.0	892533.0	622340.0

zhijian-liu commented 3 years ago

I think if the scale of the scene is very large, a possible solution is to use sliding windows to split the scenes into smaller portions (as most S3DIS papers do).
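To make the sliding-window idea concrete, here is a minimal sketch in plain Python. It is not the preprocessing used by any particular S3DIS paper; the function name, the square XY windows, and the size and stride parameters are all assumptions for illustration. Choosing a stride smaller than the window size gives partly overlapping tiles.

```python
import math
from collections import defaultdict


def sliding_windows(points, size, stride):
    """Group point indices into square XY windows of side `size`.

    Window (i, j) covers [x0 + i*stride, x0 + i*stride + size) in x and the
    same range in y, where (x0, y0) is the minimum corner of the scene.
    With stride < size, neighbouring windows overlap and a point can fall
    into several of them; with stride > size, some points fall into none.
    """
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    x0 = min(p[0] for p in points)
    y0 = min(p[1] for p in points)

    def covering(offset):
        # All window indices k with k*stride <= offset < k*stride + size.
        first = max(0, math.floor((offset - size) / stride) + 1)
        last = math.floor(offset / stride)
        return range(first, last + 1)

    windows = defaultdict(list)
    for idx, (x, y, *_rest) in enumerate(points):
        for i in covering(x - x0):
            for j in covering(y - y0):
                windows[(i, j)].append(idx)
    return dict(windows)


# Three points along x, 10 m windows moved in 5 m steps (50% overlap).
demo = sliding_windows([(0, 0, 0), (7, 0, 0), (12, 0, 0)], size=10, stride=5)
print(demo)
```

Each window's index list can then be used to cut one training sample out of the full scene. Points in overlapping regions are seen in more than one sample.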

mit-han-lab / spvnas

Dataloader for the Paris lille 3D dataset #53