meidachen / STPLS3D

🔥 Synthetic and real-world 2d/3d dataset for semantic and instance segmentation (BMVC 2022 Oral)

Data augmentation for instance segmentation #13

ywyue closed this issue 2 years ago

ywyue commented 2 years ago

Hi, thanks for your great work! I am checking the data preparation script for instance segmentation, prepare_data_inst_instance_stpls3d.py. In the code, it seems that you only apply data augmentation to classes [0, 2, 3, 7, 8, 9, 12, 13]. What is the motivation behind that? Also, why is random rotation augmentation applied even to class 0 (i.e. ground)? Thanks in advance!

meidachen commented 2 years ago

Thanks for your interest in our dataset. The main reason for the data augmentation is to increase the number of points for the underrepresented semantic classes and so improve semantic segmentation performance. The selected classes had low semantic segmentation performance in our experiments. The ground was included because all objects are connected to it, and the semantic segmentation needs to separate it from the other objects.

ywyue commented 2 years ago

Thanks for your reply. I understand the purpose of the data augmentation. The weighted loss also alleviates the data imbalance problem. However, I am confused about how this data augmentation is implemented. In my understanding, you apply data augmentation to the whole point cloud, then keep only the points belonging to classes [0, 2, 3, 7, 8, 9, 12, 13] and filter out all other points. Such augmented training samples represent 'incomplete scenes', right? I think a more reasonable way would be to apply the data augmentation to classes [0, 2, 3, 7, 8, 9, 12, 13] and keep the other points unchanged instead of filtering them out.
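To make the two variants concrete, here is a minimal NumPy sketch. It is not the code from prepare_data_inst_instance_stpls3d.py: the function names (`rotate_z`, `augment_filtered`, `augment_in_place`) and the choice of a z-axis rotation are assumptions for illustration, and variant B is only one literal reading of the proposal above. Only the class list comes from this thread.

```python
import numpy as np

# Class IDs quoted in this thread; everything else is a hypothetical sketch.
AUG_CLASSES = np.array([0, 2, 3, 7, 8, 9, 12, 13])


def rotate_z(xyz, angle):
    """Rotate an (N, 3) array of points about the z axis by `angle` radians."""
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return xyz @ rot.T


def augment_filtered(xyz, sem, angle):
    """Variant A: rotate the whole cloud, then keep only the selected classes."""
    keep = np.isin(sem, AUG_CLASSES)
    return rotate_z(xyz, angle)[keep], sem[keep]


def augment_in_place(xyz, sem, angle):
    """Variant B: rotate only the selected classes, leave other points as they are."""
    mask = np.isin(sem, AUG_CLASSES)
    out = xyz.copy()
    out[mask] = rotate_z(xyz[mask], angle)
    return out, sem
```

Variant A returns fewer points than it receives (the 'incomplete scene'), while variant B preserves the point count but moves the selected classes relative to the untouched ones.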

meidachen commented 2 years ago

Yes, I totally agree with you that the augmented training samples represent 'incomplete scenes', and this may cause problems. This was a concern of mine as well, but the experiments showed "reasonable" results from this augmentation (increased performance on these underrepresented classes). One of the purposes is to increase the number and percentage of points for these underrepresented classes, and adding the other points back may still leave these classes underrepresented.

In addition, if you check the classes that were removed, you can see that buildings (1) and high vegetation (4) have too many points, and adding them back will make the training process take too long.
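The effect on class proportions can be illustrated with a small sketch. The label array below is made-up toy data, not STPLS3D statistics, and `class_fractions` and `NUM_CLASSES` are hypothetical names; `NUM_CLASSES = 14` is just large enough for the IDs mentioned in this thread and is not meant as the dataset's official class count.

```python
import numpy as np

# Class IDs quoted in this thread.
AUG_CLASSES = np.array([0, 2, 3, 7, 8, 9, 12, 13])
# Assumption: just large enough to cover the IDs mentioned in this thread.
NUM_CLASSES = 14


def class_fractions(sem, num_classes=NUM_CLASSES):
    """Return the fraction of points per semantic class ID."""
    counts = np.bincount(sem, minlength=num_classes)
    return counts / max(counts.sum(), 1)


# Toy labels: mostly building (1) and high vegetation (4), little of class 7.
toy_sem = np.array([1] * 5 + [4] * 3 + [0] + [7])

full = class_fractions(toy_sem)
filtered = class_fractions(toy_sem[np.isin(toy_sem, AUG_CLASSES)])
print("class 7 share, full scene:    ", full[7])
print("class 7 share, filtered scene:", filtered[7])
```

In this toy case the share of class 7 rises once classes 1 and 4 are removed, and the filtered sample also contains far fewer points to process.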

Another interesting thing about this is that using the augmented 'incomplete scenes' basically removes some of the contextual relationships between objects (if there were any). Whether contextual relationships between objects actually matter is still a good research topic, I think. From the recent work "Mix3D: Out-of-Context Data Augmentation for 3D Scenes", it seems that context may not matter that much in 3D scene segmentation, especially in outdoor environments where only a limited sample size can be fed to a network. This is just my opinion and I could be totally wrong, since this was only tested with HAIS and PointGroup, which are voxelization-based networks. I'm not sure whether it will work for other types of networks such as KPConv or RandLA-Net (point-based), since they behave differently (though Mix3D worked with KPConv).
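For reference, the out-of-context idea can be sketched roughly as overlaying two scenes so that objects appear in surroundings they were not captured in. The snippet below is a simplified illustration of that idea, not the Mix3D implementation; `mix_scenes` is a hypothetical helper, and it assumes instance IDs are non-negative integers with no special "ignore" value.

```python
import numpy as np


def mix_scenes(xyz_a, sem_a, inst_a, xyz_b, sem_b, inst_b):
    """Overlay two scenes by centering each at the origin and concatenating them."""
    xyz = np.concatenate([
        xyz_a - xyz_a.mean(axis=0),
        xyz_b - xyz_b.mean(axis=0),
    ])
    sem = np.concatenate([sem_a, sem_b])
    # Shift instance IDs of the second scene so they do not collide with the first.
    # Assumes non-negative instance IDs (no -1 "ignore" label).
    offset = inst_a.max() + 1 if inst_a.size else 0
    inst = np.concatenate([inst_a, inst_b + offset])
    return xyz, sem, inst
```

The mixed sample keeps every object's own geometry and labels but discards the original scene layout around it, which is similar in spirit to what the filtered 'incomplete scenes' do.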

ywyue commented 2 years ago

Interesting and insightful explanation! Thanks :)