octree-nn / octformer

OctFormer: Octree-based Transformers for 3D Point Clouds
MIT License

Why hasn’t OctFormer been used with outdoor datasets yet? #34

Closed (leunark closed 4 days ago)

leunark commented 6 days ago

I’m currently investigating octree-based architectures and am curious whether there is a specific reason why OctFormer hasn’t been applied to outdoor datasets yet. Given its suitability for hierarchical representations and its efficiency in point cloud processing, it seems like it could perform well in outdoor settings such as autonomous driving datasets or large-scale mapping tasks. Comparable models like PointTransformerV3 have been used for both indoor and outdoor datasets.

Have there been challenges or considerations that make its application to outdoor datasets less practical or effective?

wang-ps commented 6 days ago

Certainly, OctFormer can be applied to outdoor datasets. The only reason it hasn't been tested on them is that I currently lack the bandwidth to conduct those experiments.

The primary difference between PointTransformerV3 and OctFormer lies in their handling of point ordering. PointTransformerV3 employs four types of orderings (Z-order, Transpose Z-order, Hilbert order, and Transpose Hilbert order), whereas OctFormer uses only Z-order. According to the ablation study in PointTransformerV3, these additional orderings can improve the mIoU on ScanNet by approximately 1.5 points under the same training parameters. However, you can achieve comparable performance with OctFormer by scaling it up to compensate for the absence of multiple orderings. So you can also train OctFormer on outdoor datasets and get comparable performance.
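For readers unfamiliar with these orderings, here is a minimal sketch of Z-order (Morton) serialization and a transposed variant. It is not taken from the OctFormer or PointTransformerV3 code. The function names (`spread_bits`, `z_order_key`, `trans_z_order_key`, `serialize`) are hypothetical, the choice of x as the lowest interleaved bit is arbitrary, and the transposed variant assumes that "transpose" means swapping the x and y axes before encoding.

```python
def spread_bits(v: int, bits: int = 10) -> int:
    """Move bit i of v to bit 3*i, leaving two zero bits between neighbours."""
    out = 0
    for i in range(bits):
        out |= ((v >> i) & 1) << (3 * i)
    return out


def z_order_key(x: int, y: int, z: int, bits: int = 10) -> int:
    """Interleave the bits of integer voxel coordinates into one Morton key.

    Assumption: x occupies the lowest bit of each 3-bit group, then y, then z.
    """
    return (
        spread_bits(x, bits)
        | (spread_bits(y, bits) << 1)
        | (spread_bits(z, bits) << 2)
    )


def trans_z_order_key(x: int, y: int, z: int, bits: int = 10) -> int:
    """Transposed variant (assumed): swap x and y, then encode as usual."""
    return z_order_key(y, x, z, bits)


def serialize(coords, key_fn):
    """Return point indices sorted by the given space-filling-curve key."""
    return sorted(range(len(coords)), key=lambda i: key_fn(*coords[i]))


# Four voxels in the z = 0 plane, visited in two different orders.
coords = [(1, 0, 0), (0, 1, 0), (0, 0, 0), (1, 1, 0)]
z_order = serialize(coords, z_order_key)
trans_order = serialize(coords, trans_z_order_key)
```

Both orderings visit the same points but group different spatial neighbours next to each other in the sequence. That is presumably why mixing orderings across attention blocks can help: each block's local windows then cover a different neighbourhood.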

leunark commented 4 days ago

That's what I assumed. The difference in handling point ordering is interesting. I can see how different attention serialization strategies might improve the model's performance. I’ll start by exploring ways to improve OctFormer on the ScanNet dataset first, possibly incorporating multiple ordering strategies to see whether they enhance performance. Then it would be really interesting to see whether it can achieve performance on outdoor datasets comparable to that of PTv3. Thanks for your insights!