Hi @qinzheng93, your observation is correct. I used a weaker augmentation because the network considers the absolute point positions (through the positional encoding in the attention layers). The network can use this information to infer certain properties that are not possible with local-only descriptors, e.g. that a region might be empty because it is occluded by points nearer to the camera.
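(To make the point about positional information concrete, here is a minimal sketch of how 3D coordinates could be injected into attention features via a sinusoidal positional encoding. The function name, embedding dimension, and frequency scheme are illustrative assumptions, not RegTR's actual implementation.)

```python
import torch

def sinusoidal_pos_encoding(xyz: torch.Tensor, d_model: int = 96) -> torch.Tensor:
    """Encode (N, 3) point coordinates into (N, d_model) features.

    Each coordinate axis gets d_model // 6 sine/cosine pairs at
    geometrically spaced frequencies (illustrative scheme only).
    """
    assert d_model % 6 == 0, "d_model must be divisible by 6 (sin+cos per axis)"
    n_freqs = d_model // 6
    freqs = 2.0 ** torch.arange(n_freqs, dtype=xyz.dtype, device=xyz.device)  # (F,)
    angles = xyz.unsqueeze(-1) * freqs                                        # (N, 3, F)
    enc = torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)           # (N, 3, 2F)
    return enc.reshape(xyz.shape[0], -1)                                      # (N, 6F)

# Positional encodings are typically added to the per-point features before
# (or inside) the attention layers, so the attention can reason about where
# a point is, not just what its local geometry looks like.
points = torch.rand(1000, 3)
features = torch.rand(1000, 96)
features = features + sinusoidal_pos_encoding(points, d_model=96)
```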
I recently ran some tests to understand how our trained model behaves under larger transformation perturbations. On the 3DMatch test set, compared to e.g. Predator, our model performs better on pairs with smaller transformation differences but worse on pairs with larger ones. This may be a result of the weaker augmentation, but it may also be due to training data bias (there are more training pairs with small transformations).
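(For reference, a breakdown like the one described above can be obtained by binning test pairs by the magnitude of their ground-truth relative rotation and reporting a recall per bin. The sketch below assumes 4x4 numpy transformation matrices and a rotation-error criterion with assumed bin edges and threshold; it only illustrates the idea and is not the exact evaluation script used.)

```python
import numpy as np

def rotation_angle_deg(T: np.ndarray) -> float:
    """Magnitude (in degrees) of the rotation part of a 4x4 transform."""
    R = T[:3, :3]
    cos = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    return np.degrees(np.arccos(cos))

def recall_by_rotation_bin(gt_transforms, est_transforms,
                           bins=(0, 30, 60, 180), rre_thresh=15.0):
    """Group pairs by ground-truth rotation magnitude and report, per bin,
    the fraction whose relative rotation error is below `rre_thresh` degrees."""
    results = {}
    for lo, hi in zip(bins[:-1], bins[1:]):
        errs = []
        for T_gt, T_est in zip(gt_transforms, est_transforms):
            if lo <= rotation_angle_deg(T_gt) < hi:
                # relative rotation error between estimate and ground truth
                errs.append(rotation_angle_deg(np.linalg.inv(T_gt) @ T_est))
        if errs:
            results[(lo, hi)] = float(np.mean(np.array(errs) < rre_thresh))
    return results
```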
@yewzijian Thanks for your patient reply. That's interesting. How does RegTR perform under the standard augmentation?
Hi @qinzheng93,
On ModelNet we do use the standard (larger) augmentations and obtain great results, but this is also because the train/test conditions are similar. For 3DMatch, I do not have a definite answer since I have not tried it, but I suspect it might perform worse.
@yewzijian Thanks again, and I look forward to your future work!
Thanks for the great work. I notice that RegTR adopts a much weaker augmentation than the one commonly used in [1, 2, 3]. How does this affect the convergence of RegTR? And does the weak augmentation affect the robustness to large transformation perturbations? Thank you.
[1] Bai, X., Luo, Z., Zhou, L., Fu, H., Quan, L., & Tai, C. L. (2020). D3Feat: Joint learning of dense detection and description of 3D local features. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 6359-6367).
[2] Huang, S., Gojcic, Z., Usvyatsov, M., Wieser, A., & Schindler, K. (2021). Predator: Registration of 3D point clouds with low overlap. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 4267-4276).
[3] Yu, H., Li, F., Saleh, M., Busam, B., & Ilic, S. (2021). CoFiNet: Reliable coarse-to-fine correspondences for robust point cloud registration. Advances in Neural Information Processing Systems, 34, 23872-23884.
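(For context on the two augmentation styles being compared: the "standard" augmentation in [1, 2, 3] typically applies a rotation about a random axis by a large angle, whereas a weaker scheme perturbs the pose by only a small angle. The snippet below is a rough sketch of both; the specific angle limits are illustrative assumptions and are not taken from any of the repositories.)

```python
import numpy as np

def random_rotation(max_angle_rad: float) -> np.ndarray:
    """Rotation about a uniformly random axis by an angle in [0, max_angle_rad]."""
    axis = np.random.randn(3)
    axis /= np.linalg.norm(axis)
    angle = np.random.uniform(0.0, max_angle_rad)
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    # Rodrigues' formula: R = I + sin(a) K + (1 - cos(a)) K^2
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def augment(points: np.ndarray, strong: bool = True) -> np.ndarray:
    """Apply a rotation augmentation to an (N, 3) point cloud.

    strong=True mimics the full-rotation augmentation commonly used in
    3DMatch training pipelines; strong=False mimics a weaker perturbation
    (the 45-degree limit here is an assumed value for illustration).
    """
    max_angle = np.pi if strong else np.deg2rad(45.0)
    return points @ random_rotation(max_angle).T
```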