spla-tam / SplaTAM

SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM (CVPR 2024)
https://spla-tam.github.io/
BSD 3-Clause "New" or "Revised" License

Debugging Failure on Custom 3DScanner App Data #39

Open wangqinglhc0122 opened 10 months ago

wangqinglhc0122 commented 10 months ago

Thank you so much to the authors for your wonderful work on SplaTAM and to @ironjr for the script to extract the point cloud. I just made a run on our data and had an issue.

I used data collected with the 3DScanner App on an iPhone 12 Pro, which includes the RGB images, depth images, and transforms.json. The resulting point cloud looks like a spiral, and each frame is separated from the others. (screenshot: Capture)

libaiwanovo commented 10 months ago

I have also encountered this. The provided camera poses may be too sparse, or there may be significant time gaps between frames, which can cause this behavior.
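
To quantify this on a capture, here is a minimal sketch (not part of SplaTAM) that measures the relative camera motion between consecutive frames in transforms.json. The key names ("frames", "transform_matrix") and the assumption that frames are listed in capture order follow a typical 3DScanner App / NeRF-style export and may need adjusting.

```python
# Minimal sketch (not SplaTAM code): measure relative camera motion between
# consecutive frames in a NeRF-style transforms.json to spot large gaps.
# Assumes a "frames" list with 4x4 "transform_matrix" entries in capture order,
# as in a typical 3DScanner App export; adjust the keys if yours differs.
import json
import numpy as np

with open("transforms.json") as f:
    meta = json.load(f)

poses = [np.array(frame["transform_matrix"]) for frame in meta["frames"]]

step_translations = []
for prev, curr in zip(poses[:-1], poses[1:]):
    rel = np.linalg.inv(prev) @ curr          # relative pose between frames
    step_translations.append(np.linalg.norm(rel[:3, 3]))

step_translations = np.array(step_translations)
print(f"mean inter-frame translation: {step_translations.mean():.3f}")
print(f"max  inter-frame translation: {step_translations.max():.3f}")
# Jumps much larger than the mean suggest dropped frames or sparse poses,
# which frame-to-frame tracking may struggle with.
```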

Nik-V9 commented 10 months ago

Hi, Thanks for trying out our code!

This seems to be a tracking failure. Some potentially helpful debugging steps: https://github.com/spla-tam/SplaTAM/issues/14#issuecomment-1847467575

If this is a frame-rate issue, as suggested by @libaiwanovo, increasing the number of tracking iterations can help resolve it. We aim to add an adaptive iteration scheme to prevent this problem in the future.
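
For reference, here is a hedged sketch of the kind of config change this points to. The field names below (tracking, num_iters) are assumptions about how the SplaTAM configs are laid out, so check them against configs/iphone/splatam.py before editing.

```python
# Hedged sketch: raise the per-frame tracking iteration budget in the config
# (e.g. configs/iphone/splatam.py). The exact field names are assumptions;
# verify them against the config you are actually running.
config = dict(
    # ... other settings left as in the original config ...
    tracking=dict(
        num_iters=200,  # increase from the default if consecutive frames are far apart
        # ... remaining tracking settings unchanged ...
    ),
)
```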

Also, as an additional debugging step, I would suggest using the poses from transforms.json by setting the following config option to True: https://github.com/spla-tam/SplaTAM/blob/bbaf5cc5754bf1034b33902007872c694e412a31/configs/iphone/splatam.py#L62 This can help you verify that all the data is in the correct format.
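
Independently of that flag, a quick sanity check can be run on the export itself. The following sketch (not SplaTAM code) assumes the usual NeRF-style keys in transforms.json and that the file paths are relative to the dataset root.

```python
# Hedged sanity check: confirm every frame in transforms.json has a valid 4x4
# camera-to-world pose and that the referenced RGB image exists on disk.
# Key names and file layout are assumptions based on a typical 3DScanner export.
import json
import os
import numpy as np

with open("transforms.json") as f:
    meta = json.load(f)

for frame in meta["frames"]:
    path = frame["file_path"]
    pose = np.array(frame["transform_matrix"], dtype=np.float64)
    assert pose.shape == (4, 4), f"bad pose shape for {path}"
    R = pose[:3, :3]
    # The rotation block of a rigid camera pose should be (close to) orthonormal.
    assert np.allclose(R @ R.T, np.eye(3), atol=1e-3), f"non-rigid rotation for {path}"
    assert os.path.exists(path), f"missing image file: {path}"

print(f"checked {len(meta['frames'])} frames")
```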

Xiaohao-Xu commented 7 months ago

Hi @libaiwanovo @wangqinglhc0122, I believe the issue is related to the adaptive (and explicit) Gaussian kernel expansion mechanism. In my recent investigation of the robustness of current SLAM models (https://github.com/Xiaohao-Xu/SLAM-under-Perturbation), I found that as scene complexity increases (for example, with more perturbations and objects), more Gaussian kernels need to be added to SplaTAM to maintain high-quality reconstruction, since it models the scene explicitly. Although SplaTAM achieves SoTA performance on standard SLAM datasets, there still appears to be a gap when it comes to real-world videos and applications, which deserves further exploration.
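
If you want to experiment with denser Gaussian expansion, the relevant knobs usually sit in the mapping section of the config. The names below (mapping, num_iters, sil_thres) and the assumption that new Gaussians are added where the rendered silhouette falls below the threshold are my guesses about how such options are exposed, not verified against this repo, so treat this purely as a sketch of the idea.

```python
# Hedged sketch: loosen the criterion that decides where new Gaussians are added
# during mapping, so complex scenes get more kernels. Field names and the
# direction of the threshold's effect are assumptions; check the actual config.
config = dict(
    # ... other settings left as in the original config ...
    mapping=dict(
        num_iters=60,   # more mapping iterations per frame
        sil_thres=0.9,  # assumed: pixels with silhouette below this get new Gaussians,
                        # so a higher value adds Gaussians more aggressively
        # ... remaining mapping settings unchanged ...
    ),
)
```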