spla-tam / SplaTAM

SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM (CVPR 2024)
https://spla-tam.github.io/
BSD 3-Clause "New" or "Revised" License

Depth-Anything_v2 Depth Rendering #127

Status: Open · czeyveli1 opened this issue 4 months ago

czeyveli1 commented 4 months ago

Hello everyone! I am trying to denoise the TUM-RGBD depth dataset. When I substitute my denoised depth images for the original depth dataset, I get this error:

```
Mapping Time Step: 32: 100%|██████████| 30/30 [00:04<00:00,  6.30it/s]
Tracking Time Step: 33: 100%|██████████| 200/200 [00:21<00:00,  9.39it/s]
Tracking Time Step: 33: 100%|██████████| 200/200 [00:21<00:00,  9.41it/s]
Selected Keyframes at Frame 33: [19, 4, 0, 9, 24, 14, 29, 33]
  6%|███▌      | 33/592 [07:58<2:14:58, 14.49s/it]
 70%|███████   | 21/30 [00:03<00:01,  5.17it/s]
Traceback (most recent call last):
  File "/home/cz/Documents/SplaTAM/scripts/splatam.py", line 1014, in <module>
    rgbd_slam(experiment.config)
  File "/home/cz/Documents/SplaTAM/scripts/splatam.py", line 847, in rgbd_slam
    loss, variables, losses = get_loss(params, iter_data, variables, iter_time_idx, config['mapping']['loss_weights'],
  File "/home/cz/Documents/SplaTAM/scripts/splatam.py", line 253, in get_loss
    depth_sil, _, _, = Renderer(raster_settings=curr_data['cam'])(**depth_sil_rendervar)
  File "/home/cz/anaconda3/envs/splatam/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/cz/anaconda3/envs/splatam/lib/python3.10/site-packages/diff_gaussian_rasterization/__init__.py", line 186, in forward
    return rasterize_gaussians(
  File "/home/cz/anaconda3/envs/splatam/lib/python3.10/site-packages/diff_gaussian_rasterization/__init__.py", line 28, in rasterize_gaussians
    return _RasterizeGaussians.apply(
  File "/home/cz/anaconda3/envs/splatam/lib/python3.10/site-packages/diff_gaussian_rasterization/__init__.py", line 79, in forward
    num_rendered, color, radii, geomBuffer, binningBuffer, imgBuffer, depth = _C.rasterize_gaussians(*args)
RuntimeError: CUDA out of memory. Tried to allocate 616.00 MiB (GPU 0; 7.75 GiB total capacity; 4.95 GiB already allocated; 441.75 MiB free; 5.57 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```

I used median filters from OpenCV and skimage to remove some of the noise (visual artifacts). The filtered files are small: for instance, one original frame is 118 kB and my output is 11.2 kB, so this is not a large dataset. I also disabled PNG compression with `cv2.imwrite(output_file_path, median_using_skimage, [cv2.IMWRITE_PNG_COMPRESSION, 0])`, but I get the same error.
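Roughly, the preprocessing looks like this (a minimal sketch, not my exact script; the function name and kernel size are illustrative):

```python
# Sketch of the median-filter preprocessing. TUM-RGBD depth is stored as
# 16-bit PNG, so the filter and the write must preserve the uint16 dtype.
import cv2
import numpy as np

def denoise_depth(input_path, output_path, ksize=5):
    # IMREAD_UNCHANGED keeps the 16-bit depth values intact.
    depth = cv2.imread(input_path, cv2.IMREAD_UNCHANGED)
    assert depth.dtype == np.uint16, "expected a 16-bit TUM depth PNG"
    # cv2.medianBlur supports uint16 input only for ksize 3 or 5.
    filtered = cv2.medianBlur(depth, ksize)
    # Write without PNG compression, as in the command mentioned above.
    cv2.imwrite(output_path, filtered, [cv2.IMWRITE_PNG_COMPRESSION, 0])
```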

Could you help me figure out how to solve this problem?

Nik-V9 commented 2 months ago

Hi, the Gaussian Splat could be blowing up due to a loss of tracking.

Since you mentioned that you are using custom depth images: if the depth images are not scale-consistent, SplaTAM will not work.

czeyveli1 commented 2 months ago

Hello @Nik-V9, thanks for your reply. I solved that problem, but I now have a new challenge.

I am trying to render with a new depth dataset created by the Depth-Anything-V2 algorithm. The original TUM-RGBD sequence contains 595 depth images, but they do not correspond one-to-one with the RGB images. My depth dataset contains 613 images (one depth image generated from each RGB image), and rendering runs, but the result is very poor.
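(For context, TUM-RGBD streams are normally paired by nearest timestamp, which is what TUM's associate.py script does; a minimal sketch of that idea, with illustrative names:)

```python
# Minimal sketch of TUM-style timestamp association (the role of TUM's
# associate.py): pair each RGB frame with the depth frame closest in time,
# within a tolerance. The real script also enforces one-to-one matches.
def associate(rgb_stamps, depth_stamps, max_dt=0.02):
    pairs = []
    for t_rgb in rgb_stamps:
        # Find the depth timestamp nearest to this RGB timestamp.
        t_depth = min(depth_stamps, key=lambda t: abs(t - t_rgb))
        if abs(t_depth - t_rgb) <= max_dt:
            pairs.append((t_rgb, t_depth))
    return pairs
```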

I would like to share my depth images and the results:

| Metric | Original dataset | My dataset |
| --- | --- | --- |
| Average PSNR | 21.00 | 15.21 |
| Average ATE RMSE | 3.75 cm | 39.98 cm |
| Average Depth L1 | 4.50 cm | 18.46 cm |
| Average MS-SSIM | 0.816 | 0.628 |
| Average LPIPS | 0.284 | 0.554 |
| Mapping/PSNR | 12.41468 | 35.36498 |

Below is an example of the depth images I created using the Depth-Anything algorithm (I normalized it between 0 and 10128).

[Depth image: 1305031452 791720]
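The normalization step looks roughly like this (a sketch, not my exact code; note that Depth-Anything predicts relative depth, so the saved values carry no metric scale):

```python
# Sketch of the normalization described above: rescale a relative depth
# prediction into [0, 10128] and save it as a 16-bit PNG.
import cv2
import numpy as np

def save_normalized_depth(pred, path, max_val=10128):
    norm = (pred - pred.min()) / (pred.max() - pred.min())  # map to [0, 1]
    depth_u16 = (norm * max_val).astype(np.uint16)          # map to [0, max_val]
    cv2.imwrite(path, depth_u16, [cv2.IMWRITE_PNG_COMPRESSION, 0])
```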

This is the result from the original TUM dataset:

[Screenshot from 2024-09-16 18-45-32]

And here is the result from my depth dataset:

[Screenshot from 2024-09-16 18-46-23]

Could you help me understand how to solve this problem?

Santoi commented 1 month ago

> if the depth images are not scale-consistent, SplaTAM will not work.

Hi, @Nik-V9! What do you mean by scale-consistent?

Nik-V9 commented 1 month ago

Hi Santoi, monocular depth estimation is only defined up to scale, i.e., across the model's predictions there is no guarantee that 1 unit corresponds to a fixed metric distance. Hence, when you use multiple monocular depth maps together, you also need to optimize for a per-frame scale factor.

https://github.com/MichaelGrupp/evo/wiki/Metrics#alignment
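In practice, per-frame alignment amounts to a least-squares fit of a scale (and optionally a shift) against a metric reference; a minimal sketch (illustrative only, not part of SplaTAM):

```python
# Minimal sketch: fit scale s and shift t so that s * pred + t best matches
# a metric reference depth over valid pixels, independently for each frame.
import numpy as np

def align_scale_shift(pred, ref):
    mask = (ref > 0) & np.isfinite(pred)  # ignore invalid/missing depth
    A = np.stack([pred[mask], np.ones(mask.sum())], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, ref[mask], rcond=None)
    return s * pred + t
```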

You will probably get a better SplaTAM result using Metric3Dv2 with known intrinsics (since its predictions aim to be in metric scale).

czeyveli1 commented 1 month ago

Hi @Nik-V9, as you suggested, I have been trying Metric3Dv2 for the past two days as well. I will let you know if the results are promising. Thank you so much for your help.