spla-tam / SplaTAM

SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM (CVPR 2024)
https://spla-tam.github.io/
BSD 3-Clause "New" or "Revised" License

Question regarding the shape of the mask created from the `depth`. #126

Closed · Santoi closed this issue 1 month ago

Santoi commented 4 months ago

When running the scripts/splatam.py algorithm on a NeRFCapture-created dataset, I ran into 2 different issues. The first one is just context, in case it has something to do with the second one.

First, the depth and color datasets had different dimensions, but the same permute transformation was applied to both. So I decided to flatten the depth tensor:

old_depth.size()   # torch.Size([720, 960, 3, 1])
new_depth = torch.flatten(old_depth, start_dim=2)
new_depth.size()   # torch.Size([720, 960, 3])
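
(For reference, since the trailing dimension has size 1, squeezing it gives the same result as the flatten above; purely illustrative:)

import torch

old_depth = torch.rand(720, 960, 3, 1)   # placeholder tensor with the same shape
assert torch.equal(torch.flatten(old_depth, start_dim=2), old_depth.squeeze(-1))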

But now I'm running into an issue with the mask generated from the depth tensor: its shape doesn't match the shape of the indexed tensor. I've played around with its shape, but I'm not sure I understand what point_cld represents, so I can't build an appropriate mask. The error is dumped below. The point_cld tensor is created by concatenating two 2-dimensional tensors, so it makes sense that it is 2-dimensional as well.

Any help on how to move forward is appreciated!

Traceback (most recent call last):
  File "scripts/splatam.py", line 1033, in <module>
    rgbd_slam(experiment.config)
  File "scripts/splatam.py", line 572, in rgbd_slam
    densify_intrinsics, densify_cam = initialize_first_timestep(dataset, num_frames,
  File "scripts/splatam.py", line 215, in initialize_first_timestep
    init_pt_cld, mean3_sq_dist = get_pointcloud(color, depth, densify_intrinsics, w2c, 
  File "scripts/splatam.py", line 117, in get_pointcloud
    point_cld = point_cld[mask]
IndexError: The shape of the mask [518400] at index 0 does not match the shape of the indexed tensor [172800, 6] at index 0
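
The numbers in the error already hint at the cause: the mask has 518400 entries, which is exactly 3 × 172800, the number of rows in point_cld. A minimal PyTorch illustration of the indexing rule involved (the tensors here are placeholders, not SplaTAM's actual data):

import torch

# Boolean indexing requires the mask length to match the indexed dimension.
point_cld = torch.rand(172800, 6)                  # illustrative: 172800 points, xyz + rgb

bad_mask = torch.ones(518400, dtype=torch.bool)    # 3 x 172800, as with a 3-channel depth
# point_cld[bad_mask]                              # raises the IndexError from the traceback

good_mask = torch.ones(172800, dtype=torch.bool)   # one entry per point
print(point_cld[good_mask].shape)                  # torch.Size([172800, 6])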
Tiandishihua commented 3 months ago

Hello! Have you solved it?

Santoi commented 3 months ago

Yes. The main issue was that the mask was being generated from a 3-channel depth, while the point cloud is created from only 1 of those channels.

Is it expected for the depth to have 3 channels instead of 1?
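
A minimal sketch of that fix, assuming the three depth channels are identical copies and the capture is laid out (H, W, 3, 1) as above (the variable names and the permute are illustrative, not the exact SplaTAM code):

import torch

old_depth = torch.rand(720, 960, 3, 1)    # placeholder for the loaded NeRFCapture depth
depth = old_depth[:, :, 0:1, 0]           # keep a single channel -> shape [720, 960, 1]

# After the same (2, 0, 1) permute applied to the color image, depth is (1, H, W),
# so a validity mask built from it has one entry per pixel instead of three.
depth_chw = depth.permute(2, 0, 1)        # shape [1, 720, 960]
mask = (depth_chw > 0).reshape(-1)        # 691200 entries for a 720 x 960 frame
print(depth_chw.shape, mask.shape)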

Nik-V9 commented 2 months ago

Hi, this could be an issue with how the data was captured using NeRFCapture. For more info, please see this issue: https://github.com/spla-tam/SplaTAM/issues/59

Santoi commented 1 month ago

Indeed, this was an issue caused by the offline-mode capture from NeRFCapture. Capturing with another tool and converting the dataset to the NeRFCapture output schema solved it.