muskie82 / MonoGS

[CVPR'24 Highlight] Gaussian Splatting SLAM
https://rmurai.co.uk/projects/GaussianSplattingSLAM/

Torch matrix inversion error #20

Open koktavy opened 3 months ago

koktavy commented 3 months ago

I've (almost) gotten the repo working on Windows with the help of Issue 16.

When I run on the sample data (even using --eval) I hit this issue: torch._C._LinAlgError: torch.linalg.inv: The diagonal element 1 is zero, the inversion could not be completed because the input matrix is singular.

python slam.py --config configs/mono/tum/fr3_office.yaml --eval

MonoGS: Running MonoGS in Evaluation Mode
MonoGS: Following config will be overriden
MonoGS:         save_results=True
MonoGS:         use_gui=False
MonoGS:         eval_rendering=True
MonoGS:         use_wandb=True
MonoGS: saving results in results\datasets_tum\2024-03-12-13-18-30
wandb: (1) Create a W&B account
wandb: (2) Use an existing W&B account
wandb: (3) Don't visualize my results
wandb: Enter your choice: 3
wandb: You chose "Don't visualize my results"
wandb: Tracking run with wandb version 0.16.4
wandb: W&B syncing is set to `offline` in this directory.
wandb: Run `wandb online` or set WANDB_MODE=online to enable cloud syncing.
MonoGS: Resetting the system
MonoGS: Initialized map
Process Process-3:
Traceback (most recent call last):
  File "C:\Users\Tavius\miniconda3\envs\MonoGS\lib\multiprocessing\process.py", line 297, in _bootstrap
    self.run()
  File "C:\Users\Tavius\miniconda3\envs\MonoGS\lib\multiprocessing\process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "X:\Projects\_2024\MonoGS\utils\slam_backend.py", line 417, in run
    self.add_next_kf(cur_frame_idx, viewpoint, depth_map=depth_map)
  File "X:\Projects\_2024\MonoGS\utils\slam_backend.py", line 69, in add_next_kf
    viewpoint, kf_id=frame_idx, init=init, scale=scale, depthmap=depth_map
  File "X:\Projects\_2024\MonoGS\gaussian_splatting\scene\gaussian_model.py", line 239, in extend_from_pcd_seq
    self.create_pcd_from_image(cam_info, init, scale=scale, depthmap=depthmap)
  File "X:\Projects\_2024\MonoGS\gaussian_splatting\scene\gaussian_model.py", line 131, in create_pcd_from_image
    return self.create_pcd_from_image_and_depth(cam, rgb, depth, init)
  File "X:\Projects\_2024\MonoGS\gaussian_splatting\scene\gaussian_model.py", line 150, in create_pcd_from_image_and_depth
    W2C = getWorld2View2(cam.R, cam.T).cpu().numpy()
  File "X:\Projects\_2024\MonoGS\gaussian_splatting\utils\graphics_utils.py", line 41, in getWorld2View2
    C2W = torch.linalg.inv(Rt)
torch._C._LinAlgError: torch.linalg.inv: The diagonal element 1 is zero, the inversion could not be completed because the input matrix is singular.
[W C:\cb\pytorch_1000000000000\work\torch\csrc\CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]

This is a fresh install using --recursive and only incorporating the change noted above.

koktavy commented 3 months ago

Also here's the batch script I used to download the data on Windows instead of Linux:

IF NOT EXIST "datasets\tum" mkdir "datasets\tum"
cd datasets\tum
curl -LJO https://vision.in.tum.de/rgbd/dataset/freiburg1/rgbd_dataset_freiburg1_desk.tgz
tar -xvzf rgbd_dataset_freiburg1_desk.tgz
curl -LJO https://vision.in.tum.de/rgbd/dataset/freiburg2/rgbd_dataset_freiburg2_xyz.tgz
tar -xvzf rgbd_dataset_freiburg2_xyz.tgz
curl -LJO https://vision.in.tum.de/rgbd/dataset/freiburg3/rgbd_dataset_freiburg3_long_office_household.tgz
tar -xvzf rgbd_dataset_freiburg3_long_office_household.tgz
cd ../..

Run it from the repo root in PowerShell as scripts\download_tum.bat
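For anyone who would rather not maintain separate shell and batch scripts, the same download can be sketched in Python using only the standard library. This is an illustrative equivalent of the batch script above, not something shipped with the repo; the names `SEQUENCES`, `sequence_url`, and `download_tum` are hypothetical.

```python
import tarfile
import urllib.request
from pathlib import Path

BASE_URL = "https://vision.in.tum.de/rgbd/dataset"

# (group, archive name) pairs copied from the batch script above.
SEQUENCES = [
    ("freiburg1", "rgbd_dataset_freiburg1_desk"),
    ("freiburg2", "rgbd_dataset_freiburg2_xyz"),
    ("freiburg3", "rgbd_dataset_freiburg3_long_office_household"),
]


def sequence_url(group, name):
    """Build the download URL for one TUM sequence archive."""
    return f"{BASE_URL}/{group}/{name}.tgz"


def download_tum(root="datasets/tum"):
    """Download and unpack every sequence into `root` (hypothetical helper)."""
    target = Path(root)
    target.mkdir(parents=True, exist_ok=True)
    for group, name in SEQUENCES:
        archive = target / f"{name}.tgz"
        if not archive.exists():  # skip archives that are already on disk
            urllib.request.urlretrieve(sequence_url(group, name), archive)
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(target)
```

Calling `download_tum()` from the repo root should leave the three sequences under datasets/tum, the same layout the batch script produces.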

zmf2022 commented 3 months ago

me too

rmurai0610 commented 3 months ago

Hi, thank you for your interest!

Can you print out the variables R, t, Rt, in getWorld2View2 so we can check if the matrix is singular?

I suspect R,t are all zeros due to this bug, but I could be wrong: https://discuss.pytorch.org/t/pytorch-multiprocessing-with-cuda-sets-tensors-to-0/179117

zmf2022 commented 3 months ago

This is an automatic vacation reply from QQ Mail. Hello, I have received your email and will reply as soon as possible!

yanyan-li commented 3 months ago

> Hi, thank you for your interest!
>
> Can you print out the variables R, t, Rt, in getWorld2View2 so we can check if the matrix is singular?
>
> I suspect R,t are all zeros due to this bug, but I could be wrong: https://discuss.pytorch.org/t/pytorch-multiprocessing-with-cuda-sets-tensors-to-0/179117

It is true. Sometimes the orientation and translation are all zeros. I printed the inputs of getWorld2View2(R, t, ...):

w2v tensor([[-0.8280, 0.5254, -0.1956], [ 0.4139, 0.3374, -0.8455], [-0.3782, -0.7811, -0.4969]], device='cuda:0') tensor([-2.2574, 0.3327, 1.9227], device='cuda:0')

w2v tensor([[0., 0., 0.], [0., 0., 0.], [0., 0., 0.]], device='cuda:0') tensor([0., 0., 0.], device='cuda:0')

w2v tensor([[-0.8280, 0.5254, -0.1956], [ 0.4139, 0.3374, -0.8455], [-0.3782, -0.7811, -0.4969]], device='cuda:0') tensor([-2.2574, 0.3327, 1.9227], device='cuda:0')

w2v tensor([[0., 0., 0.], [0., 0., 0.], [0., 0., 0.]], device='cuda:0') tensor([0., 0., 0.], device='cuda:0')
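To make the check requested above repeatable, here is a minimal sketch that assembles the 4x4 matrix from R and t and reports whether it can be inverted. The helper name `check_pose` and the way it builds the matrix are assumptions for illustration, not the repo's code; the sample values are the ones printed above, moved to the CPU. With R and t all zeros, the matrix has only its bottom-right entry set, so its determinant is 0 and `torch.linalg.inv` has nothing to invert.

```python
import torch


def check_pose(R, t):
    """Build a 4x4 [R | t] matrix and report whether it is invertible.

    Hypothetical diagnostic helper, intended to be called just before
    the inversion that fails in the traceback.
    """
    Rt = torch.zeros((4, 4), device=R.device, dtype=R.dtype)
    Rt[:3, :3] = R
    Rt[:3, 3] = t
    Rt[3, 3] = 1.0

    det = torch.linalg.det(Rt).item()
    all_zero = bool((R == 0).all() and (t == 0).all())
    print("R =", R)
    print("t =", t)
    print("det(Rt) =", det, "| inputs all zero:", all_zero)
    return {"det": det, "all_zero": all_zero, "invertible": abs(det) > 1e-8}


# Values reported in this thread: one healthy pose and one zeroed-out pose.
good_R = torch.tensor([[-0.8280, 0.5254, -0.1956],
                       [0.4139, 0.3374, -0.8455],
                       [-0.3782, -0.7811, -0.4969]])
good_t = torch.tensor([-2.2574, 0.3327, 1.9227])

good = check_pose(good_R, good_t)                    # determinant close to 1
bad = check_pose(torch.zeros(3, 3), torch.zeros(3))  # determinant is 0
```

A determinant near 1 is what a valid rotation plus translation should give, so a zero determinant here points at the inputs being wiped before the call rather than at the inversion itself.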

muskie82 commented 3 months ago

From a quick look, some people have reported the same issue in the PyTorch repo, but no one seems to have found a solution. The problem seems to happen only with PyTorch multiprocessing on Windows.

We would appreciate it if you could share the solution once you find it! As a last resort, you could set up an Ubuntu environment in Docker and run MonoGS there.

foreverlong commented 3 months ago

Have you solved this error? I am hitting it too on Windows 10.

zmf2022 commented 2 months ago

Add this line to disable multithreading: torch.set_num_interop_threads(1)

hnglp commented 2 months ago

> add this script to disable multithreads! torch.set_num_interop_threads(1)

Is this the solution? Can you be more specific?

hnglp commented 1 month ago

> add this script to disable multithreads! torch.set_num_interop_threads(1)

Hi, I am not sure I understand what you mean. Can you tell me where I should add this line?
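On where the line could go, here is a minimal sketch, assuming slam.py (the script run in the original report) is the entry point. PyTorch documents that `torch.set_num_interop_threads` can only be called once and before any inter-op parallel work starts, so the safest spot is near the top of the entry script, right after `import torch`. Placing it at module level is a guess on my part: on Windows, child processes are spawned and re-import the main module, so a module-level call should also run in them. Nothing in this thread confirms that the setting resolves the zeroed tensors.

```python
import torch

# Assumption: this sits at the very top of the entry script (slam.py in
# this thread), at module level, before any tensors are created or any
# worker processes are started.
torch.set_num_interop_threads(1)

# Sanity check that the setting took effect in this process.
print("inter-op threads:", torch.get_num_interop_threads())
```

If the call is made too late, PyTorch raises a RuntimeError instead of silently ignoring it, which is a quick way to tell whether the placement is early enough.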