princeton-vl / DROID-SLAM

BSD 3-Clause "New" or "Revised" License

Error in inference - not enough values to unpack (expected 2, got 0) #115

Open FlorinM25 opened 7 months ago

FlorinM25 commented 7 months ago

Hello, and first of all, thank you very much for this amazing project!

Whenever I run the demos with the commands from the README, I get an error on this line: ii, jj = torch.as_tensor(es, device=self.device).unbind(dim=-1)

The terminal looks like this:

  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
220it [00:19, 11.21it/s]
  File "envsvenv\vis\DROID-SLAM\droid_slam\", line 96, in terminate
    self.backend(7) # Run the backend process with argument 7
  File "envsvenv\vis\droidvenvvis\lib\site-packages\torch\utils\", line 115, in decorate_context
    return func(*args, **kwargs)
  File "droid_slam\", line 66, in __call__
  File "droid_slam\", line 437, in add_proximity_factors
    ii, jj = torch.as_tensor(es, device=self.device).unbind(dim=-1)
ValueError: not enough values to unpack (expected 2, got 0)
[W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]

While the demo runs and iterates over the images, the Open3D window opens but nothing is displayed in it.

After some debugging in the file, I noticed that the tensors ii and jj stay [0] for the whole run, and that the es array is always empty.
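The unpack error follows directly from es being empty: converting an empty list gives a tensor of shape (0,), and unbinding a dimension of size 0 returns an empty tuple, so there is nothing to assign to ii and jj. Below is a minimal sketch that reproduces this outside DROID-SLAM. The helper names naive_split and split_edges are hypothetical and not part of the project, and the reshape guard only avoids the crash; it does not explain why no edges were found.

```python
import torch


def naive_split(es):
    """Mirror the failing pattern from the traceback on a plain list of pairs."""
    ii, jj = torch.as_tensor(es).unbind(dim=-1)
    return ii, jj


def split_edges(es):
    """Hypothetical guard: force an (N, 2) shape so an empty list still yields two tensors."""
    edges = torch.as_tensor(es, dtype=torch.long).reshape(-1, 2)
    ii, jj = edges.unbind(dim=-1)
    return ii, jj


# An empty edge list triggers the same ValueError as in the issue.
try:
    naive_split([])
except ValueError as exc:
    print("naive split failed:", exc)

# With the guard, both index tensors exist and are simply empty.
ii, jj = split_edges([])
print(ii.shape, jj.shape)
```

If a guard like this is used, the calling code would still have to handle the case of zero edges, so it is better treated as a debugging aid than as a fix.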

I tried the --reconstruction_path flag to save the reconstruction files. I get disps.npy, images.npy, intrinsics.npy, poses.npy and tstamps.npy. The .npy files contain values, but I doubt they are correct, because disps.npy looks like this:

[[[0. 0. 0. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]]]
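The printed array is abbreviated, so it only shows the corners of each row and does not prove that every value is zero. A small sketch for checking the whole array is given here; summarize_array is a hypothetical helper, and the synthetic demo array only stands in for the real file.

```python
import numpy as np


def summarize_array(arr):
    """Return basic statistics that show whether an array is entirely zero."""
    arr = np.asarray(arr)
    if arr.size == 0:
        return {"shape": arr.shape, "nonzero": 0}
    return {
        "shape": arr.shape,
        "nonzero": int(np.count_nonzero(arr)),
        "min": float(arr.min()),
        "max": float(arr.max()),
        "all_finite": bool(np.isfinite(arr).all()),
    }


# Synthetic stand-in: zeros at the edges, one nonzero value in the middle.
demo = np.zeros((1, 6, 8), dtype=np.float32)
demo[0, 3, 4] = 0.5
print(summarize_array(demo))

# For the saved reconstruction (the path is an assumption, adjust to your setup):
# print(summarize_array(np.load("path/to/reconstruction/disps.npy")))
```

A nonzero count of 0 would confirm that no disparities were estimated at all, which would be consistent with the empty edge list described above.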

I also tried disabling visualization with the --disable_vis flag, as suggested in issue #76, but the process just stops after a few iterations:

  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
10it [01:52,  3.51s/it]
Process finished with exit code -1073741819 (0xC0000005)
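The two numbers in that last line are the same value: -1073741819 is the signed 32-bit reading of 0xC0000005, which on Windows is the access violation status code. That points to a crash in native code rather than a Python exception, although the log alone does not say where. The conversion can be checked with a short sketch (to_unsigned32 is a hypothetical helper name):

```python
def to_unsigned32(code):
    """Reinterpret a signed 32-bit exit code as an unsigned 32-bit value."""
    return code & 0xFFFFFFFF


exit_code = -1073741819
print(hex(to_unsigned32(exit_code)))
```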

Issue #13 mentions a datapath, but I am not sure what it refers to.

I am working on Windows in a virtualenv with PyTorch 2.1.1 and CUDA 11.8 (I also tried torch 1.10 with CUDA 11.3, but the same error occurred). The GPU I tested on is a 3080 Ti with 12 GB of VRAM.

I assume this is a CUDA-related issue, but I am not sure in what way.

I hope someone can help me fix my errors. Thank you!

Sebastian-Garcia commented 6 months ago

Were you able to solve this? I am also using a 3080 Ti and am facing the same issue when running with CUDA 11.3.

FlorinM25 commented 6 months ago

Hello, I wasn't able to solve this on Windows, but I got it working on Ubuntu 22.04 (I don't think DROID-SLAM can work on Windows). I installed CUDA 12.2 from the NVIDIA website and the PyTorch build for CUDA 12.1, using the pip3 install command from the official PyTorch website. For the environment I used the virtualenv package from pip instead of conda, with Python 3.8, and I installed the rest of the packages with pip. I also ran pip install ninja, because it makes the python install step faster. I hope this helps!

robofar commented 3 months ago

@FlorinM25 I tried pytorch=2.1.1, cuda=12.1 and python=3.8, but I got a libcudart error.

@Sebastian-Garcia I also tried pytorch=1.10.1, cuda=11.3 and python=3.9, and I am getting this unpack error. Did you figure it out in the end? Which PyTorch, CUDA and Python versions did you use?
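Since the thread compares several version combinations, it may help to post the exact values the interpreter sees. The following is a minimal sketch that collects them; environment_report is a hypothetical helper, and it falls back gracefully when PyTorch is not importable.

```python
import platform


def environment_report():
    """Collect the Python, OS, PyTorch and CUDA details relevant to this issue."""
    report = {
        "python": platform.python_version(),
        "os": platform.system(),
    }
    try:
        import torch
    except ImportError:
        report["torch"] = None  # PyTorch is not installed in this environment
        return report
    report["torch"] = torch.__version__
    report["torch_cuda"] = torch.version.cuda  # CUDA version PyTorch was built with
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        report["gpu"] = torch.cuda.get_device_name(0)
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

Note that torch.version.cuda reports the CUDA version PyTorch was built against, which can differ from the toolkit installed system-wide, as in the CUDA 12.2 and 12.1 combination mentioned above.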

andrewnc commented 2 months ago

I'm getting the same unpack error. In my case the distance comparison comes back close to zero, the value gets set to inf, and that entry is then skipped.
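One way to read this comment is sketched below as a toy model. It is not DROID-SLAM's code: select_edges, min_dist and the threshold value are assumptions made only to show how masking every candidate to inf leaves the edge list empty, which then triggers the unpack error.

```python
import math


def select_edges(distances, thresh, min_dist=1e-6):
    """Toy edge selection over a {(i, j): distance} mapping (hypothetical logic)."""
    es = []
    for (i, j), d in sorted(distances.items(), key=lambda kv: kv[1]):
        if d < min_dist:
            d = math.inf  # assumed masking of near-zero distances
        if d > thresh:
            continue  # masked or too-distant pairs are skipped
        es.append((i, j))
        es.append((j, i))
    return es


# Every distance is (close to) zero, so every pair is masked and skipped.
print(select_edges({(0, 1): 0.0, (0, 2): 0.0}, thresh=16.0))

# A pair with a usable distance produces edges in both directions.
print(select_edges({(0, 1): 2.0}, thresh=16.0))
```

If this reading is right, the interesting question is why the distances are near zero in the first place, for example because the estimated disparities are all zero.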