princeton-vl / DROID-SLAM

BSD 3-Clause "New" or "Revised" License
1.82k stars · 302 forks

Error in inference - not enough values to unpack (expected 2, got 0) #115

Open FlorinM25 opened 1 year ago

FlorinM25 commented 1 year ago

Hello, Firstly, thank you very much for this amazing project!

When I try to run the demos with the commands from the README, I always get an unpack error on this line: `ii, jj = torch.as_tensor(es, device=self.device).unbind(dim=-1)`

The terminal looks like this:

  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
220it [00:19, 11.21it/s]
################################
  File "envsvenv\vis\DROID-SLAM\droid_slam\droid.py", line 96, in terminate
    self.backend(7) # Run the backend process with argument 7
  File "envsvenv\vis\droidvenvvis\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "droid_slam\droid_backend.py", line 66, in __call__
    graph.add_proximity_factors(rad=self.backend_radius,
  File "droid_slam\factor_graph.py", line 437, in add_proximity_factors
    ii, jj = torch.as_tensor(es, device=self.device).unbind(dim=-1)
ValueError: not enough values to unpack (expected 2, got 0)
[W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
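The failure mode is easy to reproduce in plain Python: if the edge list `es` ends up empty, there are no values to unpack into `ii, jj`. The snippet below is a simplified stand-in for the `torch.as_tensor(es).unbind(dim=-1)` call, not the actual DROID-SLAM code:

```python
# Minimal sketch, assuming es (the edge list built in add_proximity_factors)
# ends up empty. Plain-Python unpacking behaves like
# torch.as_tensor(es).unbind(dim=-1) in this respect.
es = []
try:
    ii, jj = zip(*es)  # zip(*[]) yields nothing, so there is nothing to unpack
except ValueError as err:
    print(err)  # not enough values to unpack (expected 2, got 0)
```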

While the demo is running and the images are being iterated, the Open3D window opens but nothing appears in it.

After some debugging in `factor_graph.py`, I noticed that the tensors `ii` and `jj` stay `[0]` for the whole run, and the `es` array is always empty.
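A defensive check (hypothetical, not in upstream DROID-SLAM) would at least avoid the crash by skipping the unbind when the edge list is empty. A simplified plain-Python stand-in:

```python
def unbind_edges(es):
    """Hypothetical guard around the failing call in add_proximity_factors:
    bail out when no proximity edges were found instead of crashing.
    zip(*es) stands in for torch.as_tensor(es).unbind(dim=-1)."""
    if not es:
        return None, None  # nothing to unbind; caller should skip this update
    ii, jj = zip(*es)
    return list(ii), list(jj)
```

For example, `unbind_edges([])` returns `(None, None)` while `unbind_edges([(0, 1), (1, 2)])` returns `([0, 1], [1, 2])`. Note that this only masks the symptom; the real fix is making sure edges exist at all (correct datapath, suitable `filter_thresh`).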

I tried using the --reconstruction_path flag to save the reconstruction files. I get disps.npy, images.npy, intrinsics.npy, poses.npy, and tstamps.npy. The .npy files have some values in them, but I doubt they are correct, because disps.npy looks like this:

[[[0. 0. 0. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]
  ...
  [0. 0. 0. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]]]
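A quick NumPy sanity check tells you whether a saved reconstruction is entirely zero, i.e. whether any depth was ever estimated. A synthetic array stands in for `np.load("disps.npy")` here:

```python
import numpy as np

# Stand-in for: disps = np.load("disps.npy")
# An all-zero disparity map means no depth was ever estimated.
disps = np.zeros((1, 48, 64), dtype=np.float32)
nonzero = np.count_nonzero(disps)
print(f"{nonzero} / {disps.size} nonzero disparity values")  # 0 / 3072 here
```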

I also tried disabling visualization with the --disable_vis flag, as suggested in issue #76, but the process just stops after some iterations:

  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
10it [01:52,  3.51s/it]
Process finished with exit code -1073741819 (0xC0000005)

In issue #13 a datapath is mentioned, but I am not sure what it refers to.

I am working on Windows in a virtualenv with PyTorch 2.1.1 and CUDA 11.8 (I also tried torch 1.10 with CUDA 11.3, but the same error occurred). The GPU I tested on is a 3080 Ti with 12 GB of VRAM.

I assume this is a CUDA-related issue, but I am not sure in what way.

I hope someone can help me fix my errors. Thank you!

Sebastian-Garcia commented 11 months ago

Were you able to solve this? I too am using a 3080Ti and am facing this same issue when running with CUDA 11.3

FlorinM25 commented 11 months ago

Hello, I wasn't able to solve this on Windows, but I managed to make it work on Ubuntu 22.04 (I don't think DROID-SLAM works on Windows). I installed CUDA 12.2 from the NVIDIA website and PyTorch built for CUDA 12.1 via pip3 install from the official PyTorch website. For the environment I used the virtualenv package from pip instead of conda, with Python 3.8, and installed the rest of the packages with pip. Additionally, I installed ninja (pip install ninja) so that python setup.py install runs faster. I hope this helps you!

robofar commented 8 months ago

@FlorinM25 I tried pytorch=2.1.1, cuda=12.1, and python=3.8, but I got a libcudart error. @Sebastian-Garcia I also tried pytorch=1.10.1, cuda=11.3, and python=3.9, but I am getting this unpack error. Did you figure it out in the end? Which PyTorch, CUDA, and Python versions did you use?

andrewnc commented 7 months ago

I'm getting the same unpack error: the distance comparison (https://github.com/princeton-vl/DROID-SLAM/blob/main/droid_slam/factor_graph.py#L322) comes back close to zero and gets set to inf, which is then skipped (https://github.com/princeton-vl/DROID-SLAM/blob/main/droid_slam/factor_graph.py#L352), so the edge list ends up empty.
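The behavior described above can be sketched in a few lines (a simplification; the thresholds here are illustrative, not the repo's actual values): near-zero distances are masked to inf, and inf entries never pass the proximity threshold, so no edges are collected.

```python
import math

# Sketch of the edge-selection behavior (illustrative thresholds only):
# degenerate near-zero pair distances get masked to inf, and inf never
# passes the proximity threshold, so the edge list es stays empty.
d = [[0.0, 0.0], [0.0, 0.0]]  # pairwise frame distances, all ~0
thresh = 16.0
es = []
for i in range(len(d)):
    for j in range(len(d[i])):
        dist = math.inf if d[i][j] < 1e-3 else d[i][j]  # mask degenerate pairs
        if dist < thresh:  # inf is always skipped here
            es.append((i, j))
print(es)  # -> []
```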

Dong09 commented 3 months ago

Were you able to solve this?

estaudere commented 3 months ago

Would also like an update on this!

Ysc-shark commented 3 months ago

I initially encountered this issue as well, but later discovered that it was indeed due to a problem with the datapath. For example, the script provided by the author uses the path 'TUM-RGBD,' but in my case, the folder was actually named 'TUM_RGBD.' I wonder if anyone else is facing a similar issue?
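A tiny pre-flight check (hypothetical, not part of the repo) catches this class of mistake before the run fails silently:

```python
import os

def check_datapath(path):
    """Fail fast when the dataset folder passed on the command line does
    not exist (e.g. a 'TUM-RGBD' vs 'TUM_RGBD' naming mismatch)."""
    if not os.path.isdir(path):
        raise FileNotFoundError(f"datapath not found: {path}")
    return path
```

Calling `check_datapath` on the `--datapath` argument before constructing the image stream would surface the typo immediately instead of producing an empty edge list later.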

My running environment is: Ubuntu 20.04, RTX 3090, Python 3.9, PyTorch 1.10, CUDA 11.3. I installed the environment using the yaml file content provided by Yaxun-Yang in #28.

FlorinM25 commented 3 months ago

Hello! You can see this txt file: https://github.com/FlorinM25/DROID-SLAM/blob/main/working-with-DROID-SLAM.txt. Here are all the steps and information I gathered while working with DROID-SLAM regarding setup. I hope you find them helpful.

XichongLing commented 2 months ago

In my case it was caused by video.counter.value == 1 when the DROID backend was invoked. The reason is that the camera pose shifts in my dataset are so minor that they fall below the motion filter threshold (args.filter_thresh), so no frames were added during tracking.

YuxinYao620 commented 2 months ago

> In my case it was caused by video.counter.value == 1 when the DROID backend was invoked. The reason is that the camera pose shifts in my dataset are so minor that they fall below the motion filter threshold (args.filter_thresh), so no frames were added during tracking.

Hello Xichong, may I ask how you resolved it? I am also dealing with a dataset with minor shifts. Thank you in advance!

XichongLing commented 2 months ago

> In my case it was caused by video.counter.value == 1 when the DROID backend was invoked. The reason is that the camera pose shifts in my dataset are so minor that they fall below the motion filter threshold (args.filter_thresh), so no frames were added during tracking.

> Hello Xichong, may I ask how you resolved it? I am also dealing with a dataset with minor shifts. Thank you in advance!

Setting args.filter_thresh to a smaller number can get the program running. If you know your sequence is monocular, you can also skip this program and manually set the extrinsic motion sequence to identity matrices (I assume you are estimating the camera motions for an outer project).
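The motion-filter behavior discussed in this thread can be sketched as follows (a simplification of the real filter, which thresholds mean optical flow; the threshold values here are illustrative, though 2.4 matches the demo's default for --filter_thresh):

```python
def frames_kept(flow_magnitudes, filter_thresh=2.4):
    """Simplified sketch of the motion filter: a frame becomes a keyframe
    only when its mean optical flow to the previous keyframe exceeds
    filter_thresh. With near-static camera motion, nothing gets through."""
    return [m for m in flow_magnitudes if m > filter_thresh]

# Near-static sequence: no frames survive the default threshold,
# so the backend later sees video.counter.value == 1 and crashes.
flows = [0.3, 0.5, 0.4]
print(frames_kept(flows))                     # -> []
print(frames_kept(flows, filter_thresh=0.1))  # -> [0.3, 0.5, 0.4]
```

This is why lowering --filter_thresh (or supplying poses externally) resolves the unpack error for low-motion sequences.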