princeton-vl / DPVO

Deep Patch Visual Odometry/SLAM
MIT License

Illegal memory access while using custom images #26

Open adithya-Avataar opened 1 year ago

adithya-Avataar commented 1 year ago

Hi

I am trying to use DPVO to estimate poses for my object. I have continuous images surrounding the object from all directions; the directory contains about 115 images in all. When I run demo.py on my images, I get the following error:

  File "/DPVO/demo.py", line 92, in <module>
    pred_traj = run(cfg, args.network, args.imagedir, args.calib, args.stride, args.skip, args.viz, args.timeit, args.save_reconstruction)
  File "/root/miniconda3/envs/dpvo/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/DPVO/demo.py", line 51, in run
    slam(t, image, intrinsics)
  File "/DPVO/dpvo/dpvo.py", line 394, in __call__
    self.update()
  File "/DPVO/dpvo/dpvo.py", line 278, in update
    self.network.update(self.net, ctx, corr, None, self.ii, self.jj, self.kk)
  File "/root/miniconda3/envs/dpvo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/DPVO/dpvo/net.py", line 80, in forward
    ix, jx = fastba.neighbors(kk, jj)
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1

When I run the code as

CUDA_LAUNCH_BLOCKING=1 python demo.py --save_reconstruction --save_trajectory --imagedir=images3_jpg/ --calib=custom_calib.txt --stride=1

it runs without any errors, but the saved trajectory file contains NaN pose values beyond index 15. I have observed the same with multiple other custom image directories as well.
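To pin down where the poses first break, one option is to scan the saved trajectory for non-finite rows. This is a minimal sketch, assuming the trajectory is a whitespace-separated text file with one pose per row (for example a TUM-style `timestamp tx ty tz qx qy qz qw` layout, which is an assumption about the output format). The helper name `first_invalid_row` and the file name in the usage comment are hypothetical.

```python
import numpy as np


def first_invalid_row(poses):
    """Return the index of the first row containing NaN or inf, or None."""
    poses = np.atleast_2d(np.asarray(poses, dtype=float))
    # A row is bad if any of its entries is not a finite number.
    bad = ~np.isfinite(poses).all(axis=1)
    return int(np.argmax(bad)) if bad.any() else None


# Usage (hypothetical file name):
# poses = np.loadtxt("saved_trajectory.txt")
# print(first_invalid_row(poses))
```

If the returned index is stable across runs, the frames just before it are a reasonable place to look for the numerical problem.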

lahavlipson commented 1 year ago

Are the images sequential, i.e. from a video?

adithya-Avataar commented 1 year ago

Hi, thank you very much for your quick response. Yes, they are sequential. I have checked the order by printing the filenames in the image stream. (They are a directory of images, and they are in order after sorting.)

When I resize the images to 2k (1920, 1080), the code runs without errors, but the generated trajectory seems to be very wrong. Some more info:

Image resolution - 4k (4032, 3024)
Hardware:
GPU - NVIDIA A10G
Architecture - Ampere
Compute capability - 8.6

I have also tried running the Docker version, but I see the same error. Could you please tell me if there is anything I could try?

lahavlipson commented 1 year ago

You could try adjusting the stride, lowering the image resolution further, increasing the patch lifetime or optimization window, though these decisions often depend on the degree of camera motion.

Regarding the memory access error, if you're able to share the images I can investigate the cause (assuming you're permitted to do so).

adithya-Avataar commented 1 year ago

Hi, I have tried doubling all of these values: the optimization window, the number of patches, and the patch lifetime. I get an output, but it is still not as expected. I have also tried reducing the image size further, down to 1024x768, but the output is still not on par with DROID-SLAM.

I have attached the images I have been trying on (link: images) and the calibration file as well (link: calib).

Also, thank you very much again.

lahavlipson commented 1 year ago

The numerical issue disappears after disabling mixed precision. I set the stride=1 and shrunk the image resolution and intrinsics by 50%.

DPVO:

image

DROID:

image
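When images are shrunk by a factor, the pinhole intrinsics need the same factor applied so that the calibration still matches the pixels. The sketch below shows that arithmetic for the 50% case mentioned above. It assumes a plain pinhole model `fx fy cx cy`; the function names and the intrinsic values are made up for illustration and are not taken from the attached calibration file. Distortion coefficients in the common OpenCV-style model act on normalized coordinates, so they would stay unchanged.

```python
def scale_intrinsics(fx, fy, cx, cy, scale):
    """Scale pinhole intrinsics to match an image resized by `scale`."""
    return fx * scale, fy * scale, cx * scale, cy * scale


def scaled_size(width, height, scale):
    """Image size after resizing, rounded to whole pixels."""
    return int(round(width * scale)), int(round(height * scale))


# The 4032x3024 resolution is from this thread; the intrinsics are placeholders.
print(scaled_size(4032, 3024, 0.5))
print(scale_intrinsics(3000.0, 3000.0, 2016.0, 1512.0, 0.5))
```

Resizing the images without rescaling the calibration (or the reverse) is a common cause of badly distorted trajectories, so it is worth checking both were changed together.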

adithya-Avataar commented 1 year ago

Thank you very much. I will try out with these settings.

adithya-Avataar commented 1 year ago

Hi @lahavlipson, I tried the settings you mentioned, and they work. However, my output (predicted poses) varies a lot between different runs for the same set of images with the same hyperparameters, and I get the expected output in only one of the runs. I am hoping to understand whether this is an implementation issue on my end or whether this is expected behaviour. I have attached the output for a few runs on the same object that I shared before. Please let me know if I am doing something wrong.

[12 screenshots attached, "Screenshot from 2023-04-11 12-36-15" through "Screenshot from 2023-04-11 12-44-54"]

lahavlipson commented 1 year ago

DPVO selects patch centroids randomly, so variance in the output like what you've shown is possible. The chosen scale of the scene is most likely related to the randomly initialized depth.

For more predictable behavior, you can increase the number of patches tracked per frame.
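Since monocular trajectories are only defined up to scale, comparing runs is easier once that scale is factored out. Here is a minimal sketch, assuming each run is an N x 3 array of camera positions with the same number of poses and the same starting orientation; the function names are hypothetical, and path-length normalization is a simple stand-in for a full similarity alignment.

```python
import numpy as np


def normalize_scale(positions):
    """Shift a trajectory to start at the origin and scale it to unit path length."""
    p = np.asarray(positions, dtype=float)
    p = p - p[0]
    # Total path length is the sum of distances between consecutive positions.
    length = np.linalg.norm(np.diff(p, axis=0), axis=1).sum()
    if length == 0:
        raise ValueError("trajectory has zero path length")
    return p / length


def mean_position_gap(run_a, run_b):
    """Mean distance between corresponding positions after scale normalization."""
    a, b = normalize_scale(run_a), normalize_scale(run_b)
    return float(np.linalg.norm(a - b, axis=1).mean())
```

A small gap between two runs would suggest they differ mainly in scale; a large gap would point to a genuinely different trajectory shape.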