introlab / rtabmap

RTAB-Map library and standalone application
https://introlab.github.io/rtabmap

Add rtabmap_lightglue.py to replace SuperGlue, but LightGlue fails when the camera is occluded or moving fast with the SuperPoint built into rtabmap (standalone, latest); need help! #1148

Open pxy522 opened 8 months ago

pxy522 commented 8 months ago

Hi @borongyuan @matlabbe @Phil26AT @sarlinpe, we want to add LightGlue to rtabmap, similar to rtabmap_superglue.py (https://github.com/introlab/rtabmap/blob/master/corelib/src/python/rtabmap_superglue.py), following https://github.com/introlab/rtabmap/issues/1129 and https://github.com/introlab/rtabmap/issues/1123. Our script is on this fork: https://github.com/cdb0y511/rtabmap/commit/d6a7f75046bf4201632f22524771fba208549d0e#diff-d8e61c1e666b0ab9c60b68f0a82f363e058a15daa91a4369efd51779a9e915ba

The script works with the SuperPoint built into rtabmap, but fails when the camera is occluded or moving fast (the tensor computation fails on the input SuperPoint keypoints). We suspect this SuperPoint may differ from the one the original LightGlue was trained with, or that something else is wrong. Could @borongyuan @matlabbe @Phil26AT @sarlinpe look into it?
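For context, rtabmap's PyMatcher calls init() and match() functions exposed by the Python script, as in rtabmap_superglue.py. Below is a minimal sketch of what such a wrapper around LightGlue could look like; the init()/match() signatures follow rtabmap_superglue.py, while the lightglue import path, the LightGlue constructor arguments, the input dict layout, and the 'matches0' output key are assumptions based on the upstream cvg/LightGlue repository, not on the fork referenced above:

    import numpy as np
    import torch
    from lightglue import LightGlue  # assumed: the cvg/LightGlue package

    device = None
    matcher = None

    def init(descriptorDim, matchThreshold, iterations, cuda, model):
        # Signature mirrors rtabmap_superglue.py; descriptorDim, iterations
        # and model are unused here, kept only for interface compatibility.
        global device, matcher
        device = torch.device('cuda' if cuda and torch.cuda.is_available() else 'cpu')
        matcher = LightGlue(features='superpoint',
                            filter_threshold=matchThreshold).eval().to(device)

    def _side(kpts, descs, w, h):
        # Pack one image's features the way the upstream LightGlue forward()
        # expects them: keypoints [1, N, 2] (x, y), descriptors [1, N, D],
        # image_size [1, 2] as (width, height).
        return {
            'keypoints': torch.from_numpy(np.asarray(kpts, dtype=np.float32))[None].to(device),
            'descriptors': torch.from_numpy(np.asarray(descs, dtype=np.float32))[None].to(device),
            'image_size': torch.tensor([[w, h]], dtype=torch.float32, device=device),
        }

    def match(kptsFrom, kptsTo, scoresFrom, scoresTo,
              descriptorsFrom, descriptorsTo, imageWidth, imageHeight):
        with torch.inference_mode():
            pred = matcher({'image0': _side(kptsFrom, descriptorsFrom, imageWidth, imageHeight),
                            'image1': _side(kptsTo, descriptorsTo, imageWidth, imageHeight)})
        # matches0[i] is the index of the matching "To" keypoint, or -1 if unmatched.
        matches0 = pred['matches0'][0].cpu().numpy()
        return [[i, int(m)] for i, m in enumerate(matches0) if m >= 0]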

pxy522 commented 8 months ago

1. When the camera moves too fast or is suddenly occluded, an error occurs during keypoint normalization:

    [ERROR] (2023-10-19 22:00:01.588) PyMatcher.cpp:217::match() Failed to call match() function!
    [ERROR] (2023-10-19 22:00:01.589) PyMatcher.cpp:218::match() Traceback (most recent call last):
      File "/home/ev3rm0re/Project/scripts/SuperGluePretrainedNetwork/ratbmap_lightglue.py", line 81, in match
        results = matcher(data)
      File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
        return forward_call(*args, **kwargs)
      File "/home/ev3rm0re/Project/scripts/SuperGluePretrainedNetwork/models/lightglue.py", line 441, in forward
        return self._forward(data)
      File "/home/ev3rm0re/Project/scripts/SuperGluePretrainedNetwork/models/lightglue.py", line 452, in _forward
        kpts0 = normalize_keypoints(kpts0, size0).clone()
      File "/usr/local/lib/python3.8/dist-packages/torch/cuda/amp/autocast_mode.py", line 121, in decorate_fwd
        return fwd(*args, **kwargs)
      File "/home/ev3rm0re/Project/scripts/SuperGluePretrainedNetwork/models/lightglue.py", line 35, in normalize_keypoints
        kpts = (kpts - shift[..., None, :]) / scale[..., None, None]
    RuntimeError: The size of tensor a (67) must match the size of tensor b (2) at non-singleton dimension 2

2. When keypoint normalization is disabled, a linear layer error occurs in the Learnable Fourier Positional Encoding:

    [ERROR] (2023-10-20 12:00:07.243) PyMatcher.cpp:217::match() Failed to call match() function!
    [ERROR] (2023-10-20 12:00:07.243) PyMatcher.cpp:218::match() Traceback (most recent call last):
      File "/home/ev3rm0re/Project/scripts/SuperGluePretrainedNetwork/ratbmap_lightglue.py", line 81, in match
        results = matcher(data)
      File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
        return forward_call(*args, **kwargs)
      File "/home/ev3rm0re/Project/scripts/SuperGluePretrainedNetwork/models/lightglue.py", line 441, in forward
        return self._forward(data)
      File "/home/ev3rm0re/Project/scripts/SuperGluePretrainedNetwork/models/lightglue.py", line 477, in _forward
        encoding0 = self.posenc(kpts0)
      File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
        return forward_call(*args, **kwargs)
      File "/home/ev3rm0re/Project/scripts/SuperGluePretrainedNetwork/models/lightglue.py", line 71, in forward
        projected = self.Wr(x)
      File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
        return forward_call(*args, **kwargs)
      File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/linear.py", line 114, in forward
        return F.linear(input, self.weight, self.bias)
    RuntimeError: mat1 and mat2 shapes cannot be multiplied (2x92 and 2x32)
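Read together, both tracebacks point at the keypoint tensor arriving transposed: in the first, normalize_keypoints broadcasts keypoints whose last dimension is 67 (the keypoint count in that frame) against a size-2 (x, y) shift, and in the second, the positional encoding's Linear(2, 32) layer receives a trailing dimension of 92 instead of 2. A guard like the sketch below, inserted before the matcher call, makes the expected layout explicit; it assumes the adapted models/lightglue.py expects [batch, N, 2], as the upstream code does:

    import torch

    def as_batched_keypoints(kpts: torch.Tensor) -> torch.Tensor:
        # Normalize an incoming keypoint tensor to shape [1, N, 2] (x, y).
        # Assumption: the matcher expects the coordinate axis last; the
        # tracebacks above suggest it sometimes arrives as [2, N] instead.
        if kpts.dim() == 2:
            if kpts.shape[0] == 2 and kpts.shape[1] != 2:
                kpts = kpts.t()      # [2, N] -> [N, 2]
            kpts = kpts[None]        # [N, 2] -> [1, N, 2]
        if kpts.shape[-1] != 2:
            raise ValueError(f"expected (x, y) keypoints, got shape {tuple(kpts.shape)}")
        return kpts

Since the failures coincide with occlusion and fast motion (i.e., very few detected keypoints), it may also be worth returning an empty match list early when N is 0 or 1, instead of calling the network at all.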

pxy522 commented 8 months ago

At first I assumed a lack of feature points was the cause, but I discovered that if the camera is occluded from the very beginning, it still functions correctly even with no feature points. Even after replacing SuperPoint with SIFT, the problem persisted.

pxy522 commented 8 months ago

Compilation of rtabmap:

    cmake -DTorch_DIR=/usr/local/lib/python3.10/dist-packages/torch/share/cmake/Torch/ -DWITH_TORCH=ON -DWITH_PYTHON=ON -DWITH_QT=ON -DCMAKE_CUDA_COMPILER="/usr/local/cuda-11.8/bin/nvcc" ..

This issue has been bothering me for some time. Looking forward to your response; thank you in advance.

pxy522 commented 8 months ago

This is the dashboard (screenshot attached: rtabmap_error).

mattiasmar commented 6 months ago

Related: have you noticed that PyTorch performance is much slower when calling the PyTorch code through rtabmap (through the PyMatcher) compared to calling it from a pure Python (test) program? I see a 10x difference.

matlabbe commented 6 months ago

How did you time it? Did you put a timer in the Python code around the same function and print the result?

mattiasmar commented 6 months ago
In the Python code I call:

    from time import perf_counter

    start = perf_counter()
    results = mymodel({**data_, **pred})
    end = perf_counter()
I also load the input data from disk inside the Python code as part of this test, in order to verify that exactly the same data is used in the Python and C++ setups.
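One thing worth ruling out with this kind of timing: CUDA kernels launch asynchronously, so perf_counter() around the call can measure launch overhead rather than compute time unless the device is synchronized, and the first call can absorb one-time initialization. A hedged sketch, reusing the hypothetical mymodel and inputs from the snippet above:

    import torch
    from time import perf_counter

    def timed_call(model, inputs, warmup=3, iters=10):
        # Warm-up runs absorb one-time costs (CUDA context, cuDNN autotuning).
        for _ in range(warmup):
            model(inputs)
        if torch.cuda.is_available():
            torch.cuda.synchronize()  # flush queued GPU work before starting the clock
        start = perf_counter()
        for _ in range(iters):
            model(inputs)
        if torch.cuda.is_available():
            torch.cuda.synchronize()  # wait for the last kernel before reading the clock
        return (perf_counter() - start) / iters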

matlabbe commented 6 months ago

Interesting. It would mean that the Python interpreter launched from C++ (we now use https://github.com/pybind/pybind11 to manage Python initialization and calls) is slower than the one used when launching Python directly. Could it be that some libraries are dynamically loaded when starting a fresh interpreter from C++? Maybe when launching from Python, dependencies are still in the interpreter's cache, speeding things up when the function is called more than once. Could you try:

    start = perf_counter()
    results = mymodel({**data_, **pred})
    middle = perf_counter()
    results = mymodel({**data_, **pred})
    end = perf_counter()

And see if there is a difference in timing between the first and second call?

mattiasmar commented 6 months ago

There is no such difference. I also tried hard-coding the input data (loading it from disk in the Python code) and not using the whole rtabmap framework, calling only the RegistrationVis class from a standalone project, and I still saw the performance hit.

In addition, a test calling the same Python code from a small C++ project using pybind11 did not show a performance hit. So it appears there is something between pybind11 and RegistrationVis that causes the reduction in performance. Any ideas what it could be? I see that the expected number of CPU cores is used, but not at 100% utilization (~30%). If I ask for just one core (torch.set_num_threads(1)), then indeed only one CPU is used and it reaches 100% utilization in both cases, but the bottom line still shows a ~10x performance reduction.
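Since the CPU utilization differs between the two setups, it may be worth printing the threading configuration from inside each interpreter (the one embedded via pybind11 and a plain python run) to confirm they match. A small diagnostic, using only standard torch/os APIs:

    import os
    import torch

    def dump_thread_config():
        # Compare this output between the embedded (pybind11) interpreter and
        # a plain `python` run: a mismatch would explain a large CPU-side gap.
        print("torch threads:        ", torch.get_num_threads())
        print("torch interop threads:", torch.get_num_interop_threads())
        print("OMP_NUM_THREADS:      ", os.environ.get("OMP_NUM_THREADS"))
        print("MKL_NUM_THREADS:      ", os.environ.get("MKL_NUM_THREADS"))
        print(torch.__config__.parallel_info())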

matlabbe commented 6 months ago

> There is no such difference. I also tried hard-coding the input data (loading it from disk in the Python code) and not using the whole rtabmap framework, calling only the RegistrationVis class from a standalone project, and I still saw the performance hit.
>
> In addition, a test calling the same Python code from a small C++ project using pybind11 did not show a performance hit. So it appears there is something between pybind11 and RegistrationVis that causes the reduction in performance. Any ideas what it could be? I see that the expected number of CPU cores is used, but not at 100% utilization (~30%). If I ask for just one core (torch.set_num_threads(1)), then indeed only one CPU is used and it reaches 100% utilization in both cases, but the bottom line still shows a ~10x performance reduction.

Will continue to track this issue in this new one: https://github.com/introlab/rtabmap/issues/1189