Closed by erik-fauna 3 months ago
What is your Dockerfile or what is your base image? I only tested PyTorch on amd64. See this docker file for example: https://github.com/introlab/rtabmap/blob/master/docker/frontiers2022/Dockerfile
The base of my image is different from yours, which could be part of the issue.
# Use an NVIDIA CUDA base image that supports Ubuntu 22.04
FROM nvidia/cuda:12.1.0-devel-ubuntu22.04
Generally we're doing the same thing for our installs, but I'll swap to an NVIDIA base image that has PyTorch installed specifically. That being said, when I enter my image and run torch.cuda.is_available(), I do see True. Unfortunately the one you linked uses Ubuntu 20.04, making it harder to install Humble, which is required for tf2_geometry_msgs used by rtabmap_conversions.
It looks like the 24.07-py3 image is available on arm64 and amd64, and it is based on Ubuntu jammy:
Distributor ID: Ubuntu
Description: Ubuntu 22.04.4 LTS
Release: 22.04
Codename: jammy
so ROS 2 Humble would be available. Torch is not installed in the same place as in my Dockerfile above; it is now here:
$ find / -name "TorchConfig.cmake"
/usr/local/lib/python3.10/dist-packages/torch/share/cmake/Torch/TorchConfig.cmake
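The same lookup can be scripted when the Torch location differs between images. The following is a minimal sketch using only the Python standard library; the function name find_torch_cmake_dirs is hypothetical and not part of rtabmap or PyTorch.

```python
import os


def find_torch_cmake_dirs(root="/"):
    """Return every directory under `root` that contains TorchConfig.cmake.

    Each returned directory is a candidate value for -DTorch_DIR.
    """
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if "TorchConfig.cmake" in filenames:
            matches.append(dirpath)
    return sorted(matches)
```

Calling find_torch_cmake_dirs("/usr/local") inside the image should list the directory shown in the find output above, assuming Torch is installed under /usr/local.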
It looks like we cannot use the OpenCV built by NVIDIA that is installed in /usr/local because the stitching module is missing:
CMake Error at /usr/local/lib/python3.10/dist-packages/cmake/data/share/cmake-3.30/Modules/FindPackageHandleStandardArgs.cmake:233 (message):
Could NOT find OpenCV (missing: stitching) (found version "4.7.0")
Call Stack (most recent call first):
/usr/local/lib/python3.10/dist-packages/cmake/data/share/cmake-3.30/Modules/FindPackageHandleStandardArgs.cmake:603 (_FPHSA_FAILURE_MESSAGE)
/usr/local/lib/cmake/opencv4/OpenCVConfig.cmake:354 (find_package_handle_standard_args)
CMakeLists.txt:234 (FIND_PACKAGE)
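Before running cmake, it may help to check whether a given OpenCV install ships the stitching module at all. The sketch below assumes the usual per-module shared library naming (libopencv_<module>.so*); a monolithic opencv_world build would not match this pattern. The helper name has_opencv_module is hypothetical.

```python
import glob
import os


def has_opencv_module(lib_dir, module):
    """Return True if `lib_dir` holds a shared library for the OpenCV module.

    Assumes per-module libraries named libopencv_<module>.so[.version].
    """
    pattern = os.path.join(lib_dir, f"libopencv_{module}.so*")
    return len(glob.glob(pattern)) > 0
```

For example, has_opencv_module("/usr/local/lib", "stitching") would be expected to return False on the image discussed here, given the error above.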
After installing libopencv-dev, we should explicitly link to the system version:
cmake -DTorch_DIR=/usr/local/lib/python3.10/dist-packages/torch/share/cmake/Torch \
-DOpenCV_DIR=/usr/lib/x86_64-linux-gnu/cmake/opencv4 \
-DWITH_TORCH=ON \
-DWITH_PYTHON=ON ..
It builds without errors on the amd64 image. I didn't try under ROS 2, but the standalone is working.
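The OpenCV_DIR value in the cmake command above is specific to amd64. Since this thread compares amd64 and arm64, here is a small sketch that picks the directory from the machine type. The aarch64 path is an assumption based on the Debian/Ubuntu multiarch layout and was not verified in this thread; the function name system_opencv_dir is hypothetical.

```python
import platform

# Debian/Ubuntu multiarch triplets keyed by platform.machine() values.
_TRIPLETS = {
    "x86_64": "x86_64-linux-gnu",
    "aarch64": "aarch64-linux-gnu",  # assumption: not confirmed in this thread
}


def system_opencv_dir(machine=None):
    """Return the cmake config directory of the system OpenCV (libopencv-dev)."""
    machine = machine or platform.machine()
    if machine not in _TRIPLETS:
        raise ValueError(f"unsupported machine: {machine}")
    return f"/usr/lib/{_TRIPLETS[machine]}/cmake/opencv4"
```

The result can be passed to cmake as -DOpenCV_DIR=<value> so that one Dockerfile works on both architectures.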
Thank you for the update, I was coming across the same issues but I was unaware that cmake flags could resolve these issues.
I'm unsure why, but installing libopencv-dev appears to break cv2 in Python, which is necessary to create the superpoint.pt file from the trace.py script. Therefore, that file should be created before installing libopencv-dev in the Dockerfile.
Building rtabmap_ros from source also requires OpenCV with stitching, so you will need to add that flag for a successful build as well:
RUN git clone --branch ros2 https://github.com/introlab/rtabmap_ros.git src/rtabmap_ros \
&& source /opt/ros/humble/setup.bash \
&& colcon build --symlink-install --cmake-args -DCMAKE_BUILD_TYPE=Release -DOpenCV_DIR=/usr/lib/x86_64-linux-gnu/cmake/opencv4
Aside from that, actually running the ROS 2 node failed with a symbol lookup error: /opt/hpcx/ucc/lib/libucc.so.1: undefined symbol: ucs_config_doc_nop, which was resolved by adding export LD_LIBRARY_PATH=/opt/hpcx/ucx/lib:$LD_LIBRARY_PATH before running the launch file.
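If the node is started from a Python script rather than a shell, the same LD_LIBRARY_PATH change can be applied to the child process environment only. This is a minimal sketch; the helper name prepend_library_path is hypothetical.

```python
import os


def prepend_library_path(env, directory):
    """Return a copy of `env` with `directory` first in LD_LIBRARY_PATH."""
    new_env = dict(env)
    current = new_env.get("LD_LIBRARY_PATH", "")
    # Drop empty entries and any existing copy of `directory`.
    parts = [p for p in current.split(":") if p and p != directory]
    new_env["LD_LIBRARY_PATH"] = ":".join([directory] + parts)
    return new_env
```

The returned mapping, for example prepend_library_path(os.environ, "/opt/hpcx/ucx/lib"), can be passed as the env argument of subprocess.run when invoking the launch command.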
With those additions, this now runs on ROS 2 Humble; however, it looks like SuperGlue is still having issues. Currently I'm seeing this error message:
RegistrationVis.cpp:183::parseParameters() PyMatcher/Path parameter should be set to use Python3 matching (Vis/CorNNType=6), using default 1.
You may have to add the parameter "PyMatcher/Path": "/PATH/TO/SuperGluePretrainedNetwork/rtabmap_superglue.py" for the rtabmap node.
That worked, thanks!
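For reference, the PyMatcher/Path fix can be wrapped in a small helper that fails early when the script path is wrong, instead of silently falling back to the default matcher as in the warning above. This is a sketch only: the function name superglue_matcher_params is hypothetical, and passing the values as strings is an assumption about how the rtabmap node expects its parameters.

```python
import os


def superglue_matcher_params(script_path):
    """Build the rtabmap parameters that select the Python3 matcher.

    Raises FileNotFoundError if `script_path` does not point to a file.
    """
    if not os.path.isfile(script_path):
        raise FileNotFoundError(script_path)
    return {
        "Vis/CorNNType": "6",           # Python3 matching, per the warning above
        "PyMatcher/Path": script_path,  # path to rtabmap_superglue.py
    }
```

The returned dictionary would then be merged into the parameters given to the rtabmap node in the launch file.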
Note: this is a direct comparison against arm64 architecture, where this does work. Besides architecture, cmake versions were slightly different, with cmake 3.22.1 on amd64, and cmake 3.29.1 on arm64.
In a dockerfile, after installing all the dependencies, the following code is run:
In both architectures, this fails to provide rtabmap with CUDA-enabled Torch for SuperPoint. However, adding the Torch_DIR flag like this works on arm64:
Unfortunately, it fails to compile on the amd64 architecture, where the location of the library is the same. There are some differences in the cmake output; in particular, a lot of libraries (maybe 40?) that are in the /usr/lib/x86_64-linux-gnu folder are missing from the Found output, such as:
The LD_LIBRARY_PATH is not changed when testing with or without the flag, so the libraries should be visible regardless. Any idea why they wouldn't be found after adding that flag?
The failure to compile is mostly due to undefined references for things like rtabmap, uToUpperCase, UFile, grid_map, pcl, PointMatcher, uBool2str, UDirectory, etc.