groot installation fails, post-patch

PatJRobinson commented 1 year ago

There appear to be some errors in the source code for the groot package.

I tried:

git clone https://github.com/sea-bass/turtlebot3_behavior_demos.git
cd turtlebot3_behavior_demos

sudo -E make build

Build output below:

Starting >>> groot
[0.503s] WARNING:colcon.colcon_ros.task.ament_python.build:Package 'py_trees' doesn't explicitly install a marker in the package index (colcon-ros currently does it implicitly but that fallback will be removed in the future)
[0.504s] WARNING:colcon.colcon_ros.task.ament_python.build:Package 'py_trees' doesn't explicitly install the 'package.xml' file (colcon-ros currently does it implicitly but that fallback will be removed in the future)
[0.671s] WARNING:colcon.colcon_ros.task.ament_python.build:Package 'py_trees_js' doesn't explicitly install a marker in the package index (colcon-ros currently does it implicitly but that fallback will be removed in the future)
Finished <<< turtlebot3_description [1.04s]
Finished <<< turtlebot3_cartographer [1.20s]
Finished <<< turtlebot3_navigation2 [1.26s]
Finished <<< turtlebot3_teleop [1.26s]
Finished <<< py_trees [1.38s]
Finished <<< py_trees_js [1.43s]
Finished <<< dynamixel_sdk [1.55s]
Finished <<< dynamixel_sdk_custom_interfaces [7.74s]
Starting >>> dynamixel_sdk_examples
--- stderr: groot
/turtlebot3_ws/src/groot/QtNodeEditor/src/NodeState.cpp: In member function ‘QtNodes::NodeState::ConnectionPtrSet QtNodes::NodeState::connections(QtNodes::PortType, QtNodes::PortIndex) const’:
/turtlebot3_ws/src/groot/QtNodeEditor/src/NodeState.cpp:51:34: warning: comparison of integer expressions of different signedness: ‘QtNodes::PortIndex’ {aka ‘int’} and ‘std::vector<std::unordered_map<QUuid, QtNodes::Connection*> >::size_type’ {aka ‘long unsigned int’} [-Wsign-compare]
   51 |   if( portIndex < 0 || portIndex >= connections.size() )
      |                        ~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~
/turtlebot3_ws/src/groot/bt_editor/XML_utilities.cpp: In function ‘bool VerifyXML(QDomDocument&, const std::vector<QString>&, std::vector<QString>&)’:
/turtlebot3_ws/src/groot/bt_editor/XML_utilities.cpp:199:33: error: invalid initialization of reference of type ‘const std::unordered_map<std::__cxx11::basic_string<char>, BT::NodeType>&’ from expression of type ‘std::set<std::__cxx11::basic_string<char> >’
  199 |         BT::VerifyXML(xml_text, registered_nodes); // may throw
      |                                 ^~~~~~~~~~~~~~~~
In file included from /turtlebot3_ws/src/groot/bt_editor/XML_utilities.cpp:5:
/opt/ros/galactic/include/behaviortree_cpp_v3/xml_parsing.h:42:65: note: in passing argument 2 of ‘void BT::VerifyXML(const string&, const std::unordered_map<std::__cxx11::basic_string<char>, BT::NodeType>&)’
   42 |                const std::unordered_map<std::string, NodeType>& registered_nodes);
      |                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~
make[2]: *** [CMakeFiles/behavior_tree_editor.dir/build.make:251: CMakeFiles/behavior_tree_editor.dir/bt_editor/XML_utilities.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....
/turtlebot3_ws/src/groot/bt_editor/sidepanel_monitor.cpp: In member function ‘void SidepanelMonitor::on_timer()’:
/turtlebot3_ws/src/groot/bt_editor/sidepanel_monitor.cpp:61:42: warning: ‘bool zmq::detail::socket_base::recv(zmq::message_t*, int)’ is deprecated: from 4.3.1, use recv taking a reference to message_t and recv_flags [-Wdeprecated-declarations]
   61 |         while(  _zmq_subscriber.recv(&msg) )
      |                                          ^
In file included from /turtlebot3_ws/src/groot/bt_editor/sidepanel_monitor.h:5,
                 from /turtlebot3_ws/src/groot/bt_editor/sidepanel_monitor.cpp:1:
/usr/include/zmq.hpp:1407:10: note: declared here
 1407 |     bool recv(message_t *msg_, int flags_ = 0)
      |          ^~~~
/turtlebot3_ws/src/groot/bt_editor/sidepanel_monitor.cpp: In member function ‘bool SidepanelMonitor::getTreeFromServer()’:
/turtlebot3_ws/src/groot/bt_editor/sidepanel_monitor.cpp:149:32: warning: ‘bool zmq::detail::socket_base::send(zmq::message_t&, int)’ is deprecated: from 4.3.1, use send taking message_t and send_flags [-Wdeprecated-declarations]
  149 |         zmq_client.send(request);
      |                                ^
In file included from /turtlebot3_ws/src/groot/bt_editor/sidepanel_monitor.h:5,
                 from /turtlebot3_ws/src/groot/bt_editor/sidepanel_monitor.cpp:1:
/usr/include/zmq.hpp:1326:10: note: declared here
 1326 |     bool send(message_t &msg_,
      |          ^~~~
/turtlebot3_ws/src/groot/bt_editor/sidepanel_monitor.cpp:151:47: warning: ‘bool zmq::detail::socket_base::recv(zmq::message_t*, int)’ is deprecated: from 4.3.1, use recv taking a reference to message_t and recv_flags [-Wdeprecated-declarations]
  151 |         bool received = zmq_client.recv(&reply);
      |                                               ^
In file included from /turtlebot3_ws/src/groot/bt_editor/sidepanel_monitor.h:5,
                 from /turtlebot3_ws/src/groot/bt_editor/sidepanel_monitor.cpp:1:
/usr/include/zmq.hpp:1407:10: note: declared here
 1407 |     bool recv(message_t *msg_, int flags_ = 0)
      |          ^~~~
make[1]: *** [CMakeFiles/Makefile2:158: CMakeFiles/behavior_tree_editor.dir/all] Error 2
make: *** [Makefile:141: all] Error 2
---
Failed   <<< groot [11.0s, exited with code 2]

Am I doing anything wrong?

sea-bass commented 1 year ago

Happened for me as well.

It seems this was merged after I forked Groot (https://github.com/BehaviorTree/Groot/pull/123), but the PR shows it as a warning, not an error. Maybe something changed recently in the compiler flags, specifically -Wsign-compare?

I just updated my fork to include that update and it seems to build fine. You should rebuild in a way that the cache will clone the Groot repo again (or just do a clean rebuild), and it should work.

Thanks for reporting these issues, BTW!

sea-bass commented 1 year ago

Actually, that's not it. I think the Groot build is broken right now on ROS 2 Galactic, but it seems to work on Humble. That's all the time I have to look into it for now, unfortunately.

I've been meaning to do this upgrade to Humble anyway, so I will probably do that.

PatJRobinson commented 1 year ago

No problem, thanks for putting this out there!

Apologies if this is now off-topic, but I've been following your tutorial for NVIDIA + ROS Noetic (https://roboticseabass.com/2021/04/21/docker-and-ros/) and have managed to build the image.

I copied the Dockerfile:

FROM nvidia/cudagl:11.1.1-base-ubuntu20.04

# Minimal setup
RUN apt-get update \
 && apt-get install -y locales lsb-release
ARG DEBIAN_FRONTEND=noninteractive
RUN dpkg-reconfigure locales

# Install ROS Noetic
RUN sh -c 'echo "deb http://packages.ros.org/ros/ubuntu $(lsb_release -sc) main" > /etc/apt/sources.list.d/ros-latest.list'
RUN apt-key adv --keyserver 'hkp://keyserver.ubuntu.com:80' --recv-key C1CF6E31E6BADE8868B172B4F42ED6FBAB17C654
RUN apt-get update \
 && apt-get install -y --no-install-recommends ros-noetic-desktop-full
RUN apt-get install -y --no-install-recommends python3-rosdep
RUN rosdep init \
 && rosdep fix-permissions \
 && rosdep update
RUN echo "source /opt/ros/noetic/setup.bash" >> ~/.bashrc

Then did:

# Build the Dockerfile
docker build -t nvidia_ros .

# Start a terminal
docker run -it --net=host --gpus all \
    --env="NVIDIA_DRIVER_CAPABILITIES=all" \
    --env="DISPLAY" \
    --env="QT_X11_NO_MITSHM=1" \
    --volume="/tmp/.X11-unix:/tmp/.X11-unix:rw" \
    nvidia_ros \
    bash

So far so good. But when I run gazebo, I get

libGL error: MESA-LOADER: failed to retrieve device information

If I do

export DISPLAY=:0
xhost +

I then get

Invalid MIT-MAGIC-COOKIE-1 keyxhost:  unable to open display ":0"

I am a bit confused about how to fix this issue, or even what I am trying to achieve. I believe I am forwarding a GUI session to the container? If you have encountered this issue, and there's a simple fix, I'd greatly appreciate any insights you might have. I will continue looking online for a solution.

sea-bass commented 1 year ago

I then get

Invalid MIT-MAGIC-COOKIE-1 keyxhost:  unable to open display ":0"

Yeah, this stuff is annoying. Docker is great until you need a display. Could you try changing the DISPLAY line in the docker run command to:

    --env="DISPLAY=${DISPLAY}" \

PatJRobinson commented 1 year ago

Unfortunately that didn't work. I'm not really sure where to go from here; is my desktop environment misconfigured or something, or is it some NVIDIA issue?

When I first open a terminal and echo $DISPLAY, I get :1

So, if I export DISPLAY=:0 and then run the docker run command as you suggested:

docker run -it --net=host --gpus all \
    --env="NVIDIA_DRIVER_CAPABILITIES=all" \
    --env="DISPLAY=${DISPLAY}" \
    --env="QT_X11_NO_MITSHM=1" \
    --volume="/tmp/.X11-unix:/tmp/.X11-unix:rw" \
    nvidia_ros \
    bash

And then run gazebo inside the container, I get

No protocol specified
No protocol specified
No protocol specified

Again, if I try to run the container when DISPLAY is set to :1, I get the libGL error (MESA-LOADER: failed to retrieve device information). I believe this is because the actual display has not been forwarded to the container, so this library doesn't know where to send its output?

PS: As a side note, I am able to run rqt when the display is set to :1, although I still get the libGL error.
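
For reference, one way to check which display number the host X server is actually serving is to look for its Unix socket. The following is only a minimal sketch, assuming the conventional /tmp/.X11-unix/X<N> socket layout (the same directory mounted into the container above); the function names are hypothetical and not part of any tool mentioned in this thread.

```shell
#!/bin/sh
# Hypothetical helper: map a DISPLAY value such as ":1" or ":1.0"
# to the Unix socket path an X server conventionally listens on.
x11_socket_for_display() {
    display="$1"
    # Drop everything up to the last colon (e.g. ":1.0" -> "1.0").
    number="${display##*:}"
    # Drop an optional screen suffix (e.g. "1.0" -> "1").
    number="${number%%.*}"
    printf '/tmp/.X11-unix/X%s\n' "$number"
}

# Hypothetical helper: report whether the socket for the current
# DISPLAY exists (falls back to ":0" when DISPLAY is unset).
check_display_socket() {
    socket="$(x11_socket_for_display "${DISPLAY:-:0}")"
    if [ -S "$socket" ]; then
        echo "found $socket"
    else
        echo "missing $socket"
    fi
}
```

Running check_display_socket on the host with DISPLAY=:0 and again with DISPLAY=:1 would show which of the two values corresponds to a socket that exists, which may help explain why the two settings fail in different ways.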

sea-bass commented 1 year ago

Did you install the NVIDIA container runtime from here? https://github.com/NVIDIA/nvidia-docker

sea-bass commented 1 year ago

Also, I just got a working version that uses Humble which should finish building fully. Let me know if you're able to try out this PR : https://github.com/sea-bass/turtlebot3_behavior_demos/pull/12

PatJRobinson commented 1 year ago

Yes, I followed the installation guide linked from that repo.

Actually, I believe the NVIDIA part is working correctly. When I run nvidia-smi I get:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0 Off |                  N/A |
| N/A   33C    P8    11W /  N/A |     10MiB /  8192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

My current needs are actually simpler than this project; all I need to do is run some CUDA kernels inside the container, wrapped in a ROS 1 node that publishes point clouds to be subscribed to from the host. To this end, I replaced the base NVIDIA image with:

nvidia/cudagl:11.4.2-devel

When I build the container, I can indeed compile some test CUDA code, run a simple ROS point cloud publisher, and echo the topic on the host.

So it's a success for now, and all thanks to this project you have kindly shared!

As for the display troubles, I think it's an issue with xhost. I can't remember, but I may have done something in the past to mess up the configuration, following online suggestions without really knowing what I was doing (as always!). There is this suspect snippet in my .bashrc:

if [ "$SUDO_USER" != "" ] && [ "$DISPLAY" != "" ]
then
    export XAUTHORITY=$(grep "^${SUDO_USER}:" /etc/passwd | cut -d : -f 6)/.Xauthority
fi

I have now commented it out, but I am still getting the same issues.
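
To spell out what that snippet does: in a shell started under sudo with DISPLAY set, it looks up the invoking user's home directory in /etc/passwd and points XAUTHORITY at the .Xauthority file there. Below is a minimal sketch of the same lookup applied to a single passwd-formatted line; the function names and the example entry are hypothetical and only illustrate the field extraction.

```shell
#!/bin/sh
# Hypothetical helper: print the home directory (field 6) of one
# passwd-formatted line, as the grep | cut pipeline above does.
home_from_passwd_line() {
    printf '%s\n' "$1" | cut -d : -f 6
}

# Hypothetical helper: the XAUTHORITY path the snippet would export
# for the user described by that passwd line.
xauthority_for_passwd_line() {
    printf '%s/.Xauthority\n' "$(home_from_passwd_line "$1")"
}

# Example with a made-up entry (not taken from any real system):
xauthority_for_passwd_line "pat:x:1000:1000:Pat:/home/pat:/bin/bash"
```

With the made-up entry above, the last line prints /home/pat/.Xauthority, so the snippet only changes which cookie file X clients read when running under sudo.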

I think I will try again on a fresh Ubuntu install when I get the time, as I can see containerised Gazebo builds coming in very handy in the future.

sea-bass commented 1 year ago

Glad you got the NVIDIA CUDA support squared away, which seems to be what you were looking for.

After updating to Humble, Groot builds again. Also, I got rid of the NVIDIA base image entirely and am using the regular ROS Humble image instead, mostly because NVIDIA has not released any cudagl images for Ubuntu 22.04 yet and it wasn't an important part of this demo (sorry).

Closing this out due to the Humble upgrade just being merged.
