Open · adamlm opened this issue 2 months ago
I am not sure if this was fixed by a specific PR, but this problem cannot be observed with rolling.
Can you try removing the node from the executor before it goes out of scope?
> I am not sure if this was fixed by a specific PR, but this problem cannot be observed with rolling.
@fujitatomoya I ran the example code in the latest Rolling Docker image (rolling-ros-base, image ID/digest f875aa41a6f0) and got a segfault there too.
Container start command:
docker run --rm -it ros:rolling-ros-base
The only changes I made inside the container were creating a workspace at /workspace/src and adding a package (via ros2 pkg create) containing the example, roughly as sketched below.
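The exact commands were not captured above; a container setup along these lines (the package and executable names here are hypothetical, not from the original report) would look roughly like:

# Inside the ros:rolling-ros-base container (hypothetical package/executable names)
mkdir -p /workspace/src
cd /workspace/src
ros2 pkg create executor_segfault_demo --build-type ament_cmake --dependencies rclcpp
# ...add the example source under executor_segfault_demo/src and register the executable in CMakeLists.txt...
cd /workspace
colcon build --packages-select executor_segfault_demo
source install/setup.bash
ros2 run executor_segfault_demo <executable_name>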
Is there anything else I can provide to help pinpoint where the issue might be?
> Can you try removing the node from the executor before it goes out of scope?
@christophebedard I made the following change to the example.
{
  auto node{std::make_shared<rclcpp::Node>("node")};
  executor.add_node(node);
  executor.remove_node(node);
}  // node goes out of scope here, after it has been removed from the executor
The program ran fine for over 10 minutes running on Humble with FastDDS.
I think it's reasonable to expect the user to remove the node from the executor before it goes out of scope/gets destroyed. See this test which does something similar to your reproducer: https://github.com/ros2/rclcpp/blob/2f71d6e249f626772da3f8a1bb7c8d141d9d0d52/rclcpp/test/rclcpp/test_executor.cpp#L75-L102
I'm not sure if this would cover your full use-case, so please let us know.
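Not something proposed in this thread, but one way to make that contract hard to violate is a small RAII wrapper that removes the node when the owning scope ends. The class below is a hypothetical sketch, not an rclcpp API:

#include <memory>
#include <utility>

#include <rclcpp/rclcpp.hpp>

// Hypothetical helper: pairs add_node() with a guaranteed remove_node(),
// so the node is always detached from the executor before it is destroyed.
class ScopedNodeInExecutor
{
public:
  ScopedNodeInExecutor(rclcpp::Executor & executor, rclcpp::Node::SharedPtr node)
  : executor_(executor), node_(std::move(node))
  {
    executor_.add_node(node_);
  }

  ~ScopedNodeInExecutor()
  {
    executor_.remove_node(node_);
  }

  ScopedNodeInExecutor(const ScopedNodeInExecutor &) = delete;
  ScopedNodeInExecutor & operator=(const ScopedNodeInExecutor &) = delete;

private:
  rclcpp::Executor & executor_;
  rclcpp::Node::SharedPtr node_;
};

With a helper like this, the inner scope from the comment above becomes:

{
  ScopedNodeInExecutor scoped{executor, std::make_shared<rclcpp::Node>("node")};
}  // remove_node() runs here, before the node is destroyed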
@adamlm thanks for checking this! Actually, what I tried was a source build of rolling, so maybe the latest patches in the rolling source (not available as packages yet) fix this issue.
We're having a similar issue when creating a short-lived subscriber. I think it may be the same since:
We were originally upgrading from noetic to humble. I noticed https://github.com/ros2/rmw_fastrtps/issues/728, which seemed to be resolved by https://github.com/ros2/rclcpp/pull/2142. That PR appears to be in jazzy; however, upgrading to jazzy has not fixed the issue for us.
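For reference, a short-lived subscription of the kind described here might look roughly like the following (hypothetical topic and message type; the commenter's actual code is not shown):

#include <chrono>
#include <memory>
#include <thread>

#include <rclcpp/rclcpp.hpp>
#include <std_msgs/msg/string.hpp>

// Hypothetical illustration: the subscription only lives inside this function,
// while the node itself stays on a spinning executor.
void receive_one_burst(const rclcpp::Node::SharedPtr & node)
{
  auto subscription = node->create_subscription<std_msgs::msg::String>(
    "chatter", 10,
    [](const std_msgs::msg::String & msg) {
      RCLCPP_INFO(rclcpp::get_logger("short_lived_sub"), "received: %s", msg.data.c_str());
    });
  std::this_thread::sleep_for(std::chrono::milliseconds(100));
}  // subscription is destroyed here, while the executor may still be spinning the node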
Bug report
Description
I am trying to set up an executor "spinner thread" that will spin a specified rclcpp::Executor until commanded to stop. My goal is to construct the executor and spinner thread in one scope, then add nodes to that executor in another scope. However, the rclcpp::Executor classes cause a segmentation fault under certain conditions. There appears to be a race condition in the RMW layer that manifests with short-lived nodes.

Operating System: Ubuntu 22.04 (Docker container); Ubuntu 22.04 host
Installation type: Binaries
Version or commit hash:
ros-humble-rclcpp: 16.0.10
ros-humble-rmw-fastrtps-cpp: 6.2.7
ros-humble-fastrtps: 2.6.8
ros-humble-rmw-cyclonedds-cpp: 0.10.4
ros-humble-cyclonedds: 1.3.4
DDS implementation:
Client library (if applicable): rclcpp
Compiler:
Steps to reproduce issue
Minimal reproducible example
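The original example code is not reproduced above. Based on the description, a minimal sketch of that kind of reproducer (hypothetical, not the author's exact code) would look roughly like this:

#include <chrono>
#include <memory>
#include <thread>

#include <rclcpp/rclcpp.hpp>

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);

  // Outer scope: the executor and a "spinner thread" that spins it until told to stop.
  rclcpp::executors::SingleThreadedExecutor executor;
  std::thread spinner([&executor]() {executor.spin();});

  // Inner scope: a short-lived node is added to the already-spinning executor
  // and destroyed when the scope ends (without remove_node(), as in the report).
  while (rclcpp::ok()) {
    {
      auto node{std::make_shared<rclcpp::Node>("node")};
      executor.add_node(node);
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
  }

  executor.cancel();
  spinner.join();
  rclcpp::shutdown();
  return 0;
}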
Steps
Expected behavior
The program runs indefinitely.
Actual behavior
The program exits due to a segmentation fault.
Program output when compiled with Clang
Program output when compiled with GCC
In (only) one of the many runs I did, I got the following output (when compiled with GCC):
Additional information
Relevant core dump
Other notes
- The issue occurs when using either the SingleThreadedExecutor or the MultiThreadedExecutor.
- No segmentation fault occurs* when relocating node to the outer scope.
- No segmentation fault occurs* when using Cyclone DDS.
- No segmentation fault occurs* when sleeping the main thread for a short duration before adding node to the executor (see the sketch after this list).

* I concluded a segmentation fault was unlikely after running the program for several minutes.
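For illustration, the "sleep before adding" workaround corresponds to a change like the following in the inner scope of the sketch under "Minimal reproducible example" above (again hypothetical, not the reporter's code; it relies on the <chrono> and <thread> includes from that sketch):

{
  auto node{std::make_shared<rclcpp::Node>("node")};
  // Pause briefly before handing the node to the spinning executor; per the
  // notes above, this made the segmentation fault unobservable in testing.
  std::this_thread::sleep_for(std::chrono::milliseconds(100));
  executor.add_node(node);
}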