ros2 / rclcpp

rclcpp (ROS Client Library for C++)
Apache License 2.0
534 stars 413 forks source link

rclcpp::Node::make_shared() or destructor does not appear to be threadsafe #447

Open matt-attack opened 6 years ago

matt-attack commented 6 years ago

Trying to create then destroy many nodes at the same time causes a wide variety of different crashes. It seems that either creating or destroying a node is not thread safe.

Here's a minimal example that crashes almost every time for me:

#include <chrono>
#include <iostream>
#include <memory>
#include <string>

#include "rclcpp/rclcpp.hpp"

#include "example_interfaces/srv/add_two_ints.hpp"

using namespace std::chrono_literals;

void fun(std::string id)
{
  std::shared_ptr<rclcpp::Node> node = rclcpp::Node::make_shared("add_two_ints_client"+id);
}

int main(int argc, char ** argv)
{
  // Force flush of the stdout buffer.
  setvbuf(stdout, NULL, _IONBF, BUFSIZ);

  rclcpp::init(argc, argv);

  int num_threads = 20;

  std::vector<std::thread> threads;
  for (int i = 0; i < num_threads; i++)
  {
    threads.push_back(std::thread(std::bind(fun, std::to_string(i))));
  }

  for (int i = 0; i < num_threads; i++)
  {
    threads[i].join();
    printf("Thread %d Finished\n", i);
  }

  printf("Finished");

  rclcpp::shutdown();
  return 0;
}

Often I will get

*** Error in `/home/matthew/Desktop/ros2_overlay_release/install/lib/demo_nodes_cpp/add_two_ints_client': free(): invalid pointer: 0x00007fce4c447ca8 ***
*** Error in `/home/matthew/Desktop/ros2_overlay_release/install/lib/demo_nodes_cpp/add_two_ints_client': free(): invalid pointer: 0x00007fce4c447ca8 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7fce4e80f7e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x8037a)[0x7fce4e81837a]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7fce4e81c53c]
/opt/ros/ardent/lib/libfastrtps.so.1(_ZN8eprosima8fastrtps4rtps10RTPSDomain21removeRTPSParticipantEPNS1_15RTPSParticipantE+0xdb)[0x7fce4c10cb9b]
/opt/ros/ardent/lib/libfastrtps.so.1(_ZN8eprosima8fastrtps15ParticipantImplD1Ev+0xa8)[0x7fce4c114b98]
/opt/ros/ardent/lib/libfastrtps.so.1(_ZN8eprosima8fastrtps15ParticipantImplD0Ev+0x9)[0x7fce4c114e39]
/opt/ros/ardent/lib/libfastrtps.so.1(_ZN8eprosima8fastrtps6Domain17removeParticipantEPNS0_11ParticipantE+0x8a)[0x7fce4c10ef3a]
/opt/ros/ardent/lib/librmw_fastrtps_cpp.so(rmw_destroy_node+0xff)[0x7fce4c66d90f]
/opt/ros/ardent/lib/librcl.so(rcl_node_fini+0x96)[0x7fce4db61f36]
/opt/ros/ardent/lib/librclcpp.so(+0x8329d)[0x7fce4f17d29d]
/opt/ros/ardent/lib/librclcpp.so(_ZN6rclcpp15node_interfaces8NodeBaseD2Ev+0x2a5)[0x7fce4f17d6c5]
/opt/ros/ardent/lib/librclcpp.so(_ZN6rclcpp15node_interfaces8NodeBaseD0Ev+0x9)[0x7fce4f17d6f9]
/opt/ros/ardent/lib/librclcpp.so(_ZN6rclcpp4NodeD2Ev+0x3a2)[0x7fce4f17b6a2]
/home/matthew/Desktop/ros2_overlay_release/install/lib/demo_nodes_cpp/add_two_ints_client[0x4061a9]
/home/matthew/Desktop/ros2_overlay_release/install/lib/demo_nodes_cpp/add_two_ints_client[0x4060fb]
/home/matthew/Desktop/ros2_overlay_release/install/lib/demo_nodes_cpp/add_two_ints_client[0x405f8b]
/home/matthew/Desktop/ros2_overlay_release/install/lib/demo_nodes_cpp/add_two_ints_client(_ZNSt16_Sp_counted_baseILN9__gnu_cxx12_Lock_policyE2EE10_M_releaseEv+0x42)[0x4034da]
/home/matthew/Desktop/ros2_overlay_release/install/lib/demo_nodes_cpp/add_two_ints_client[0x402fdd]
/home/matthew/Desktop/ros2_overlay_release/install/lib/demo_nodes_cpp/add_two_ints_client[0x402cde]
/home/matthew/Desktop/ros2_overlay_release/install/lib/demo_nodes_cpp/add_two_ints_client[0x402cfa]
/home/matthew/Desktop/ros2_overlay_release/install/lib/demo_nodes_cpp/add_two_ints_client[0x40274f]
/home/matthew/Desktop/ros2_overlay_release/install/lib/demo_nodes_cpp/add_two_ints_client[0x406304]
/home/matthew/Desktop/ros2_overlay_release/install/lib/demo_nodes_cpp/add_two_ints_client[0x406243]
/home/matthew/Desktop/ros2_overlay_release/install/lib/demo_nodes_cpp/add_two_ints_client[0x406208]
/home/matthew/Desktop/ros2_overlay_release/install/lib/demo_nodes_cpp/add_two_ints_client[0x40612a]
/home/matthew/Desktop/ros2_overlay_release/install/lib/demo_nodes_cpp/add_two_ints_client[0x40607a]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80)[0x7fce4ee30c80]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7fce4f3ef6ba]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fce4e89f41d]
======= Memory map: ========
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7fce4e80f7e5]
00400000-0040a000 r-xp 00000000 103:07 8793911                           /home/matthew/Desktop/ros2_overlay_release/install/lib/demo_nodes_cpp/add_two_ints_client
00609000-0060a000 r--p 00009000 103:07 8793911                           /home/matthew/Desktop/ros2_overlay_release/install/lib/demo_nodes_cpp/add_two_ints_client
0060a000-0060b000 rw-p 0000a000 103:07 8793911                           /home/matthew/Desktop/ros2_overlay_release/install/lib/demo_nodes_cpp/add_two_ints_client
0184d000-0198b000 rw-p 00000000 00:00 0                                  [heap]
7fcd0dfd4000-7fcd0dfd5000 ---p 00000000 00:00 0 
7fcd0dfd5000-7fcd0e7d5000 rw-p 00000000 00:00 0 
7fcd0e7d5000-7fcd0e7d6000 ---p 00000000 00:00 0 
7fcd0e7d6000-7fcd0efd6000 rw-p 00000000 00:00 0 
7fcd0efd6000-7fcd0efd7000 ---p 00000000 00:00 0 
7fcd0efd7000-7fcd0f7d7000 rw-p 00000000 00:00 0 
7fcd0f7d7000-7fcd0f7d8000 ---p 00000000 00:00 0 
7fcd0f7d8000-7fcd0ffd8000 rw-p 00000000 00:00 0 
7fcd0ffd8000-7fcd0ffd9000 ---p 00000000 00:00 0 
7fcd0ffd9000-7/lib/x86_64-linux-gnu/libc.so.6(+0x8037a)[0x7fce4e81837a]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7fce4e81c53c]
/opt/ros/ardent/lib/libfastrtps.so.1(_ZN8eprosima8fastrtps4rtps10RTPSDomain21removeRTPSParticipantEPNS1_15RTPSParticipantE+0xdb)[0x7fce4c10cb9b]
fcd107d9000 rw-p 00000000 00:00 0 
7fcd107d9000-7fcd107da000 ---p 00000000 00:00 0 
7fcd107da000-7fcd10fda000 rw-p 00000000 00:00 0 
7fcd10fda000-7fcd10fdb000 ---p 00000000 00:00 0 
7fcd10fdb000-7fcd117db000 rw-p 00000000 00:00 0 
7fcd117db000-7fcd117dc000 ---p 00000000 00:00 0 
7fcd117dc000-7fcd11fdc000 rw-p 00000000 00:00 0 
7fcd11fdc000-7fcd11fdd000 ---p 00000000 00:00 0 
7fcd11fdd000-7fcd127dd000 rw-p 00000000 00:00 0 
7fcd127dd000-7fcd127de000 ---p 00000000 00:00 0 
7fcd127de000-7fcd12fde000 rw-p 00000000 00:00 0 
7fcd12fde000-7fcd12fdf000 ---p 00000000 00:00 0 
7fcd12fdf000-7fcd137df000 rw-p 00000000 00:00 0 
7fcd137df000-7fcd137e0000 ---p 00000000 00:00 0 
7fcd137e0000-7fcd13fe0000 rw-p 00000000 00:00 0 
7fcd13fe0000-7fcd13fe1000 ---p 00000000 00:00 0 
7fcd13fe1000-7fcd147e1000 rw-p 00000000 00:00 0 
7fcd147e1000-7fcd147e2000 ---p 00000000 00:00 0 
7fcd147e2000-7fcd14fe2000 rw-p 00000000 00:00 0 
7fcd14fe2000-7fcd14fe3000 ---p 00000000 00:00 0 
7fcd14fe3000-7fcd157e3000 rw-p 00000000 00:00 0 
7fcd157e3/opt/ros/ardent/lib/libfastrtps.so.1(_ZN8eprosima8fastrtps15ParticipantImplD0Ev+0x9)[0x7fce4c114e39]
/opt/ros/ardent/lib/libfastrtps.so.1(_ZN8eprosima8fastrtps6Domain17removeParticipantEPNS0_11ParticipantE+0x8a)[0x7fce4c10ef3a]
/opt/ros/ardent/lib/librmw_fastrtps_cpp.so(rmw_destroy_node+0xff)[0x7fce4c66d90f]
/opt/ros/ardent/lib/librcl.so(rcl_node_fini+0x96)[0x7fce4db61f36]
/opt/ros/ardent/lib/librclcpp.so(+0x8329d)[0x7fce4f17d29d]
/opt/ros/ardent/lib/librclcpp.so(_ZN6rclcpp15node_interfaces8NodeBaseD2Ev+0x2a5)[0x7fce4f17d6c5]

Sometimes I get:

>>> [rcutils|error_handling.c:155] rcutils_set_error_state()
This error state is being overwritten:

  'string_map already initialized, at /tmp/binarydeb/ros-ardent-rcutils-0.4.0/src/string_map.c:56'

with this new error message:

  'Failed to initialize map for logger severities [string_map already initialized, at /tmp/binarydeb/ros-ardent-rcutils-0.4.0/src/string_map.c:56]. Severities will not be configurable., at /tmp/binarydeb/ros-ardent-rcutils-0.4.0/src/logging.c:108'

rcutils_reset_error() should be called after error handling to avoid this.
<<<
[rcutils|/tmp/binarydeb/ros-ardent-rcl-0.4.0/src/rcl/node.c:112] error initializing logging: Failed to initialize map for logger severities [string_map already initialized, at /tmp/binarydeb/ros-ardent-rcutils-0.4.0/src/string_map.c:56]. Severities will not be configurable., at /tmp/binarydeb/ros-ardent-rcutils-0.4.0/src/logging.c:108

Other times:

terminate called after throwing an instance of 'rclcpp::exceptions::RCLError'
  what():  could not create subscription: create_subscriber() could not create subscriber, at /tmp/binarydeb/ros-ardent-rmw-fastrtps-cpp-0.4.0/src/rmw_subscription.cpp:140, at /tmp/binarydeb/ros-ardent-rcl-0.4.0/src/rcl/subscription.c:149

I have been seeing lots of bugs/crashes that seem related to a race condition when there are multiple nodes in a process, but this is the most basic and reproducible example.

wjwwood commented 6 years ago

The short answer is that I would not expect it to be thread-safe. Unfortunately rclcpp hasn't gone through an API review yet and still lacks clarity on a lot of these kinds of details and therefore also lacks documentation on these points.

However, if you look into the functions that rclcpp is ultimately calling in rcl, there are several which are explicitly marked as not thread-safe, e.g. rcl_guard_condition_init/rcl_guard_condition_fini (especially if rcl_trigger_guard_condition is called during rcl_guard_condition_fini) or even just rcl_node_init.

Now, in each case, those rcl functions may be thread-safe in certain circumstances so the solution may not be as drastic a putting a global mutex lock around all the rclcpp versions of those functions, but the time has not yet been taken to figure out where it is thread-safe and where it is not and what to do about it in rclcpp to ensure that it used in a safe way.

There's some work coming up where we will audit the memory and thread behavior of these functions and the underlying rmw functions too, so we'll hopefully be able to provide better behavior and documentation soon. In the meantime we'd certainly like help fixing issues narrowly where ever you find it convenient.

sagniknitr commented 6 years ago

@wjwwood Is there any chance it is caused due to deadlock in Fast-RTPS threads ? It seems there is a issue on Fast RTPS (https://github.com/eProsima/Fast-RTPS/issues/190) which may relate to this.

matt-attack commented 6 years ago

@sagniknitr I have seen that issue come up as well, so it's very possible its at least related