ros2 / rclcpp

rclcpp (ROS Client Library for C++)
Apache License 2.0
513 stars 412 forks source link

events-executor segfault on fast-dds `RMWSubscriptionEvent::update_data_available()` #2437

Open alsora opened 4 months ago

alsora commented 4 months ago

Bug report

Required Info:

Steps to reproduce issue

This issue happens as part of a larger application, so I don't have a minimal reproducible example. I have 2 nodes in the same process. Each node has its own events-executor, spinning in a dedicated thread.

Node A periodically publishes an occupancy grid using this publisher

    rclcpp::PublisherOptions pub_options;
    pub_options.use_intra_process_comm = rclcpp::IntraProcessSetting::Disable;
    m_map_pub = parent_node->create_publisher<nav_msgs::msg::OccupancyGrid>(
      map_topic_name,
      rclcpp::QoS(rclcpp::KeepLast(1)).transient_local().reliable(),
      pub_options);

Node B from time to time creates a subscription to that topic, waits for a message and then lets the subscription go out of scope.

nav_msgs::msg::OccupancyGrid::ConstSharedPtr
get_occupancy_grid(
  const std::string & topic,
  std::chrono::seconds timeout)
{
  nav_msgs::msg::OccupancyGrid::ConstSharedPtr map;
  rclcpp::SubscriptionOptions sub_options;
  sub_options.use_intra_process_comm = rclcpp::IntraProcessSetting::Disable;
  auto map_sub = node->create_subscription<nav_msgs::msg::OccupancyGrid>(
    topic,
    rclcpp::QoS(rclcpp::KeepLast(1)).transient_local().reliable(),
    [&map](nav_msgs::msg::OccupancyGrid::ConstSharedPtr msg) {
      map = msg;
    },
    sub_options);

  auto start_time = std::chrono::steady_clock::now();
  while ((std::chrono::steady_clock::now() - start_time < timeout) && rclcpp::ok()) {
    if (map) {break;}
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
  }
  return map;
}

Actual behavior

I have a unit-test that uses this setup, where the get_occupancy_grid function is called 4-5 times within the test duration. The test fails sporadically, 1/30 times maybe, with the following backtrace

Thread 27 "test_positioner" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xffffc3ffe100 (LWP 23807)]
0x0000fffff7c65900 in ?? () from /opt/ros/iron/lib/librclcpp.so
(gdb) bt
#0  0x0000fffff7c65900 in ?? () from /opt/ros/iron/lib/librclcpp.so
#1  0x0000fffff7c62468 in ?? () from /opt/ros/iron/lib/librclcpp.so
#2  0x0000fffff7c661e4 in void rclcpp::detail::cpp_callback_trampoline<std::function<void (unsigned long)>, void const*, unsigned long, void>(void const*, unsigned long) () from /opt/ros/iron/lib/librclcpp.so
#3  0x0000fffff1d62a74 in RMWSubscriptionEvent::update_data_available() ()
   from /opt/ros/iron/lib/librmw_fastrtps_shared_cpp.so
#4  0x0000fffff1757428 in eprosima::fastdds::dds::DataReaderImpl::InnerDataReaderListener::on_data_available(eprosima::fastrtps::rtps::RTPSReader*, eprosima::fastrtps::rtps::GUID_t const&, eprosima::fastrtps::rtps::SequenceNumber_t const&, eprosima::fastrtps::rtps::SequenceNumber_t const&, bool&) ()
   from /opt/ros/iron/lib/libfastrtps.so.2.10
#5  0x0000fffff16da1b8 in eprosima::fastrtps::rtps::StatefulReader::NotifyChanges(eprosima::fastrtps::rtps::WriterProxy*) () from /opt/ros/iron/lib/libfastrtps.so.2.10
#6  0x0000fffff16dcd30 in eprosima::fastrtps::rtps::StatefulReader::change_received(eprosima::fastrtps::rtps::CacheChange_t*, eprosima::fastrtps::rtps::WriterProxy*, unsigned long) ()
   from /opt/ros/iron/lib/libfastrtps.so.2.10
#7  0x0000fffff16dd28c in eprosima::fastrtps::rtps::StatefulReader::processDataMsg(eprosima::fastrtps::rtps::CacheChange_t*) () from /opt/ros/iron/lib/libfastrtps.so.2.10
#8  0x0000fffff16b6f4c in eprosima::fastrtps::rtps::StatefulWriter::deliver_sample_to_intraprocesses(eprosima::fastrtps::rtps::CacheChange_t*) () from /opt/ros/iron/lib/libfastrtps.so.2.10
#9  0x0000fffff16b7e5c in eprosima::fastrtps::rtps::StatefulWriter::deliver_sample_nts(eprosima::fastrtps::rtps::CacheChange_t*, eprosima::fastrtps::rtps::RTPSMessageGroup&, eprosima::fastrtps::rtps::LocatorSelectorSender&, std::chrono::time_point<std::chrono::_V2::steady_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > > const&) () from /opt/ros/iron/lib/libfastrtps.so.2.10
#10 0x0000fffff188c204 in ?? () from /opt/ros/iron/lib/libfastrtps.so.2.10
#11 0x0000fffff16b08d4 in eprosima::fastrtps::rtps::StatefulWriter::unsent_change_added_to_history(eprosima::fastrtps::rtps::CacheChange_t*, std::chrono::time_point<std::chrono::_V2::steady_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > > const&) () from /opt/ros/iron/lib/libfastrtps.so.2.10
#12 0x0000fffff1735d8c in eprosima::fastdds::dds::DataWriterImpl::perform_create_new_change(eprosima::fastrtps::rtps::ChangeKind_t, void*, eprosima::fastrtps::rtps::WriteParams&, eprosima::fastrtps::rtps::InstanceHandle_t const&) () from /opt/ros/iron/lib/libfastrtps.so.2.10
#13 0x0000fffff1736694 in eprosima::fastdds::dds::DataWriterImpl::create_new_change_with_params(eprosima::fastrtps::rtps::ChangeKind_t, void*, eprosima::fastrtps::rtps::WriteParams&) ()
   from /opt/ros/iron/lib/libfastrtps.so.2.10
#14 0x0000fffff1736760 in eprosima::fastdds::dds::DataWriterImpl::create_new_change(eprosima::fastrtps::rtps::ChangeKind_t, void*) () from /opt/ros/iron/lib/libfastrtps.so.2.10
#15 0x0000fffff17283ac in eprosima::fastdds::dds::DataWriterImpl::write(void*) ()
   from /opt/ros/iron/lib/libfastrtps.so.2.10
#16 0x0000fffff1d6c5d8 in rmw_fastrtps_shared_cpp::__rmw_publish(char const*, rmw_publisher_s const*, void const*, rmw_publisher_allocation_s*) () from /opt/ros/iron/lib/librmw_fastrtps_shared_cpp.so
#17 0x0000fffff7e0806c in ?? () from /opt/ros/iron/lib/librcl.so
#18 0x0000ffffe45e2260 in rclcpp::Publisher<nav_msgs::msg::OccupancyGrid_<std::allocator<void> >, std::allocator<void> >::do_inter_process_publish (msg=..., this=0xaaaaab63dfe0)
    at /usr/include/c++/11/bits/shared_ptr_base.h:1295
#19 rclcpp::Publisher<nav_msgs::msg::OccupancyGrid_<std::allocator<void> >, std::allocator<void> >::publish<nav_msgs::msg::OccupancyGrid_<std::allocator<void> > > (msg=..., this=0xaaaaab63dfe0)
    at /opt/ros/iron/include/rclcpp/rclcpp/publisher.hpp:273

Additional information

I think that the problem might be a race condition between the subscription going out of scope while the publisher is trying to push an event to it