This issue happens as part of a larger application, so I don't have a minimal reproducible example.
I have 2 nodes in the same process.
Each node has its own events-executor, spinning in a dedicated thread.
Node A periodically publishes an occupancy grid using this publisher
I have a unit-test that uses this setup, where the get_occupancy_grid function is called 4-5 times within the test duration.
The test fails sporadically, 1/30 times maybe, with the following backtrace
Thread 27 "test_positioner" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xffffc3ffe100 (LWP 23807)]
0x0000fffff7c65900 in ?? () from /opt/ros/iron/lib/librclcpp.so
(gdb) bt
#0 0x0000fffff7c65900 in ?? () from /opt/ros/iron/lib/librclcpp.so
#1 0x0000fffff7c62468 in ?? () from /opt/ros/iron/lib/librclcpp.so
#2 0x0000fffff7c661e4 in void rclcpp::detail::cpp_callback_trampoline<std::function<void (unsigned long)>, void const*, unsigned long, void>(void const*, unsigned long) () from /opt/ros/iron/lib/librclcpp.so
#3 0x0000fffff1d62a74 in RMWSubscriptionEvent::update_data_available() ()
from /opt/ros/iron/lib/librmw_fastrtps_shared_cpp.so
#4 0x0000fffff1757428 in eprosima::fastdds::dds::DataReaderImpl::InnerDataReaderListener::on_data_available(eprosima::fastrtps::rtps::RTPSReader*, eprosima::fastrtps::rtps::GUID_t const&, eprosima::fastrtps::rtps::SequenceNumber_t const&, eprosima::fastrtps::rtps::SequenceNumber_t const&, bool&) ()
from /opt/ros/iron/lib/libfastrtps.so.2.10
#5 0x0000fffff16da1b8 in eprosima::fastrtps::rtps::StatefulReader::NotifyChanges(eprosima::fastrtps::rtps::WriterProxy*) () from /opt/ros/iron/lib/libfastrtps.so.2.10
#6 0x0000fffff16dcd30 in eprosima::fastrtps::rtps::StatefulReader::change_received(eprosima::fastrtps::rtps::CacheChange_t*, eprosima::fastrtps::rtps::WriterProxy*, unsigned long) ()
from /opt/ros/iron/lib/libfastrtps.so.2.10
#7 0x0000fffff16dd28c in eprosima::fastrtps::rtps::StatefulReader::processDataMsg(eprosima::fastrtps::rtps::CacheChange_t*) () from /opt/ros/iron/lib/libfastrtps.so.2.10
#8 0x0000fffff16b6f4c in eprosima::fastrtps::rtps::StatefulWriter::deliver_sample_to_intraprocesses(eprosima::fastrtps::rtps::CacheChange_t*) () from /opt/ros/iron/lib/libfastrtps.so.2.10
#9 0x0000fffff16b7e5c in eprosima::fastrtps::rtps::StatefulWriter::deliver_sample_nts(eprosima::fastrtps::rtps::CacheChange_t*, eprosima::fastrtps::rtps::RTPSMessageGroup&, eprosima::fastrtps::rtps::LocatorSelectorSender&, std::chrono::time_point<std::chrono::_V2::steady_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > > const&) () from /opt/ros/iron/lib/libfastrtps.so.2.10
#10 0x0000fffff188c204 in ?? () from /opt/ros/iron/lib/libfastrtps.so.2.10
#11 0x0000fffff16b08d4 in eprosima::fastrtps::rtps::StatefulWriter::unsent_change_added_to_history(eprosima::fastrtps::rtps::CacheChange_t*, std::chrono::time_point<std::chrono::_V2::steady_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > > const&) () from /opt/ros/iron/lib/libfastrtps.so.2.10
#12 0x0000fffff1735d8c in eprosima::fastdds::dds::DataWriterImpl::perform_create_new_change(eprosima::fastrtps::rtps::ChangeKind_t, void*, eprosima::fastrtps::rtps::WriteParams&, eprosima::fastrtps::rtps::InstanceHandle_t const&) () from /opt/ros/iron/lib/libfastrtps.so.2.10
#13 0x0000fffff1736694 in eprosima::fastdds::dds::DataWriterImpl::create_new_change_with_params(eprosima::fastrtps::rtps::ChangeKind_t, void*, eprosima::fastrtps::rtps::WriteParams&) ()
from /opt/ros/iron/lib/libfastrtps.so.2.10
#14 0x0000fffff1736760 in eprosima::fastdds::dds::DataWriterImpl::create_new_change(eprosima::fastrtps::rtps::ChangeKind_t, void*) () from /opt/ros/iron/lib/libfastrtps.so.2.10
#15 0x0000fffff17283ac in eprosima::fastdds::dds::DataWriterImpl::write(void*) ()
from /opt/ros/iron/lib/libfastrtps.so.2.10
#16 0x0000fffff1d6c5d8 in rmw_fastrtps_shared_cpp::__rmw_publish(char const*, rmw_publisher_s const*, void const*, rmw_publisher_allocation_s*) () from /opt/ros/iron/lib/librmw_fastrtps_shared_cpp.so
#17 0x0000fffff7e0806c in ?? () from /opt/ros/iron/lib/librcl.so
#18 0x0000ffffe45e2260 in rclcpp::Publisher<nav_msgs::msg::OccupancyGrid_<std::allocator<void> >, std::allocator<void> >::do_inter_process_publish (msg=..., this=0xaaaaab63dfe0)
at /usr/include/c++/11/bits/shared_ptr_base.h:1295
#19 rclcpp::Publisher<nav_msgs::msg::OccupancyGrid_<std::allocator<void> >, std::allocator<void> >::publish<nav_msgs::msg::OccupancyGrid_<std::allocator<void> > > (msg=..., this=0xaaaaab63dfe0)
at /opt/ros/iron/include/rclcpp/rclcpp/publisher.hpp:273
Additional information
I think that the problem might be a race condition between the subscription going out of scope while the publisher is trying to push an event to it
Bug report
Required Info:
Steps to reproduce issue
This issue happens as part of a larger application, so I don't have a minimal reproducible example. I have 2 nodes in the same process. Each node has its own events-executor, spinning in a dedicated thread.
Node A periodically publishes an occupancy grid using this publisher
Node B from time to time creates a subscription to that topic, waits for a message and then lets the subscription go out of scope.
Actual behavior
I have a unit-test that uses this setup, where the
get_occupancy_grid
function is called 4-5 times within the test duration. The test fails sporadically, 1/30 times maybe, with the following backtraceAdditional information
I think that the problem might be a race condition between the subscription going out of scope while the publisher is trying to push an event to it