ros2 / rmw_fastrtps

Implementation of the ROS Middleware (rmw) Interface using eProsima's Fast RTPS.
Apache License 2.0

humble nodes throw std::bad_alloc if nodes from iron or rolling run on the same network #733

Closed christophfroehlich closed 5 months ago

christophfroehlich commented 7 months ago

I already asked at RSE without any response, so I am trying my luck here. Please point me to a different repository if you think this is related to a different package.

Bug report

We found that humble executables crash if any iron or rolling nodes are running in the same network: the process terminates with std::bad_alloc without any further warning.

The problem is not merely that the publishers/subscribers fail to communicate; the nodes crash immediately.

Any hints would be highly appreciated!

Required Info:

Steps to reproduce issue

We created a demo using the official docker containers from humble and iron and the minimal_publisher/subscriber: iron_humble_pub_sub_issue.tar.gz

If you run ./run_humble_subscriber.sh in one terminal and ./run_iron_publisher.sh in another (in that order), you will usually see the following error in the humble subscriber terminal:

terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
[ros2run]: Aborted

Sometimes the error does not appear on the first run; restarting the publisher usually triggers it.

Expected behavior

No errors.

Actual behavior

The subscriber (or ros2 topic list) crashes with std::bad_alloc.

A stack trace from the debugger gives:

#0  0x00007ffff62ad265 in __cxa_begin_catch () from /lib/x86_64-linux-gnu/libstdc++.so.6
#1  0x00007ffff62ae4d3 in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6
#2  0x00007ffff62a27ac in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007fff72ade915 in ?? () from /opt/ros/humble/lib/librmw_dds_common__rosidl_typesupport_fastrtps_cpp.so
#4  0x00007fff72adeeb7 in rmw_dds_common::msg::typesupport_fastrtps_cpp::cdr_deserialize(eprosima::fastcdr::Cdr&, rmw_dds_common::msg::NodeEntitiesInfo_<std::allocator<void> >&) ()
   from /opt/ros/humble/lib/librmw_dds_common__rosidl_typesupport_fastrtps_cpp.so
#5  0x00007fff72adf1f7 in rmw_dds_common::msg::typesupport_fastrtps_cpp::cdr_deserialize(eprosima::fastcdr::Cdr&, rmw_dds_common::msg::ParticipantEntitiesInfo_<std::allocator<void> >&) ()
   from /opt/ros/humble/lib/librmw_dds_common__rosidl_typesupport_fastrtps_cpp.so
#6  0x00007fff72753a39 in ?? () from /opt/ros/humble/lib/librmw_fastrtps_cpp.so
#7  0x00007fff727049b6 in rmw_fastrtps_shared_cpp::TypeSupport::deserialize(eprosima::fastrtps::rtps::SerializedPayload_t*, void*) () from /opt/ros/humble/lib/librmw_fastrtps_shared_cpp.so
#8  0x00007fff7242f42a in ?? () from /opt/ros/humble/lib/libfastrtps.so.2.6
#9  0x00007fff720eced2 in eprosima::fastdds::dds::DataReaderImpl::read_or_take(eprosima::fastdds::dds::LoanableCollection&, eprosima::fastdds::dds::LoanableSequence<eprosima::fastdds::dds::SampleInfo, std::integral_constant<bool, true> >&, int, eprosima::fastrtps::rtps::InstanceHandle_t const&, unsigned short, unsigned short, unsigned short, bool, bool, bool) () from /opt/ros/humble/lib/libfastrtps.so.2.6
#10 0x00007fff720ed07a in eprosima::fastdds::dds::DataReaderImpl::take(eprosima::fastdds::dds::LoanableCollection&, eprosima::fastdds::dds::LoanableSequence<eprosima::fastdds::dds::SampleInfo, std::integral_constant<bool, true> >&, int, unsigned short, unsigned short, unsigned short) () from /opt/ros/humble/lib/libfastrtps.so.2.6
#11 0x00007fff726fc3e6 in rmw_fastrtps_shared_cpp::_take(char const*, rmw_subscription_s const*, void*, bool*, rmw_message_info_s*, rmw_subscription_allocation_s*) () from /opt/ros/humble/lib/librmw_fastrtps_shared_cpp.so
#12 0x00007fff726eb19f in ?? () from /opt/ros/humble/lib/librmw_fastrtps_shared_cpp.so
#13 0x00007ffff62dc253 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#14 0x00007ffff7c94ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#15 0x00007ffff7d26a40 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

christophfroehlich commented 7 months ago

:eyes: @devwrite @mamut-m

MiguelCompany commented 7 months ago

This comes from https://github.com/ros2/rmw_dds_common/pull/68 changing the format of an internal topic that maintains the node graph, so CLI tools like ros2 node list can get the distributed graph information.

PR https://github.com/ros2/rmw_fastrtps/pull/665 would ignore messages for which an exception is thrown when deserializing, so an incompatible graph message would be dropped instead of terminating the process.
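
To illustrate the idea (this is not the code from that PR), here is a minimal sketch of "drop the sample if deserialization throws". Everything in it is hypothetical: the EntitiesInfo struct, the toy length-prefixed wire layout, and the helper names are stand-ins, and no Fast DDS or rmw API is used. The decoder throws std::bad_alloc explicitly when a declared length cannot be satisfied, to mimic what happens when bytes in a newer format are read as a sequence length.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <exception>
#include <iostream>
#include <new>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a graph-info message (not the rmw_dds_common type).
struct EntitiesInfo {
  std::vector<std::string> node_names;
};

// Toy wire layout (an assumption, not CDR):
//   [uint32 count] then count x ([uint32 len][len bytes]), host byte order.
void append_u32(std::vector<std::uint8_t>& buf, std::uint32_t value) {
  std::uint8_t raw[sizeof(value)];
  std::memcpy(raw, &value, sizeof(value));
  buf.insert(buf.end(), raw, raw + sizeof(value));
}

void append_string(std::vector<std::uint8_t>& buf, const std::string& s) {
  append_u32(buf, static_cast<std::uint32_t>(s.size()));
  buf.insert(buf.end(), s.begin(), s.end());
}

std::uint32_t read_u32(const std::vector<std::uint8_t>& buf, std::size_t& pos) {
  assert(pos <= buf.size());
  if (buf.size() - pos < sizeof(std::uint32_t)) {
    throw std::out_of_range("truncated buffer");
  }
  std::uint32_t value = 0;
  std::memcpy(&value, buf.data() + pos, sizeof(value));
  pos += sizeof(value);
  return value;
}

// Throwing decoder. An implausible length raises std::bad_alloc here to
// simulate the symptom in the report; a real decoder may fail differently.
EntitiesInfo deserialize(const std::vector<std::uint8_t>& buf) {
  std::size_t pos = 0;
  EntitiesInfo out;
  const std::uint32_t count = read_u32(buf, pos);
  if (count > buf.size()) {
    throw std::bad_alloc();
  }
  out.node_names.reserve(count);
  for (std::uint32_t i = 0; i < count; ++i) {
    const std::uint32_t len = read_u32(buf, pos);
    if (len > buf.size() - pos) {
      throw std::bad_alloc();
    }
    out.node_names.emplace_back(
      reinterpret_cast<const char*>(buf.data() + pos), len);
    pos += len;
  }
  return out;
}

// Guarded wrapper: an undecodable sample is reported and skipped instead of
// letting the exception escape and terminate the process.
bool try_deserialize(const std::vector<std::uint8_t>& buf, EntitiesInfo& out) {
  try {
    out = deserialize(buf);
    return true;
  } catch (const std::exception& e) {
    std::cerr << "ignoring undecodable sample: " << e.what() << '\n';
    return false;
  }
}
```

A caller would build a buffer with append_u32 and append_string, pass it to try_deserialize, and simply skip the sample when the function returns false. The output argument is only assigned after a successful decode, so a rejected sample leaves it untouched.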

christophfroehlich commented 7 months ago

Thanks @MiguelCompany for the quick reply. Your PR solves my issue.

christophfroehlich commented 5 months ago

Fixed with #737