micro-ROS / micro-ROS-Agent

ROS 2 package using Micro XRCE-DDS Agent.
Apache License 2.0

terminate called after throwing an instance of 'std::bad_alloc' #214

Open Ryanf55 opened 6 months ago

Ryanf55 commented 6 months ago

Describe the bug

When running the micro-ROS Agent, it periodically crashes with std::bad_alloc.

To Reproduce

Steps to reproduce the behavior:

  1. Clone ArduPilot on my branch: https://github.com/Ryanf55/ardupilot/tree/dds-plane-goal-interface
  2. Set up the ArduPilot build environment: https://ardupilot.org/dev/docs/building-the-code.html
  3. Follow the PR instructions to run ArduPilot and the micro-ROS Agent
  4. Wait 2-3 seconds after initialization and observe the runtime crash

Expected behaviour

The agent runs reliably without an allocation error.

System information (please complete the following information):

Additional context

Here are the debug logs at verbosity 6, captured while running under gdb:

[1702444844.833881] debug    | UDPv4AgentLinux.cpp | recv_message             | [==>> UDP <<==]        | client_key: 0xAAAABBBB, len: 36, data: 
0000: 81 80 17 00 07 01 0C 00 00 50 00 05 01 00 00 00 20 5D 04 33 07 01 0C 00 00 51 00 75 01 00 00 00
0020: 20 5D 04 33
[1702444844.833918] debug    | DataWriter.cpp     | write                    | [** <<DDS>> **]        | client_key: 0x00000000, len: 8, data: 
0000: 01 00 00 00 20 5D 04 33
[1702444844.833930] debug    | DataWriter.cpp     | write                    | [** <<DDS>> **]        | client_key: 0x00000007, len: 8, data: 
0000: 01 00 00 00 20 5D 04 33
[1702444844.833981] debug    | UDPv4AgentLinux.cpp | send_message             | [** <<UDP>> **]        | client_key: 0xAAAABBBB, len: 13, data: 
0000: 81 00 00 00 0A 01 05 00 18 00 00 00 80

Thread 19 "micro_ros_agent" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffdcff9640 (LWP 130694)]
__pthread_kill_implementation (no_tid=0, signo=6, threadid=140736901125696) at ./nptl/pthread_kill.c:44
44  ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140736901125696) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140736901125696) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140736901125696, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007ffff6c42476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007ffff6c287f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007ffff70a2b9e in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007ffff70ae20c in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x00007ffff70ae277 in std::terminate() () from /lib/x86_64-linux-gnu/libstdc++.so.6
#8  0x00007ffff70ae4d8 in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6
#9  0x00007ffff70a27ac in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x00007ffff7ec8915 in ?? () from /opt/ros/humble/lib/librmw_dds_common__rosidl_typesupport_fastrtps_cpp.so
#11 0x00007ffff7ec8eb7 in rmw_dds_common::msg::typesupport_fastrtps_cpp::cdr_deserialize(eprosima::fastcdr::Cdr&, rmw_dds_common::msg::NodeEntitiesInfo_<std::allocator<void> >&) ()
   from /opt/ros/humble/lib/librmw_dds_common__rosidl_typesupport_fastrtps_cpp.so
#12 0x00007ffff7ec91f7 in rmw_dds_common::msg::typesupport_fastrtps_cpp::cdr_deserialize(eprosima::fastcdr::Cdr&, rmw_dds_common::msg::ParticipantEntitiesInfo_<std::allocator<void> >&) ()
   from /opt/ros/humble/lib/librmw_dds_common__rosidl_typesupport_fastrtps_cpp.so
#13 0x00005555555a02ed in uros::agent::graph_manager::ParticipantEntitiesInfoTypeSupport::deserialize(eprosima::fastrtps::rtps::SerializedPayload_t*, void*) ()
#14 0x00007ffff7a2f42a in ?? () from /opt/ros/humble/lib/libfastrtps.so.2.6
#15 0x00007ffff76ed47b in eprosima::fastdds::dds::DataReaderImpl::read_or_take_next_sample(void*, eprosima::fastdds::dds::SampleInfo*, bool) () from /opt/ros/humble/lib/libfastrtps.so.2.6
#16 0x0000555555585888 in uros::agent::graph_manager::GraphManager::update_node_entities_info() ()
#17 0x00005555555868c4 in uros::agent::graph_manager::GraphManager::DatareaderListener::on_data_available(eprosima::fastdds::dds::DataReader*) ()
#18 0x00007ffff76ee84d in eprosima::fastdds::dds::DataReaderImpl::InnerDataReaderListener::onNewCacheChangeAdded(eprosima::fastrtps::rtps::RTPSReader*, eprosima::fastrtps::rtps::CacheChange_t const*) ()
   from /opt/ros/humble/lib/libfastrtps.so.2.6
#19 0x00007ffff76716b4 in eprosima::fastrtps::rtps::StatefulReader::NotifyChanges(eprosima::fastrtps::rtps::WriterProxy*) () from /opt/ros/humble/lib/libfastrtps.so.2.6
#20 0x00007ffff7671e5b in eprosima::fastrtps::rtps::StatefulReader::change_received(eprosima::fastrtps::rtps::CacheChange_t*, eprosima::fastrtps::rtps::WriterProxy*, unsigned long) () from /opt/ros/humble/lib/libfastrtps.so.2.6
#21 0x00007ffff7672381 in eprosima::fastrtps::rtps::StatefulReader::processDataMsg(eprosima::fastrtps::rtps::CacheChange_t*) () from /opt/ros/humble/lib/libfastrtps.so.2.6
#22 0x00007ffff767ff20 in eprosima::fastrtps::rtps::MessageReceiver::process_data_message_without_security(eprosima::fastrtps::rtps::EntityId_t const&, eprosima::fastrtps::rtps::CacheChange_t&) ()
   from /opt/ros/humble/lib/libfastrtps.so.2.6
#23 0x00007ffff7689a7b in eprosima::fastrtps::rtps::MessageReceiver::proc_Submsg_Data(eprosima::fastrtps::rtps::CDRMessage_t*, eprosima::fastrtps::rtps::SubmessageHeader_t*) const () from /opt/ros/humble/lib/libfastrtps.so.2.6
#24 0x00007ffff768b6c8 in eprosima::fastrtps::rtps::MessageReceiver::processCDRMsg(eprosima::fastrtps::rtps::Locator_t const&, eprosima::fastrtps::rtps::Locator_t const&, eprosima::fastrtps::rtps::CDRMessage_t*) ()
   from /opt/ros/humble/lib/libfastrtps.so.2.6
#25 0x00007ffff769134f in eprosima::fastrtps::rtps::ReceiverResource::OnDataReceived(unsigned char const*, unsigned int, eprosima::fastrtps::rtps::Locator_t const&, eprosima::fastrtps::rtps::Locator_t const&) ()
   from /opt/ros/humble/lib/libfastrtps.so.2.6
#26 0x00007ffff78c7618 in eprosima::fastdds::rtps::SharedMemChannelResource::perform_listen_operation(eprosima::fastrtps::rtps::Locator_t) () from /opt/ros/humble/lib/libfastrtps.so.2.6
#27 0x00007ffff78c22fb in ?? () from /opt/ros/humble/lib/libfastrtps.so.2.6
#28 0x00007ffff70dc253 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#29 0x00007ffff6c94ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#30 0x00007ffff6d26660 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
(gdb) Quit
(gdb) 
quit
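
For readers following the hex dump, here is a minimal sketch that splits the 36-byte UDP payload logged above into its submessages. It assumes my reading of the DDS-XRCE framing: a 4-byte message header (session id, stream id, little-endian sequence number, and no client key when the session id is 0x80 or higher), followed by submessages that each start with a 4-byte header (id, flags, little-endian length). The function name `split_xrce_message` is hypothetical and is not part of the agent.

```python
import struct

def split_xrce_message(raw: bytes):
    """Split a raw XRCE message into (header, submessages).

    Assumes a session id >= 0x80, i.e. no client key in the message header.
    """
    session_id, stream_id, sequence_nr = struct.unpack_from("<BBH", raw, 0)
    offset = 4
    submessages = []
    while offset + 4 <= len(raw):
        sub_id, flags, length = struct.unpack_from("<BBH", raw, offset)
        offset += 4
        submessages.append((sub_id, flags, raw[offset:offset + length]))
        offset += length
        offset += (-offset) % 4  # assumed: submessages start on 4-byte boundaries
    return (session_id, stream_id, sequence_nr), submessages

# Payload copied from the recv_message log entry above.
raw = bytes.fromhex(
    "81 80 17 00 07 01 0C 00 00 50 00 05 01 00 00 00 20 5D 04 33"
    " 07 01 0C 00 00 51 00 75 01 00 00 00 20 5D 04 33"
)
header, submessages = split_xrce_message(raw)
for sub_id, flags, body in submessages:
    # The last 8 bytes of each body are the sample handed to the DDS writer.
    print(sub_id, flags, body[4:].hex(" "))
```

Under these assumptions the payload holds two submessages, and the trailing 8 bytes of each match the two DataWriter write entries in the log, so the XRCE side of the exchange looks well-formed right up to the crash.
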
pablogs9 commented 6 months ago

Hello @Ryanf55, this is a well-known problem and IMO not really a micro-ROS issue.

The key is that the type of the ROS 2 ros_discovery_info topic, rmw_dds_common::msg::dds_::ParticipantEntitiesInfo_, changed significantly between distros. The size of an array in this type was reduced from 24 to 16, which makes this very same topic incompatible between ROS 2 distros.

Specifically, if your Humble installation receives ros_discovery_info data from Iron, the data representation will not match what the Humble deserializer expects, making deserialization unpredictable and very likely to throw an exception.

In summary, this is a ROS 2 distro incompatibility issue, and it should be resolved if you ensure that your Humble environment does not interact with any Iron environment (local or remote).
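
To illustrate why a 24-versus-16 array mismatch can end in std::bad_alloc, here is a toy sketch. The layout is a deliberate simplification and not the real CDR wire format: a fixed-size GID array, then a uint32 element count, then one length-prefixed name. The helpers `encode` and `decode_count` and the node name are hypothetical.

```python
import struct

def encode(gid_len: int, names: list) -> bytes:
    """Serialize a toy message: fixed-size GID, uint32 count, length-prefixed names."""
    out = bytes(range(gid_len))
    out += struct.pack("<I", len(names))
    for name in names:
        out += struct.pack("<I", len(name)) + name
    return out

def decode_count(payload: bytes, gid_len: int) -> int:
    """Read the element count, assuming the GID occupies gid_len bytes."""
    (count,) = struct.unpack_from("<I", payload, gid_len)
    return count

# Writer uses the 16-byte layout; the node name is made up for illustration.
payload = encode(16, [b"/example_node\x00"])

matching = decode_count(payload, 16)    # reader agrees with the writer
mismatched = decode_count(payload, 24)  # reader expects a 24-byte array

print("count with matching layout:  ", matching)
print("count with mismatched layout:", mismatched)
```

With the mismatched layout, the reader skips 8 bytes too many and interprets the first characters of the name as the element count. A C++ deserializer that sizes a vector from such a value would ask for an enormous allocation, which is one plausible route to the exception in frames #8 to #12 of the backtrace.
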

Ryanf55 commented 6 months ago

Hi Pablo,

Thanks for the info. Just FYI, I do not have Iron installed, and there are no other ROS 2 developers on my home network, so I don't think that's the issue. Everything is on Humble.

ArduPilot targets ROS 2 Humble only.

pablogs9 commented 6 months ago

You have the very same error that we found some weeks ago.

How are you building the micro-ROS Agent? Is there any Docker container running on your system?

I ask because of this frame in your backtrace:

#26 0x00007ffff78c7618 in eprosima::fastdds::rtps::SharedMemChannelResource::perform_listen_operation(eprosima::fastrtps::rtps::Locator_t) () from /opt/ros/humble/lib/libfastrtps.so.2.6

It suggests that the message that raises the error originates on your own computer and arrives via shared memory.
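
As a small aid for this kind of triage, the sketch below scans a gdb backtrace for frames that mention a given class name. The only name it relies on is SharedMemChannelResource, taken from the trace above; the helper `frames_mentioning` is hypothetical.

```python
import re

def frames_mentioning(backtrace: str, needle: str) -> list:
    """Return the frame numbers of backtrace lines that contain needle."""
    hits = []
    for line in backtrace.splitlines():
        match = re.match(r"#(\d+)\s", line)  # frame lines start with "#<n>"
        if match and needle in line:
            hits.append(int(match.group(1)))
    return hits

# Two frames copied from the backtrace in this issue.
SAMPLE = """\
#25 0x00007ffff769134f in eprosima::fastrtps::rtps::ReceiverResource::OnDataReceived(unsigned char const*, unsigned int, eprosima::fastrtps::rtps::Locator_t const&, eprosima::fastrtps::rtps::Locator_t const&) ()
   from /opt/ros/humble/lib/libfastrtps.so.2.6
#26 0x00007ffff78c7618 in eprosima::fastdds::rtps::SharedMemChannelResource::perform_listen_operation(eprosima::fastrtps::rtps::Locator_t) () from /opt/ros/humble/lib/libfastrtps.so.2.6
"""

shm_frames = frames_mentioning(SAMPLE, "SharedMemChannelResource")
print("shared-memory frames:", shm_frames)
```

A hit on the listening thread's entry frame points at a sender on the same host, since shared-memory delivery cannot cross machines.
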

Ryanf55 commented 6 months ago

We are building the micro-ROS Agent with colcon, from the humble branch, in our ROS workspace. This is all on the host, no Docker.

https://github.com/ArduPilot/ardupilot/blob/6515df72f0473b8982f3d25bdade74f5d9df8be3/Tools/ros2/ros2.repos#L9

Fast DDS is installed from the Humble binaries.

pablogs9 commented 6 months ago

Can you provide a Dockerfile with a replicator without the Ardupilot part?

Ryanf55 commented 6 months ago

> Can you provide a Dockerfile with a replicator without the Ardupilot part?

I can try. The MicroXRCE DDS Agent is heavily tied to ArduPilot right now; it may be hard to build a standalone example to reproduce.

Would it be acceptable to provide a Dockerfile with ArduPilot already built and running? Then you could run it against the micro-ROS Agent on your host OS, built in debug mode and run under GDB.

pablogs9 commented 6 months ago

That would be acceptable as long as everything runs inside Docker.

Ryanf55 commented 6 months ago

Thanks. Can you assign this ticket to me? I can get you the info in a few days.