micro-ROS / micro-ROS-Agent

ROS 2 package using Micro XRCE-DDS Agent.
Apache License 2.0
terminate called after throwing an instance of 'std::bad_alloc' #214

Open Ryanf55 opened 11 months ago

Ryanf55 commented 11 months ago

Describe the bug

When running the micro-ROS Agent, it periodically crashes with std::bad_alloc.

To Reproduce

Steps to reproduce the behavior:

  1. Clone ArduPilot on my branch: https://github.com/Ryanf55/ardupilot/tree/dds-plane-goal-interface
  2. Set up the ArduPilot build environment: https://ardupilot.org/dev/docs/building-the-code.html
  3. Follow the PR instructions to run ArduPilot and the micro-ROS Agent
  4. Wait 2-3 seconds after initialization and observe the runtime crash

Expected behaviour

The agent runs reliably without an allocation error.

System information (please complete the following information):

Additional context

Here are the debug logs at verbosity 6, captured while running under gdb:

[1702444844.833881] debug    | UDPv4AgentLinux.cpp | recv_message             | [==>> UDP <<==]        | client_key: 0xAAAABBBB, len: 36, data: 
0000: 81 80 17 00 07 01 0C 00 00 50 00 05 01 00 00 00 20 5D 04 33 07 01 0C 00 00 51 00 75 01 00 00 00
0020: 20 5D 04 33
[1702444844.833918] debug    | DataWriter.cpp     | write                    | [** <<DDS>> **]        | client_key: 0x00000000, len: 8, data: 
0000: 01 00 00 00 20 5D 04 33
[1702444844.833930] debug    | DataWriter.cpp     | write                    | [** <<DDS>> **]        | client_key: 0x00000007, len: 8, data: 
0000: 01 00 00 00 20 5D 04 33
[1702444844.833981] debug    | UDPv4AgentLinux.cpp | send_message             | [** <<UDP>> **]        | client_key: 0xAAAABBBB, len: 13, data: 
0000: 81 00 00 00 0A 01 05 00 18 00 00 00 80
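
For readers following the dump, the 36-byte UDP message above can be decoded by hand. The sketch below is a minimal reading of it, assuming the DDS-XRCE wire layout as I understand it (a 4-byte message header, then 4-byte submessage headers with a little-endian length, and a 2-byte object id whose low 4 bits are the object kind). The function name `decode_xrce` and the lookup tables are illustrative, not part of the agent.

```python
import struct

# Values as I understand them from the DDS-XRCE specification; only the
# entries needed for this dump are listed (assumption, not exhaustive).
SUBMESSAGE_NAMES = {0x07: "WRITE_DATA"}
OBJECT_KINDS = {0x05: "DATAWRITER", 0x06: "DATAREADER"}


def decode_xrce(raw):
    """Split one XRCE message into its header fields and submessages."""
    session_id, stream_id, sequence = struct.unpack_from("<BBH", raw, 0)
    offset = 4
    # Session ids below 128 carry a 4-byte client key in the header.
    if session_id < 0x80:
        offset += 4
    submessages = []
    while offset + 4 <= len(raw):
        sub_id, flags, length = struct.unpack_from("<BBH", raw, offset)
        payload = raw[offset + 4 : offset + 4 + length]
        entry = {"id": SUBMESSAGE_NAMES.get(sub_id, hex(sub_id)), "length": length}
        if sub_id == 0x07:
            # Payload: 2-byte request id, 2-byte object id, then user data.
            object_id = (payload[2] << 8) | payload[3]
            entry["object"] = object_id >> 4
            entry["kind"] = OBJECT_KINDS.get(object_id & 0x0F, "?")
            entry["data"] = payload[4:]
        submessages.append(entry)
        # Advance past this submessage and pad to a 4-byte boundary.
        offset += 4 + length
        offset += (-offset) % 4
    return {
        "session": session_id,
        "stream": stream_id,
        "sequence": sequence,
        "submessages": submessages,
    }


# Bytes copied from the recv_message log line above.
DUMP = bytes.fromhex(
    "81 80 17 00 07 01 0C 00 00 50 00 05 01 00 00 00 20 5D 04 33"
    " 07 01 0C 00 00 51 00 75 01 00 00 00 20 5D 04 33"
)

if __name__ == "__main__":
    for sub in decode_xrce(DUMP)["submessages"]:
        print(sub["id"], sub["kind"], sub["object"], sub["data"].hex(" "))
```

Under these assumptions the message holds two WRITE_DATA submessages for data writers 0 and 7, each carrying the same 8 bytes that the two DataWriter.cpp log lines report. That suggests the XRCE traffic from the client is well formed at this point.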

Thread 19 "micro_ros_agent" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffdcff9640 (LWP 130694)]
__pthread_kill_implementation (no_tid=0, signo=6, threadid=140736901125696) at ./nptl/pthread_kill.c:44
44  ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140736901125696) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140736901125696) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140736901125696, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007ffff6c42476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007ffff6c287f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007ffff70a2b9e in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007ffff70ae20c in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x00007ffff70ae277 in std::terminate() () from /lib/x86_64-linux-gnu/libstdc++.so.6
#8  0x00007ffff70ae4d8 in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6
#9  0x00007ffff70a27ac in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x00007ffff7ec8915 in ?? () from /opt/ros/humble/lib/librmw_dds_common__rosidl_typesupport_fastrtps_cpp.so
#11 0x00007ffff7ec8eb7 in rmw_dds_common::msg::typesupport_fastrtps_cpp::cdr_deserialize(eprosima::fastcdr::Cdr&, rmw_dds_common::msg::NodeEntitiesInfo_<std::allocator<void> >&) ()
   from /opt/ros/humble/lib/librmw_dds_common__rosidl_typesupport_fastrtps_cpp.so
#12 0x00007ffff7ec91f7 in rmw_dds_common::msg::typesupport_fastrtps_cpp::cdr_deserialize(eprosima::fastcdr::Cdr&, rmw_dds_common::msg::ParticipantEntitiesInfo_<std::allocator<void> >&) ()
   from /opt/ros/humble/lib/librmw_dds_common__rosidl_typesupport_fastrtps_cpp.so
#13 0x00005555555a02ed in uros::agent::graph_manager::ParticipantEntitiesInfoTypeSupport::deserialize(eprosima::fastrtps::rtps::SerializedPayload_t*, void*) ()
#14 0x00007ffff7a2f42a in ?? () from /opt/ros/humble/lib/libfastrtps.so.2.6
#15 0x00007ffff76ed47b in eprosima::fastdds::dds::DataReaderImpl::read_or_take_next_sample(void*, eprosima::fastdds::dds::SampleInfo*, bool) () from /opt/ros/humble/lib/libfastrtps.so.2.6
#16 0x0000555555585888 in uros::agent::graph_manager::GraphManager::update_node_entities_info() ()
#17 0x00005555555868c4 in uros::agent::graph_manager::GraphManager::DatareaderListener::on_data_available(eprosima::fastdds::dds::DataReader*) ()
#18 0x00007ffff76ee84d in eprosima::fastdds::dds::DataReaderImpl::InnerDataReaderListener::onNewCacheChangeAdded(eprosima::fastrtps::rtps::RTPSReader*, eprosima::fastrtps::rtps::CacheChange_t const*) ()
   from /opt/ros/humble/lib/libfastrtps.so.2.6
#19 0x00007ffff76716b4 in eprosima::fastrtps::rtps::StatefulReader::NotifyChanges(eprosima::fastrtps::rtps::WriterProxy*) () from /opt/ros/humble/lib/libfastrtps.so.2.6
#20 0x00007ffff7671e5b in eprosima::fastrtps::rtps::StatefulReader::change_received(eprosima::fastrtps::rtps::CacheChange_t*, eprosima::fastrtps::rtps::WriterProxy*, unsigned long) () from /opt/ros/humble/lib/libfastrtps.so.2.6
#21 0x00007ffff7672381 in eprosima::fastrtps::rtps::StatefulReader::processDataMsg(eprosima::fastrtps::rtps::CacheChange_t*) () from /opt/ros/humble/lib/libfastrtps.so.2.6
#22 0x00007ffff767ff20 in eprosima::fastrtps::rtps::MessageReceiver::process_data_message_without_security(eprosima::fastrtps::rtps::EntityId_t const&, eprosima::fastrtps::rtps::CacheChange_t&) ()
   from /opt/ros/humble/lib/libfastrtps.so.2.6
#23 0x00007ffff7689a7b in eprosima::fastrtps::rtps::MessageReceiver::proc_Submsg_Data(eprosima::fastrtps::rtps::CDRMessage_t*, eprosima::fastrtps::rtps::SubmessageHeader_t*) const () from /opt/ros/humble/lib/libfastrtps.so.2.6
#24 0x00007ffff768b6c8 in eprosima::fastrtps::rtps::MessageReceiver::processCDRMsg(eprosima::fastrtps::rtps::Locator_t const&, eprosima::fastrtps::rtps::Locator_t const&, eprosima::fastrtps::rtps::CDRMessage_t*) ()
   from /opt/ros/humble/lib/libfastrtps.so.2.6
#25 0x00007ffff769134f in eprosima::fastrtps::rtps::ReceiverResource::OnDataReceived(unsigned char const*, unsigned int, eprosima::fastrtps::rtps::Locator_t const&, eprosima::fastrtps::rtps::Locator_t const&) ()
   from /opt/ros/humble/lib/libfastrtps.so.2.6
#26 0x00007ffff78c7618 in eprosima::fastdds::rtps::SharedMemChannelResource::perform_listen_operation(eprosima::fastrtps::rtps::Locator_t) () from /opt/ros/humble/lib/libfastrtps.so.2.6
#27 0x00007ffff78c22fb in ?? () from /opt/ros/humble/lib/libfastrtps.so.2.6
#28 0x00007ffff70dc253 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#29 0x00007ffff6c94ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#30 0x00007ffff6d26660 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
(gdb) Quit
(gdb) 
quit

pablogs9 commented 11 months ago

Hello @Ryanf55, this is a well-known problem and IMO not really a micro-ROS issue.

The key is that the type of the ROS 2 ros_discovery_info topic, rmw_dds_common::msg::dds_::ParticipantEntitiesInfo_, changed significantly between distros. The size of an array in it was reduced from 24 to 16, which makes this very same topic incompatible between ROS 2 distros.

Specifically, if your Humble installation receives ros_discovery_info data from Iron, the data representation will not match what the Humble deserializer expects, making deserialization unpredictable and very likely to throw an exception.
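
To illustrate why a size mismatch like this can end in an allocation failure, here is a minimal sketch. It uses a toy encoding, not Fast CDR: a raw array, a uint32 element count, then length-prefixed strings, with no alignment and none of the other fields of the real type. The sizes 16 and 24 come from the comment above; the function names and the node name are made up for the example.

```python
import struct


def serialize(gid, nodes):
    """Toy encoding: raw gid bytes, uint32 node count, length-prefixed strings."""
    out = bytearray(gid)
    out += struct.pack("<I", len(nodes))
    for namespace, name in nodes:
        for text in (namespace, name):
            encoded = text.encode() + b"\x00"  # NUL-terminated, as in CDR strings
            out += struct.pack("<I", len(encoded)) + encoded
    return bytes(out)


def read_node_count(payload, gid_size):
    """Read the node count, assuming the gid occupies gid_size bytes."""
    (count,) = struct.unpack_from("<I", payload, gid_size)
    return count


def checked_node_count(payload, gid_size):
    """Same as read_node_count, but reject counts the payload cannot hold."""
    count = read_node_count(payload, gid_size)
    remaining = len(payload) - gid_size - 4
    if count > remaining:
        raise ValueError(f"implausible node count {count} for {remaining} bytes")
    return count


# Writer side uses a 16-byte array; the node name is illustrative only.
payload = serialize(bytes(16), [("/", "ardupilot_dds")])

if __name__ == "__main__":
    print("reader expecting 16 bytes:", read_node_count(payload, 16))
    print("reader expecting 24 bytes:", read_node_count(payload, 24))
```

A reader that expects 24 bytes consumes the real count and part of the first string as array data, then interprets string bytes as the element count. A deserializer that sizes a container from such a count without a plausibility check may request a very large allocation, which would be consistent with the std::bad_alloc in the backtrace, although this sketch does not prove that is what happens here.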

In summary, this is a ROS 2 distro incompatibility issue, and it should be resolved if you ensure that your Humble environment does not interact with any Iron environment (local or remote).
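
As a first local check, one could list which distros are installed on the host and which one is sourced. The sketch below assumes binary installs live under /opt/ros (as the library paths in the backtrace suggest) and that the sourced setup script exports ROS_DISTRO. It says nothing about containers, source builds, or other machines on the network; the function names are illustrative.

```python
import os
from pathlib import Path


def installed_distros(root="/opt/ros"):
    """Return the names of the distro directories found under root."""
    base = Path(root)
    if not base.is_dir():
        return []
    return sorted(p.name for p in base.iterdir() if p.is_dir())


def report(root="/opt/ros", environ=None):
    """Summarize the sourced distro and every distro installed under root."""
    environ = os.environ if environ is None else environ
    return {
        "sourced": environ.get("ROS_DISTRO"),
        "installed": installed_distros(root),
    }


if __name__ == "__main__":
    print(report())
```

If more than one distro shows up under "installed", it is worth checking whether any process from the other distro is running while the agent is up.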

Ryanf55 commented 11 months ago

Hi Pablo,

Thanks for the info. Just FYI, I do not have Iron installed, and there are no other ROS 2 developers on my home network, so I don't think that's the issue. Everything is on Humble.

ArduPilot targets ROS 2 Humble only.

pablogs9 commented 11 months ago

You have the very same error that we found some weeks ago.

How are you building the micro-ROS Agent? Is there any Docker container on your system?

Because of this line: #26 0x00007ffff78c7618 in eprosima::fastdds::rtps::SharedMemChannelResource::perform_listen_operation(eprosima::fastrtps::rtps::Locator_t) () from /opt/ros/humble/lib/libfastrtps.so.2.6, it seems that the message that raises the error originates on your own computer and arrives via shared memory.
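
The same reasoning can be applied mechanically to a saved backtrace: look for the transport class in the listening thread's frames. This is a small sketch; SharedMemChannelResource is taken from frame #26 above, while UDPChannelResource is my assumption for the name of the UDP counterpart in Fast DDS, and the labels are illustrative.

```python
# Class-name fragments mapped to a human-readable transport label.
# "UDPChannelResource" is an assumed name, not seen in this backtrace.
TRANSPORT_MARKERS = {
    "SharedMemChannelResource": "shared memory (same host)",
    "UDPChannelResource": "UDP (local or remote host)",
}


def receiving_transport(backtrace):
    """Return the label of the first transport class found, or None."""
    for line in backtrace.splitlines():
        for marker, label in TRANSPORT_MARKERS.items():
            if marker in line:
                return label
    return None


# Frame copied from the backtrace in this issue.
FRAME_26 = (
    "#26 0x00007ffff78c7618 in eprosima::fastdds::rtps::"
    "SharedMemChannelResource::perform_listen_operation("
    "eprosima::fastrtps::rtps::Locator_t) () "
    "from /opt/ros/humble/lib/libfastrtps.so.2.6"
)

if __name__ == "__main__":
    print(receiving_transport(FRAME_26))
```

A shared-memory hit narrows the search to processes on the same machine, including ones running in containers that share the host's memory segments.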

Ryanf55 commented 11 months ago

We are building the micro-ROS Agent with colcon, from the humble branch, in our ROS workspace. This is all on the host, with no Docker.

https://github.com/ArduPilot/ardupilot/blob/6515df72f0473b8982f3d25bdade74f5d9df8be3/Tools/ros2/ros2.repos#L9

Fast DDS is installed from the Humble binaries.

pablogs9 commented 11 months ago

Can you provide a Dockerfile with a reproducer that does not include the ArduPilot part?

Ryanf55 commented 11 months ago

> Can you provide a Dockerfile with a reproducer that does not include the ArduPilot part?

I can try. The Micro XRCE-DDS Agent is heavily tied to ArduPilot right now, so it may be hard to build a standalone example that reproduces the crash.

Would it be acceptable to provide a Dockerfile with ArduPilot already built and running? Then you could run it against a debug build of the micro-ROS Agent on your host OS, under GDB.

pablogs9 commented 11 months ago

That would be acceptable as long as everything runs inside Docker.

Ryanf55 commented 11 months ago

Thanks. Can you assign this ticket to me? I can get you the info in a few days.