ros2 / rmw_fastrtps

Implementation of the ROS Middleware (rmw) Interface using eProsima's Fast RTPS.
Apache License 2.0

Assertion max_num_payloads failed. #597

Closed fujitatomoya closed 2 years ago

fujitatomoya commented 2 years ago

Bug report

Required Info:

Steps to reproduce issue

# ros2 run demo_nodes_cpp add_two_ints_server
# cat test.sh
#!/bin/bash

for i in {1..15}
do
    ros2 run demo_nodes_cpp add_two_ints_client &
done

# ./test.sh

Expected behavior

test.sh succeeds without error

Actual behavior

assertion failure

add_two_ints_client: /root/ros2_ws/colcon_ws/src/eProsima/Fast-DDS/src/cpp/rtps/history/TopicPayloadPool.cpp:263: bool eprosima::fastrtps::rtps::TopicPayloadPool::shrink(uint32_t): Assertion `payload_pool_allocated_size() - payload_pool_available_size() <= max_num_payloads' failed.
[ros2run]: Aborted
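
The asserted expression compares the number of payloads still in use (allocated minus available) against the limit passed to `shrink()`. The sketch below only models that arithmetic; `PoolModel` and its members are hypothetical names chosen for illustration and are not taken from Fast DDS.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical model of the two counters named in the assertion message.
// This is not the Fast DDS implementation, only the condition it checks.
struct PoolModel
{
    std::size_t allocated = 0;  // payloads the pool has created
    std::size_t available = 0;  // payloads currently sitting in the free list

    // Payloads handed out and not yet returned to the pool.
    std::size_t in_use() const
    {
        return allocated - available;
    }

    // Mirrors the asserted condition: in-use payloads must fit in the limit.
    bool shrink_precondition_holds(std::uint32_t max_num_payloads) const
    {
        return in_use() <= max_num_payloads;
    }
};
```

Under this model, a limit of 0 satisfies the condition only when every payload has been returned to the pool; a single outstanding payload is enough to violate it.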

Additional information

Core stack trace with a debug build.

gdb ./build/demo_nodes_cpp/add_two_ints_client core

```bash
# gdb ./build/demo_nodes_cpp/add_two_ints_client core
GNU gdb (Ubuntu 12.0.90-0ubuntu1) 12.0.90
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see: .
Find the GDB manual and other documentation resources online at: .
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./build/demo_nodes_cpp/add_two_ints_client...
warning: Can't open file /dev/shm/fastrtps_af09ef5eb9ede059 (deleted) during file-backed mapping note processing
warning: Can't open file /dev/shm/fastrtps_c9b2ee5ff4b8fa25 (deleted) during file-backed mapping note processing
warning: Can't open file /dev/shm/fastrtps_5f0bb225d45b0fdf (deleted) during file-backed mapping note processing
warning: Can't open file /dev/shm/fastrtps_d66eafdba19091bc (deleted) during file-backed mapping note processing
warning: Can't open file /dev/shm/fastrtps_a51cd0d37aaba72d (deleted) during file-backed mapping note processing
warning: Can't open file /dev/shm/fastrtps_f4c3a4925f993384 during file-backed mapping note processing
warning: Can't open file /dev/shm/fastrtps_e68ceed461f83b22 during file-backed mapping note processing
warning: Can't open file /dev/shm/fastrtps_eca22d0070cbd289 (deleted) during file-backed mapping note processing
warning: Can't open file /dev/shm/fastrtps_3a91439301099e47 (deleted) during file-backed mapping note processing
warning: Can't open file /dev/shm/fastrtps_d5e06e52d608c5c2 (deleted) during file-backed mapping note processing
warning: Can't open file /dev/shm/fastrtps_8c7b208485a4ac89 (deleted) during file-backed mapping note processing
warning: Can't open file /dev/shm/fastrtps_f294cc32dd874b12 (deleted) during file-backed mapping note processing
warning: Can't open file /dev/shm/fastrtps_eb288bbfba7175ec during file-backed mapping note processing
warning: Can't open file /dev/shm/fastrtps_e2b261606980b9a9 during file-backed mapping note processing
warning: Can't open file /dev/shm/fastrtps_3b5fa345288a27be during file-backed mapping note processing
[New LWP 30648]
[New LWP 30705]
[New LWP 30694]
[New LWP 30758]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/root/ros2_ws/colcon_ws/install/demo_nodes_cpp/lib/demo_nodes_cpp/add_two_ints_'.
Program terminated with signal SIGABRT, Aborted.
#0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=139980317795648) at ./nptl/pthread_kill.c:44
44 ./nptl/pthread_kill.c: No such file or directory.
[Current thread is 1 (Thread 0x7f4fb51e2540 (LWP 30648))]
(gdb) bt
#0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=139980317795648) at ./nptl/pthread_kill.c:44
#1 __pthread_kill_internal (signo=6, threadid=139980317795648) at ./nptl/pthread_kill.c:78
#2 __GI___pthread_kill (threadid=139980317795648, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3 0x00007f4fb54de476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4 0x00007f4fb54c47f3 in __GI_abort () at ./stdlib/abort.c:79
#5 0x00007f4fb54c471b in __assert_fail_base (fmt=0x7f4fb5679150 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x7f4fb4c051f0 "payload_pool_allocated_size() - payload_pool_available_size() <= max_num_payloads", file=0x7f4fb4c04d88 "/root/ros2_ws/colcon_ws/src/eProsima/Fast-DDS/src/cpp/rtps/history/TopicPayloadPool.cpp", line=263, function=) at ./assert/assert.c:92
#6 0x00007f4fb54d5e96 in __GI___assert_fail (assertion=0x7f4fb4c051f0 "payload_pool_allocated_size() - payload_pool_available_size() <= max_num_payloads", file=0x7f4fb4c04d88 "/root/ros2_ws/colcon_ws/src/eProsima/Fast-DDS/src/cpp/rtps/history/TopicPayloadPool.cpp", line=263, function=0x7f4fb4c051a8 "bool eprosima::fastrtps::rtps::TopicPayloadPool::shrink(uint32_t)") at ./assert/assert.c:101
#7 0x00007f4fb458b859 in eprosima::fastrtps::rtps::TopicPayloadPool::shrink (this=0x55bef4a660a0, max_num_payloads=0) at /root/ros2_ws/colcon_ws/src/eProsima/Fast-DDS/src/cpp/rtps/history/TopicPayloadPool.cpp:263
#8 0x00007f4fb458b1e0 in eprosima::fastrtps::rtps::TopicPayloadPool::release_history (this=0x55bef4a660a0, config=...) at /root/ros2_ws/colcon_ws/src/eProsima/Fast-DDS/src/cpp/rtps/history/TopicPayloadPool.cpp:174
#9 0x00007f4fb458c961 in eprosima::fastrtps::rtps::PreallocatedReallocTopicPayloadPool::release_history (this=0x55bef4a660a0, config=..., is_reader=false) at /root/ros2_ws/colcon_ws/src/eProsima/Fast-DDS/src/cpp/rtps/history/./TopicPayloadPool_impl/PreallocatedWithRealloc.hpp:71
#10 0x00007f4fb458eb5c in eprosima::fastrtps::rtps::detail::TopicPayloadPoolProxy::release_history (this=0x55bef4a66040, config=..., is_reader=false) at /root/ros2_ws/colcon_ws/src/eProsima/Fast-DDS/src/cpp/rtps/history/TopicPayloadPoolRegistry_impl/TopicPayloadPoolProxy.hpp:90
#11 0x00007f4fb495fd62 in eprosima::fastrtps::rtps::EDPUtils::release_payload_pool (pool=std::shared_ptr (use count 1, weak count 1) = {...}, history_attr=..., is_reader=false) at /root/ros2_ws/colcon_ws/src/eProsima/Fast-DDS/src/cpp/rtps/builtin/discovery/endpoint/EDPUtils.hpp:70
#12 0x00007f4fb49593aa in eprosima::fastrtps::rtps::delete_writer (participant=0x55bef49d9df0, writer_pair={...}, pool=std::shared_ptr (use count 1, weak count 1) = {...}) at /root/ros2_ws/colcon_ws/src/eProsima/Fast-DDS/src/cpp/rtps/builtin/discovery/endpoint/EDPSimple.cpp:83
#13 0x00007f4fb49596ef in eprosima::fastrtps::rtps::EDPSimple::~EDPSimple (this=0x55bef4a63940, __in_chrg=) at /root/ros2_ws/colcon_ws/src/eProsima/Fast-DDS/src/cpp/rtps/builtin/discovery/endpoint/EDPSimple.cpp:119
#14 0x00007f4fb4959820 in eprosima::fastrtps::rtps::EDPSimple::~EDPSimple (this=0x55bef4a63940, __in_chrg=) at /root/ros2_ws/colcon_ws/src/eProsima/Fast-DDS/src/cpp/rtps/builtin/discovery/endpoint/EDPSimple.cpp:131
#15 0x00007f4fb493084f in eprosima::fastrtps::rtps::PDP::~PDP (this=0x55bef4a58fb0, __in_chrg=) at /root/ros2_ws/colcon_ws/src/eProsima/Fast-DDS/src/cpp/rtps/builtin/discovery/participant/PDP.cpp:130
#16 0x00007f4fb49442fc in eprosima::fastrtps::rtps::PDPSimple::~PDPSimple (this=0x55bef4a58fb0, __in_chrg=) at /root/ros2_ws/colcon_ws/src/eProsima/Fast-DDS/src/cpp/rtps/builtin/discovery/participant/PDPSimple.cpp:68
#17 0x00007f4fb494431c in eprosima::fastrtps::rtps::PDPSimple::~PDPSimple (this=0x55bef4a58fb0, __in_chrg=) at /root/ros2_ws/colcon_ws/src/eProsima/Fast-DDS/src/cpp/rtps/builtin/discovery/participant/PDPSimple.cpp:68
#18 0x00007f4fb492a53d in eprosima::fastrtps::rtps::BuiltinProtocols::~BuiltinProtocols (this=0x55bef49e1ac0, __in_chrg=) at /root/ros2_ws/colcon_ws/src/eProsima/Fast-DDS/src/cpp/rtps/builtin/BuiltinProtocols.cpp:69
#19 0x00007f4fb492a5b4 in eprosima::fastrtps::rtps::BuiltinProtocols::~BuiltinProtocols (this=0x55bef49e1ac0, __in_chrg=) at /root/ros2_ws/colcon_ws/src/eProsima/Fast-DDS/src/cpp/rtps/builtin/BuiltinProtocols.cpp:71
#20 0x00007f4fb4643ea4 in eprosima::fastrtps::rtps::RTPSParticipantImpl::disable (this=0x55bef49d9df0) at /root/ros2_ws/colcon_ws/src/eProsima/Fast-DDS/src/cpp/rtps/participant/RTPSParticipantImpl.cpp:479
#21 0x00007f4fb465dfeb in eprosima::fastrtps::rtps::RTPSDomain::removeRTPSParticipant (p=0x55bef49d9dd0) at /root/ros2_ws/colcon_ws/src/eProsima/Fast-DDS/src/cpp/rtps/RTPSDomain.cpp:243
#22 0x00007f4fb4732d73 in eprosima::fastdds::dds::DomainParticipantImpl::~DomainParticipantImpl (this=0x55bef49d8ca0, __in_chrg=) at /root/ros2_ws/colcon_ws/src/eProsima/Fast-DDS/src/cpp/fastdds/domain/DomainParticipantImpl.cpp:269
#23 0x00007f4fb4732f72 in eprosima::fastdds::dds::DomainParticipantImpl::~DomainParticipantImpl (this=0x55bef49d8ca0, __in_chrg=) at /root/ros2_ws/colcon_ws/src/eProsima/Fast-DDS/src/cpp/fastdds/domain/DomainParticipantImpl.cpp:283
#24 0x00007f4fb4726586 in eprosima::fastdds::dds::DomainParticipantFactory::delete_participant (this=0x7f4fb4f77de0 , part=0x55bef49d8bf0) at /root/ros2_ws/colcon_ws/src/eProsima/Fast-DDS/src/cpp/fastdds/domain/DomainParticipantFactory.cpp:193
#25 0x00007f4fb50869da in rmw_fastrtps_shared_cpp::destroy_participant (participant_info=0x55bef49d70f0) at /root/ros2_ws/colcon_ws/src/ros2/rmw_fastrtps/rmw_fastrtps_shared_cpp/src/participant.cpp:345
#26 0x00007f4fb507ccb9 in rmw_fastrtps_shared_cpp::decrement_context_impl_ref_count (context=0x55bef49d6890) at /root/ros2_ws/colcon_ws/src/ros2/rmw_fastrtps/rmw_fastrtps_shared_cpp/src/init_rmw_context_impl.cpp:86
#27 0x00007f4fb5182031 in rmw_destroy_node (node=0x55bef4b11bd0) at /root/ros2_ws/colcon_ws/src/ros2/rmw_fastrtps/rmw_fastrtps_cpp/src/rmw_node.cpp:99
#28 0x00007f4fb544876a in rmw_destroy_node (v1=0x55bef4b11bd0) at /root/ros2_ws/colcon_ws/src/ros2/rmw_implementation/rmw_implementation/src/functions.cpp:269
#29 0x00007f4fb596196e in rcl_node_fini (node=0x55bef49d8140) at /root/ros2_ws/colcon_ws/src/ros2/rcl/rcl/src/rcl/node.c:384
#30 0x00007f4fb635c1ee in operator() (__closure=0x55bef4b08750, node=0x55bef49d8140) at /root/ros2_ws/colcon_ws/src/ros2/rclcpp/rclcpp/src/rclcpp/node_interfaces/node_base.cpp:123
#31 0x00007f4fb635e096 in std::_Sp_counted_deleter, std::allocator, (__gnu_cxx::_Lock_policy)2>::_M_dispose(void) (this=0x55bef4b08740) at /usr/include/c++/11/bits/shared_ptr_base.h:442
#32 0x000055bef2c0777b in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x55bef4b08740) at /usr/include/c++/11/bits/shared_ptr_base.h:168
#33 0x000055bef2c05c39 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x55bef49d8018, __in_chrg=) at /usr/include/c++/11/bits/shared_ptr_base.h:705
#34 0x00007f4fb62e2db8 in std::__shared_ptr::~__shared_ptr (this=0x55bef49d8010, __in_chrg=) at /usr/include/c++/11/bits/shared_ptr_base.h:1154
#35 0x00007f4fb62e2dd8 in std::shared_ptr::~shared_ptr (this=0x55bef49d8010, __in_chrg=) at /usr/include/c++/11/bits/shared_ptr.h:122
#36 0x00007f4fb635ce4c in rclcpp::node_interfaces::NodeBase::~NodeBase (this=0x55bef49d7fe0, __in_chrg=) at /root/ros2_ws/colcon_ws/src/ros2/rclcpp/rclcpp/src/rclcpp/node_interfaces/node_base.cpp:146
#37 0x00007f4fb635ceac in rclcpp::node_interfaces::NodeBase::~NodeBase (this=0x55bef49d7fe0, __in_chrg=) at /root/ros2_ws/colcon_ws/src/ros2/rclcpp/rclcpp/src/rclcpp/node_interfaces/node_base.cpp:146
#38 0x00007f4fb635bf3a in std::_Sp_counted_ptr::_M_dispose (this=0x55bef4b33060) at /usr/include/c++/11/bits/shared_ptr_base.h:348
#39 0x000055bef2c0777b in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x55bef4b33060) at /usr/include/c++/11/bits/shared_ptr_base.h:168
#40 0x000055bef2c05c39 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x55bef49d7a70, __in_chrg=) at /usr/include/c++/11/bits/shared_ptr_base.h:705
#41 0x000055bef2c053da in std::__shared_ptr::~__shared_ptr (this=0x55bef49d7a68, __in_chrg=) at /usr/include/c++/11/bits/shared_ptr_base.h:1154
#42 0x000055bef2c05424 in std::shared_ptr::~shared_ptr (this=0x55bef49d7a68, __in_chrg=) at /usr/include/c++/11/bits/shared_ptr.h:122
#43 0x00007f4fb6350f46 in rclcpp::Node::~Node (this=0x55bef49d7a50, __in_chrg=) at /root/ros2_ws/colcon_ws/src/ros2/rclcpp/rclcpp/src/rclcpp/node.cpp:279
#44 0x000055bef2c13e89 in __gnu_cxx::new_allocator::destroy (this=0x55bef49d7a50, __p=0x55bef49d7a50) at /usr/include/c++/11/ext/new_allocator.h:168
#45 0x000055bef2c13495 in std::allocator_traits >::destroy (__a=..., __p=0x55bef49d7a50) at /usr/include/c++/11/bits/alloc_traits.h:535
#46 0x000055bef2c126e9 in std::_Sp_counted_ptr_inplace, (__gnu_cxx::_Lock_policy)2>::_M_dispose (this=0x55bef49d7a40) at /usr/include/c++/11/bits/shared_ptr_base.h:528
#47 0x000055bef2c0777b in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x55bef49d7a40) at /usr/include/c++/11/bits/shared_ptr_base.h:168
#48 0x000055bef2c05c39 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x7ffd77311c18, __in_chrg=) at /usr/include/c++/11/bits/shared_ptr_base.h:705
#49 0x000055bef2c058dc in std::__shared_ptr::~__shared_ptr (this=0x7ffd77311c10, __in_chrg=) at /usr/include/c++/11/bits/shared_ptr_base.h:1154
#50 0x000055bef2c05926 in std::shared_ptr::~shared_ptr (this=0x7ffd77311c10, __in_chrg=) at /usr/include/c++/11/bits/shared_ptr.h:122
#51 0x000055bef2c03a0b in main (argc=1, argv=0x7ffd77312778) at /root/ros2_ws/colcon_ws/src/ros2/demos/demo_nodes_cpp/src/services/add_two_ints_client.cpp:75
```

fujitatomoya commented 2 years ago

@MiguelCompany I am not sure if this is a bug; could you take a look? CC: @Barry-Xu-2018 @iuhilnehc-ynos

MiguelCompany commented 2 years ago

@fujitatomoya Sorry for the late answer (we were on Easter holidays here). This seems to have been previously reported at eProsima/Fast-DDS#2533, but we haven't had time to check it yet.

We will reproduce and debug it this week.

clalancette commented 2 years ago

@MiguelCompany Any thoughts on what might be going on here?

MiguelCompany commented 2 years ago

> Any thoughts on what might be going on here?

Not much yet. I could reproduce it, but not deterministically (so nothing that could be turned into a regression test). I've been debugging it a bit, but haven't reached a conclusion yet.

VladimirVassiliev commented 2 years ago

@MiguelCompany Should the TopicPayloadPool::shrink() function be something like this?

bool TopicPayloadPool::shrink (
        uint32_t max_num_payloads)
{
    assert(max_num_payloads == 0 || payload_pool_allocated_size() - payload_pool_available_size() <= max_num_payloads);

    while (max_num_payloads < all_payloads_.size())
    {
        if (free_payloads_.empty())
        {
            // This may happen if max_num_payloads is zero.
            break;
        }

        PayloadNode* payload = free_payloads_.back();
        free_payloads_.pop_back();

        // Find data in allPayloads, remove element, then delete it
        all_payloads_.at(payload->data_index()) = all_payloads_.back();
        all_payloads_.back()->data_index(payload->data_index());
        all_payloads_.pop_back();
        delete payload;
    }

    return true;
}

As I understand it, zero is a valid value for max_num_payloads. Is that correct?
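
The removal step in the snippet above is a swap-and-pop: the last element is moved into the slot being vacated, its stored index is updated, and the tail is popped. A minimal standalone sketch of that pattern follows; `Node` and `MiniPool` are hypothetical stand-ins for illustration, not the Fast DDS types, and the loop stops early when no free node is left, as the proposal suggests.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a pool node that remembers its slot in `all`.
struct Node
{
    std::size_t index = 0;
};

// Hypothetical miniature pool; names do not come from Fast DDS.
struct MiniPool
{
    std::vector<Node*> all;   // every node the pool owns
    std::vector<Node*> free;  // nodes not currently handed out

    MiniPool() = default;
    MiniPool(const MiniPool&) = delete;
    MiniPool& operator=(const MiniPool&) = delete;

    ~MiniPool()
    {
        for (Node* n : all)
        {
            delete n;
        }
    }

    // Create one node and place it in the free list.
    void grow()
    {
        Node* n = new Node();
        n->index = all.size();
        all.push_back(n);
        free.push_back(n);
    }

    // Hand out a free node, or nullptr if none is available.
    Node* take()
    {
        if (free.empty())
        {
            return nullptr;
        }
        Node* n = free.back();
        free.pop_back();
        return n;
    }

    void give_back(Node* n)
    {
        free.push_back(n);
    }

    // Delete free nodes until at most `max_nodes` remain or no free node is left.
    void shrink(std::uint32_t max_nodes)
    {
        while (max_nodes < all.size() && !free.empty())
        {
            Node* victim = free.back();
            free.pop_back();

            // Swap-and-pop: move the last node into the victim's slot,
            // fix its stored index, then drop the duplicated tail entry.
            all[victim->index] = all.back();
            all[victim->index]->index = victim->index;
            all.pop_back();
            delete victim;
        }
    }
};
```

In this sketch, calling `shrink(0)` while one node is still handed out removes only the free nodes and leaves the outstanding one in `all` with a corrected index. Whether that is the right behaviour for the real pool is the question raised above.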

MiguelCompany commented 2 years ago

@fujitatomoya @clalancette This should have been fixed by eProsima/Fast-DDS#2853; the backport to 2.6.x is pending CI on eProsima/Fast-DDS#2861.

fujitatomoya commented 2 years ago

@MiguelCompany

add_two_ints_client: /root/ros2_ws/colcon_ws/src/eProsima/Fast-DDS/src/cpp/rtps/history/TopicPayloadPool.cpp:263: bool eprosima::fastrtps::rtps::TopicPayloadPool::shrink(uint32_t): Assertion `payload_pool_allocated_size() - payload_pool_available_size() <= max_num_payloads' failed.
[ros2run]: Aborted

The problem above has been solved; I can no longer reproduce that assertion.

However, we now see a "service not available" message instead.

# ./test.sh
root@tomoyafujita:~/ros2_ws/colcon_ws# [INFO] [1658999884.962398783] [add_two_ints_client]: Result of add_two_ints: 5
[INFO] [1658999884.971446955] [add_two_ints_client]: Result of add_two_ints: 5
[INFO] [1658999884.975172233] [add_two_ints_client]: Result of add_two_ints: 5
[INFO] [1658999884.980538959] [add_two_ints_client]: Result of add_two_ints: 5
[INFO] [1658999884.987139210] [add_two_ints_client]: Result of add_two_ints: 5
[INFO] [1658999884.988729905] [add_two_ints_client]: Result of add_two_ints: 5
[INFO] [1658999884.990040116] [add_two_ints_client]: Result of add_two_ints: 5
[INFO] [1658999884.998537695] [add_two_ints_client]: Result of add_two_ints: 5
[INFO] [1658999885.001958420] [add_two_ints_client]: Result of add_two_ints: 5
[INFO] [1658999885.061093777] [add_two_ints_client]: Result of add_two_ints: 5
[INFO] [1658999885.061093287] [add_two_ints_client]: Result of add_two_ints: 5
[INFO] [1658999885.974434720] [add_two_ints_client]: service not available, waiting again...
[INFO] [1658999885.976093427] [add_two_ints_client]: service not available, waiting again...
[INFO] [1658999885.979039142] [add_two_ints_client]: service not available, waiting again...
[INFO] [1658999885.985281537] [add_two_ints_client]: service not available, waiting again...
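
A message like the one above is typical of a wait-and-retry loop that logs each time a bounded wait expires. The following is a generic sketch of that pattern in plain C++; `wait_until_ready` and `wait_with_retries` are hypothetical helpers written for illustration and are not the demo's or rclcpp's actual code.

```cpp
#include <chrono>
#include <functional>
#include <iostream>
#include <thread>

// Hypothetical helper: poll `is_ready` until it returns true or `timeout` elapses.
bool wait_until_ready(
        const std::function<bool()>& is_ready,
        std::chrono::milliseconds timeout,
        std::chrono::milliseconds poll = std::chrono::milliseconds(1))
{
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (!is_ready())
    {
        if (std::chrono::steady_clock::now() >= deadline)
        {
            return false;
        }
        std::this_thread::sleep_for(poll);
    }
    return true;
}

// Hypothetical retry loop: log and wait again after each timeout.
// Returns the number of timeouts logged before success, or -1 after giving up.
int wait_with_retries(
        const std::function<bool()>& is_ready,
        std::chrono::milliseconds timeout,
        int max_attempts)
{
    for (int attempt = 0; attempt < max_attempts; ++attempt)
    {
        if (wait_until_ready(is_ready, timeout))
        {
            return attempt;
        }
        std::cout << "service not available, waiting again...\n";
    }
    return -1;
}
```

With such a loop, one log line per client only means that a single wait interval expired before the server was seen; it does not by itself indicate that the call ultimately failed.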

MiguelCompany commented 2 years ago

@fujitatomoya This was already fixed and backported. Do you think we can close it?