ros2 / rmw_cyclonedds

ROS 2 RMW layer for Eclipse Cyclone DDS
Apache License 2.0
108 stars 89 forks source link

Invalid read & leaks in node and publisher destruction #404

Closed ihasdapie closed 1 year ago

ihasdapie commented 1 year ago

When valgrinding some rcl tests I noticed that there are invalid reads and leaks happening in the destruction of nodes and publishers happening in CycloneDDS after the rmw_publisher_destroy, etc. calls.

From original issue: https://github.com/eclipse-cyclonedds/cyclonedds/issues/1376:

Those are actually issues in rmw_cyclonedds: it is not calling dds_delete_listener and dds_delete_qos as it is supposed to do. Hopefully I can move this issue, otherwise I'll have to close it here.

Required Info:

Valgrind output:

λ valgrind --leak-check=full --track-origins=yes ./test_publisher__rmw_cyclonedds_cpp
==706133== Memcheck, a memory error detector
==706133== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==706133== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==706133== Command: ./test_publisher__rmw_cyclonedds_cpp
==706133==
====================================================================================================================================================================================================================================================================================================
Running main() from /home/ihasdapie/osrf/service_introspection/install/gtest_vendor/src/gtest_vendor/src/gtest_main.cc
[==========] Running 18 tests from 3 test suites.
[----------] Global test environment set-up.
[----------] 7 tests from TestPublisher__rmw_cyclonedds_cpp
[ RUN      ] TestPublisher__rmw_cyclonedds_cpp.create_and_destroy
====================================================================================================================================================================================================================================================================================================
==706133== Invalid read of size 8
==706133==    at 0x40286C8: strncmp (strcmp.S:172)
==706133==    by 0x400668D: is_dst (dl-load.c:216)
==706133==    by 0x400810E: _dl_dst_count (dl-load.c:253)
==706133==    by 0x400810E: expand_dynamic_string_token (dl-load.c:395)
==706133==    by 0x40082B7: fillin_rpath.isra.0 (dl-load.c:483)
==706133==    by 0x4008602: decompose_rpath (dl-load.c:654)
==706133==    by 0x400ABF5: cache_rpath (dl-load.c:696)
==706133==    by 0x400ABF5: cache_rpath (dl-load.c:677)
==706133==    by 0x400ABF5: _dl_map_object (dl-load.c:2165)
==706133==    by 0x4003494: openaux (dl-deps.c:64)
==706133==    by 0x54E0C27: _dl_catch_exception (dl-error-skeleton.c:208)
==706133==    by 0x4003C7B: _dl_map_object_deps (dl-deps.c:248)
==706133==    by 0x400EA0E: dl_open_worker_begin (dl-open.c:592)
==706133==    by 0x54E0C27: _dl_catch_exception (dl-error-skeleton.c:208)
==706133==    by 0x400DF99: dl_open_worker (dl-open.c:782)
==706133==  Address 0x56feb99 is 9 bytes inside a block of size 15 alloc'd
==706133==    at 0x484E899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==706133==    by 0x40271FF: malloc (rtld-malloc.h:56)
==706133==    by 0x40271FF: strdup (strdup.c:42)
==706133==    by 0x4008594: decompose_rpath (dl-load.c:629)
==706133==    by 0x400ABF5: cache_rpath (dl-load.c:696)
==706133==    by 0x400ABF5: cache_rpath (dl-load.c:677)
==706133==    by 0x400ABF5: _dl_map_object (dl-load.c:2165)
==706133==    by 0x4003494: openaux (dl-deps.c:64)
==706133==    by 0x54E0C27: _dl_catch_exception (dl-error-skeleton.c:208)
==706133==    by 0x4003C7B: _dl_map_object_deps (dl-deps.c:248)
==706133==    by 0x400EA0E: dl_open_worker_begin (dl-open.c:592)
==706133==    by 0x54E0C27: _dl_catch_exception (dl-error-skeleton.c:208)
==706133==    by 0x400DF99: dl_open_worker (dl-open.c:782)
==706133==    by 0x54E0C27: _dl_catch_exception (dl-error-skeleton.c:208)
==706133==    by 0x400E34D: _dl_open (dl-open.c:883)
==706133==
==706133== Invalid read of size 8
==706133==    at 0x40286C8: strncmp (strcmp.S:172)
==706133==    by 0x400668D: is_dst (dl-load.c:216)
==706133==    by 0x4007F79: _dl_dst_substitute (dl-load.c:295)
==706133==    by 0x40082B7: fillin_rpath.isra.0 (dl-load.c:483)
==706133==    by 0x4008602: decompose_rpath (dl-load.c:654)
==706133==    by 0x400ABF5: cache_rpath (dl-load.c:696)
==706133==    by 0x400ABF5: cache_rpath (dl-load.c:677)
==706133==    by 0x400ABF5: _dl_map_object (dl-load.c:2165)
==706133==    by 0x4003494: openaux (dl-deps.c:64)
==706133==    by 0x54E0C27: _dl_catch_exception (dl-error-skeleton.c:208)
==706133==    by 0x4003C7B: _dl_map_object_deps (dl-deps.c:248)
==706133==    by 0x400EA0E: dl_open_worker_begin (dl-open.c:592)
==706133==    by 0x54E0C27: _dl_catch_exception (dl-error-skeleton.c:208)
==706133==    by 0x400DF99: dl_open_worker (dl-open.c:782)
==706133==  Address 0x56feb99 is 9 bytes inside a block of size 15 alloc'd
==706133==    at 0x484E899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==706133==    by 0x40271FF: malloc (rtld-malloc.h:56)
==706133==    by 0x40271FF: strdup (strdup.c:42)
==706133==    by 0x4008594: decompose_rpath (dl-load.c:629)
==706133==    by 0x400ABF5: cache_rpath (dl-load.c:696)
==706133==    by 0x400ABF5: cache_rpath (dl-load.c:677)
==706133==    by 0x400ABF5: _dl_map_object (dl-load.c:2165)
==706133==    by 0x4003494: openaux (dl-deps.c:64)
==706133==    by 0x54E0C27: _dl_catch_exception (dl-error-skeleton.c:208)
==706133==    by 0x4003C7B: _dl_map_object_deps (dl-deps.c:248)
==706133==    by 0x400EA0E: dl_open_worker_begin (dl-open.c:592)
==706133==    by 0x54E0C27: _dl_catch_exception (dl-error-skeleton.c:208)
==706133==    by 0x400DF99: dl_open_worker (dl-open.c:782)
==706133==    by 0x54E0C27: _dl_catch_exception (dl-error-skeleton.c:208)
==706133==    by 0x400E34D: _dl_open (dl-open.c:883)
==706133==
====================================================================================================================================================================================================================================================================================================
[       OK ] TestPublisher__rmw_cyclonedds_cpp.create_and_destroy (3590 ms)
[ RUN      ] TestPublisher__rmw_cyclonedds_cpp.create_with_internal_errors
[       OK ] TestPublisher__rmw_cyclonedds_cpp.create_with_internal_errors (443 ms)
[ RUN      ] TestPublisher__rmw_cyclonedds_cpp.create_and_destroy_native
[       OK ] TestPublisher__rmw_cyclonedds_cpp.create_and_destroy_native (455 ms)
[ RUN      ] TestPublisher__rmw_cyclonedds_cpp.create_with_bad_arguments

>>> [rcutils|error_handling.c:108] rcutils_set_error_state()
This error state is being overwritten:

  'Type support not from this implementation. Got:
    Handle's typesupport identifier (not-a-typesupport-identifier) is not supported by this library, at /home/ihasdapie/osrf/service_introspection/src/ros2/rosidl_typesupport/rosidl_typesupport_c/src/type_support_dispatch.hpp:113
    Handle's typesupport identifier (not-a-typesupport-identifier) is not supported by this library, at /home/ihasdapie/osrf/service_introspection/src/ros2/rosidl_typesupport/rosidl_typesupport_c/src/type_support_dispatch.hpp:113
while fetching it, at /home/ihasdapie/osrf/service_introspection/src/ros2/rmw_cyclonedds/rmw_cyclonedds_cpp/src/rmw_node.cpp:1937'

with this new error message:

  'type_support is null, at /home/ihasdapie/osrf/service_introspection/src/ros2/rmw_cyclonedds/rmw_cyclonedds_cpp/src/rmw_node.cpp:2256'

rcutils_reset_error() should be called after error handling to avoid this.
<<<
[       OK ] TestPublisher__rmw_cyclonedds_cpp.create_with_bad_arguments (482 ms)
[ RUN      ] TestPublisher__rmw_cyclonedds_cpp.destroy_with_bad_arguments
[       OK ] TestPublisher__rmw_cyclonedds_cpp.destroy_with_bad_arguments (421 ms)
[ RUN      ] TestPublisher__rmw_cyclonedds_cpp.destroy_with_internal_errors
[       OK ] TestPublisher__rmw_cyclonedds_cpp.destroy_with_internal_errors (433 ms)
[ RUN      ] TestPublisher__rmw_cyclonedds_cpp.get_actual_qos_from_system_defaults
[       OK ] TestPublisher__rmw_cyclonedds_cpp.get_actual_qos_from_system_defaults (448 ms)
[----------] 7 tests from TestPublisher__rmw_cyclonedds_cpp (6289 ms total)

[----------] 10 tests from TestPublisherUse__rmw_cyclonedds_cpp
[ RUN      ] TestPublisherUse__rmw_cyclonedds_cpp.get_actual_qos_with_bad_arguments
[       OK ] TestPublisherUse__rmw_cyclonedds_cpp.get_actual_qos_with_bad_arguments (408 ms)
[ RUN      ] TestPublisherUse__rmw_cyclonedds_cpp.get_actual_qos
[       OK ] TestPublisherUse__rmw_cyclonedds_cpp.get_actual_qos (418 ms)
[ RUN      ] TestPublisherUse__rmw_cyclonedds_cpp.count_matched_subscriptions_with_bad_args
[       OK ] TestPublisherUse__rmw_cyclonedds_cpp.count_matched_subscriptions_with_bad_args (446 ms)
[ RUN      ] TestPublisherUse__rmw_cyclonedds_cpp.count_matched_subscriptions
[       OK ] TestPublisherUse__rmw_cyclonedds_cpp.count_matched_subscriptions (603 ms)
[ RUN      ] TestPublisherUse__rmw_cyclonedds_cpp.count_mismatched_subscriptions
[       OK ] TestPublisherUse__rmw_cyclonedds_cpp.count_mismatched_subscriptions (1707 ms)
[ RUN      ] TestPublisherUse__rmw_cyclonedds_cpp.publish_message_with_bad_arguments
[       OK ] TestPublisherUse__rmw_cyclonedds_cpp.publish_message_with_bad_arguments (459 ms)
[ RUN      ] TestPublisherUse__rmw_cyclonedds_cpp.publish_with_internal_errors
[       OK ] TestPublisherUse__rmw_cyclonedds_cpp.publish_with_internal_errors (420 ms)
[ RUN      ] TestPublisherUse__rmw_cyclonedds_cpp.publish_serialized_message_with_bad_arguments
[       OK ] TestPublisherUse__rmw_cyclonedds_cpp.publish_serialized_message_with_bad_arguments (436 ms)
[ RUN      ] TestPublisherUse__rmw_cyclonedds_cpp.publish_serialized_with_internal_errors
[       OK ] TestPublisherUse__rmw_cyclonedds_cpp.publish_serialized_with_internal_errors (466 ms)
[ RUN      ] TestPublisherUse__rmw_cyclonedds_cpp.wait_for_all_acked_with_best_effort
[       OK ] TestPublisherUse__rmw_cyclonedds_cpp.wait_for_all_acked_with_best_effort (455 ms)
[----------] 10 tests from TestPublisherUse__rmw_cyclonedds_cpp (5819 ms total)

[----------] 1 test from TestPublisherUseLoan__rmw_cyclonedds_cpp
[ RUN      ] TestPublisherUseLoan__rmw_cyclonedds_cpp.borrow_loaned_message_with_bad_arguments

>>> [rcutils|error_handling.c:108] rcutils_set_error_state()
This error state is being overwritten:

  'publisher argument is null, at /home/ihasdapie/osrf/service_introspection/src/ros2/rmw_cyclonedds/rmw_cyclonedds_cpp/src/rmw_node.cpp:2554'

with this new error message:

  'Handle's typesupport identifier (rosidl_typesupport_cpp) is not supported by this library, at /home/ihasdapie/osrf/service_introspection/src/ros2/rosidl_typesupport/rosidl_typesupport_cpp/src/type_support_dispatch.hpp:111'

rcutils_reset_error() should be called after error handling to avoid this.
<<<
[  SKIPPED ] TestPublisherUseLoan__rmw_cyclonedds_cpp.borrow_loaned_message_with_bad_arguments (495 ms)
[----------] 1 test from TestPublisherUseLoan__rmw_cyclonedds_cpp (496 ms total)

[----------] Global test environment tear-down
[==========] 18 tests from 3 test suites ran. (12670 ms total)
[  PASSED  ] 17 tests.
[  SKIPPED ] 1 test, listed below:
[  SKIPPED ] TestPublisherUseLoan__rmw_cyclonedds_cpp.borrow_loaned_message_with_bad_arguments
====================================================================================================================================================================================================================================================================================================
==706133==
==706133== HEAP SUMMARY:
==706133==     in use at exit: 42,876 bytes in 104 blocks
==706133==   total heap usage: 38,157 allocs, 38,053 frees, 64,398,413 bytes allocated
==706133==
==706133== 216 bytes in 1 blocks are definitely lost in loss record 14 of 25
==706133==    at 0x484E899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==706133==    by 0x5D1087E: ddsrt_malloc_s (heap.c:20)
==706133==    by 0x5D1089C: ddsrt_malloc (heap.c:26)
==706133==    by 0x5CCE3A8: dds_alloc (dds_alloc.c:24)
==706133==    by 0x5CEACD6: dds_create_listener (dds_listener.c:19)
==706133==    by 0x5AFE18D: create_cdds_publisher(int, int, rosidl_message_type_support_t const*, char const*, rmw_qos_profile_s const*) (rmw_node.cpp:2271)
==706133==    by 0x5AFE7F0: create_publisher(int, int, rosidl_message_type_support_t const*, char const*, rmw_qos_profile_s const*, rmw_publisher_options_s const*) (rmw_node.cpp:2340)
==706133==    by 0x5AFF077: rmw_create_publisher (rmw_node.cpp:2421)
==706133==    by 0x506D921: rmw_create_publisher (functions.cpp:292)
==706133==    by 0x1175CD: TestPublisher__rmw_cyclonedds_cpp_create_with_bad_arguments_Test::TestBody() (test_publisher.cpp:166)
==706133==    by 0x163C8C: void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest.cc:2433)
==706133==    by 0x15C4AA: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest.cc:2469)
==706133==
==706133== 392 bytes in 1 blocks are definitely lost in loss record 17 of 25
==706133==    at 0x484E899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==706133==    by 0x5D1087E: ddsrt_malloc_s (heap.c:20)
==706133==    by 0x5D1089C: ddsrt_malloc (heap.c:26)
==706133==    by 0x5CE0765: dds_create_qos (dds_qos.c:55)
==706133==    by 0x5AFD692: create_readwrite_qos(rmw_qos_profile_s const*, bool) (rmw_node.cpp:1994)
==706133==    by 0x5AFE1F4: create_cdds_publisher(int, int, rosidl_message_type_support_t const*, char const*, rmw_qos_profile_s const*) (rmw_node.cpp:2279)
==706133==    by 0x5AFE7F0: create_publisher(int, int, rosidl_message_type_support_t const*, char const*, rmw_qos_profile_s const*, rmw_publisher_options_s const*) (rmw_node.cpp:2340)
==706133==    by 0x5AFF077: rmw_create_publisher (rmw_node.cpp:2421)
==706133==    by 0x506D921: rmw_create_publisher (functions.cpp:292)
==706133==    by 0x1175CD: TestPublisher__rmw_cyclonedds_cpp_create_with_bad_arguments_Test::TestBody() (test_publisher.cpp:166)
==706133==    by 0x163C8C: void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest.cc:2433)
==706133==    by 0x15C4AA: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest.cc:2469)
==706133==
==706133== LEAK SUMMARY:
==706133==    definitely lost: 608 bytes in 2 blocks
==706133==    indirectly lost: 0 bytes in 0 blocks
==706133==      possibly lost: 0 bytes in 0 blocks
==706133==    still reachable: 42,268 bytes in 102 blocks
==706133==         suppressed: 0 bytes in 0 blocks
==706133== Reachable blocks (those to which a pointer was found) are not shown.
==706133== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==706133==
==706133== For lists of detected and suppressed errors, rerun with: -s
==706133== ERROR SUMMARY: 4 errors from 4 contexts (suppressed: 0 from 0)
clalancette commented 1 year ago

I just reran this test on the latest rolling, and I can't see these errors anymore. I'm not totally sure what fixed it, but I'm going to close this out for now.