ros2 / rclcpp

rclcpp (ROS Client Library for C++)
Apache License 2.0
536 stars 417 forks source link

leak due to std::shared_ptr circular reference between Context and GuardCondition #2497

Closed bmartin427 closed 5 months ago

bmartin427 commented 5 months ago

Bug report

Required Info:

Steps to reproduce issue

Compile the following standalone executable and run with RMW_IMPLEMENTATION=rmw_cyclonedds_cpp valgrind --leak-check=full:

#include <memory>

#include <rclcpp/context.hpp>
#include <rclcpp/guard_condition.hpp>

int main(int argc, char** argv) {
  auto context = std::make_shared<rclcpp::Context>();
  context->init(argc, argv);
  context->get_sub_context<rclcpp::GuardCondition>(context);
  context->shutdown("shutdown");
  return 0;
}

(Abusing GuardCondition as a 'sub-context' here is a simplified stand-in for GraphListener which is created as a sub-context by NodeGraph on behalf of Node, and itself owns a GuardCondition constructed with a std::shared_ptr to the context.)

Expected behavior

Optimally, no complaints, however there will be a couple leaks from liblttng-ust.so outside the scope of this issue.

Actual behavior

In addition to the above, this complaint:

==580480== 2,136 bytes in 1 blocks are possibly lost in loss record 42 of 44
==580480==    at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==580480==    by 0x691434C: ddsrt_malloc (heap.c:26)
==580480==    by 0x68CC4A9: UnknownInlinedFun (q_thread.c:79)
==580480==    by 0x68CC4A9: thread_states_init (q_thread.c:103)
==580480==    by 0x68EAB92: dds_init (dds_init.c:116)
==580480==    by 0x6901351: dds_create_guardcondition (dds_guardcond.c:45)
==580480==    by 0x67E6311: create_guard_condition() [clone .lto_priv.0] (rmw_node.cpp:4146)
==580480==    by 0x4AB1945: ??? (in /opt/ros/iron/lib/librcl.so)
==580480==    by 0x497F6BF: rclcpp::GuardCondition::GuardCondition(std::shared_ptr<rclcpp::Context>, rcl_guard_condition_options_s) (in /opt/ros/iron/lib/librclcpp.so)
==580480==    by 0x10AF2F: std::shared_ptr<rclcpp::GuardCondition> rclcpp::Context::get_sub_context<rclcpp::GuardCondition, std::shared_ptr<rclcpp::Context>&>(std::shared_ptr<rclcpp::Context>&) (in /home/bmartin/tmp)
==580480==    by 0x10A6B4: main (in /home/bmartin/tmp)

Additional information

If using fastDDS the above valgrind complaint will not appear. However I do believe the failure to shut down correctly is the fault of rclcpp and not any rmw implementation: that is, the context and the guard condition both own each other via std::shared_ptrs, so neither can ever be deleted. Possibly GuardCondition should be holding a weak pointer to the context instead?

clalancette commented 5 months ago

If using fastDDS the above valgrind complaint will not appear. However I do believe the failure to shut down correctly is the fault of rclcpp and not any rmw implementation: that is, the context and the guard condition both own each other via std::shared_ptrs, so neither can ever be deleted. Possibly GuardCondition should be holding a weak pointer to the context instead?

Yeah, I did something like this in #2400 on the rolling branch. I'm not totally convinced that is correct either, because it seems possible for one of them to go out of scope without the other, but it has been in there a while now and seems to be OK.

fujitatomoya commented 5 months ago

i verified with https://github.com/ros2/ros2/commit/4d07f58591920794676261a14c2ba9d0a7bf2746, reported complain cannot be observed in mainline.

@bmartin427 it would be appreciated if you can check and close this issue.

bmartin427 commented 5 months ago

I tested with ros-rolling-rclcpp=27.0.0-1jammy.20240216.162604 and confirmed I was not able to reproduce the problem.