Closed zygfrydw closed 4 months ago
In my opinion this subcription on parameter events is not necessary in this internal node, however I do not see any option to disable this.
So we do actually need to react to parameter events in TimeSource. The reason is so that we can do the right thing when the user enables or disables simulation time via the use_sim_time
parameter.
However, we don't exactly need to subscribe to it anymore. Since Iron, we have the ability to get a callback after a parameter has been updated, using the post_set_parameters_callback
(like in https://github.com/ros2/demos/blob/36a4c0bd6ae0531ad3096806c41efd7062a1cb31/demo_nodes_cpp/src/parameters/set_parameters_callback.cpp#L86-L106). Additionally, using the post_set_parameters_callback
will be more performant than the subscription.
So my suggestion is that instead we open a PR to rclcpp
, replacing https://github.com/ros2/rclcpp/blob/9284d7cefa9e37ce3b6926a44e6177d6e4a4b62a/rclcpp/src/rclcpp/time_source.cpp#L279-L288 with an add_on_post_set_parameters_callback
. That will solve this leak.
We can also consider taking this PR, for the reasons you describe, but I think that is a separate issue. @zygfrydw are you willing to open that PR to rclcpp
?
I am not exactly sure what it changes to replace the parameter_subscription_
with post_set_parameters_callback
in terms of this pull request.
In the TimeSource::NodeState::attachNode
there is already a subscription to on_set_parameters_callback
:
on_set_parameters_callback_ = node_parameters_->add_on_set_parameters_callback(
std::bind(&TimeSource::NodeState::on_set_parameters, this, std::placeholders::_1));
post_set_parameters_callback = rclcpp::AsyncParametersClient::on_parameter_event(
node_topics_,
[this](std::shared_ptr<const rcl_interfaces::msg::ParameterEvent> event) {
if (node_base_ != nullptr) {
this->on_parameter_event(event);
}
});
I do not understand why both on_set_parameters_callback
and parameter_subscription_
are necessary and why changing the parameter_subscription_
to post_set_parameters_callback
would change anything.
In my opinion, it is not correct to spin only the /tf
and /tf_static
methods of the internal Transform Listener node.
I am not aware of other elements of node routines that require spinning, so removing post_set_parameters_callback
may be sufficient for now. However in my opinion it is a fragile fix because if in the future someone adds some subscription to Node the issue will come back.
We are currently locked on the ROS Humble version so we hope you merge this pull request to ROS Humble as well.
Hello,
We ran into what looks like this issue and discovered this PR. During our investigation we independently discovered that all of our memory-leaking nodes were ones that instantiated a TransformListener
in the way you described in your example. We're testing the fix here now, but I wanted to drop by to show the effects of the memory leak, as we did quite some testing on multiple robots in our fleet, and in simulation, to get a picture of its severity, and what triggers it.
Here is a graph of time vs. memory usage of every node in our system, gathered over 548 minutes (9.1 hours) of robot idle time. The lines with an incline are nodes with TransformListener
instantiations that leaked memory. The flat lines are the rest of our system.
From eyeballing these results, it looks to me like every TransformListener
leaks about ~20mb
of memory over that ~9hr period, and nodes with multiple instantiations of a TransformListener
leak harder than those with just one, or none. This fits your explanation of a never-decreasing, immortal message queue per-instance of that object. In our case, this overall accumulation led to an extra ~1.2gb
of memory usage by our system. This is pretty severe for those of us doing long-term robotics tasks.
Just wanted to put here some experimental support for your diagnosis. Not sure if the fix here helps us yet, but we are very suspicious that this object is the one that is leaky, and we will report back.
I also created a small repository to run experiments. I spawn a dummy node that doesn't do anything, just advertise a parameter which I later use to send a big string payload. After putting the payload into the parameter, I started seeing memory leaks. The confusing part is that with/without TransformListener
I still observe leaks, and with 2 different DDS implementations.
Indeed if a tf2_ros::TransformListener
object is part of the nodes the memory leak is notable
class LeakyNode : public rclcpp::Node {
public:
explicit LeakyNode(const std::string &name) : Node(name) {
this->declare_parameter("my_parameter", "world");
}
};
class LeakyNode : public rclcpp::Node {
public:
explicit LeakyNode(const std::string &name) : Node(name) {
this->declare_parameter("my_parameter", "world");
tf_buffer = std::make_unique<tf2_ros::Buffer>(this->get_clock());
tf_listener = std::make_unique<tf2_ros::TransformListener>(*tf_buffer);
}
std::unique_ptr<tf2_ros::TransformListener> tf_listener;
std::unique_ptr<tf2_ros::Buffer> tf_buffer;
};
I don't have anyone yet :cry: So far all the experiments I have done are in some way contradictory (common in memory leak hunting). Any leads on how to nail down this problem would be highly appreciated
@zygfrydw Was this issue fixed with the @nachovizzo PR https://github.com/ros2/geometry2/pull/636 ?
Feel free to reopen it again, if the problem is still there
There is a memory / message leak when using
TransformListener
constructed with internal node.When
TransformListener
is constructed in the following way:There is a memory leake due to messages published to
/parameter_events
which is not served in the internal node. We have observed it when using CycloneDDS with shared memory (Iceoryx) and we have created a repo to reproduce this error.The reason for this issue is that, when in
TransformListener
is created dedicated thread, the subcriptions for/tf
and/tf_static
topic are added to callback group. When internal node is created, the node creates subcription on topic/parameter_events
. However this subscription is created in default callback group which is never spinned by the executor (the executor is told explicitly to run only the callback group with/tf
and/tf_static
). The topic/parameter_events
is created with QOS with history of 1000 messages. Therefore not served messages live forever in/parameter_events
for eachTransformListener
. In the application with default CycloneDDS this results in small, constant memory leak in each instance ofTransformListener
, however when messages are taken from pool like when using Shared Memory, the memory pool is exausted quickly.The
/parameter_events
is created inrclcpp::TimeSource::NodeState
inrclcpp/src/rclcpp/time_source.cpp:270
.In my opinion this subcription on parameter events is not necessary in this internal node, however I do not see any option to disable this. Additionaly, the problem will come back if someone adds other subscriptions in Node in future, hance I perfer current solution with adding default callback group to executor.
Other possible solution is: