Open hanyin-intel opened 3 years ago
The OS distribution I'm working on is Ubuntu 20.04 and the GCC version is 9.3.0. You can get all the necessary files from this repo and also get the log output in the repo root directory. The command we use to build the package is colcon build --packages-select simple_test --cmake-args "-DUSE_SANITIZER=ON"
and the command-line to run it is ros2 run simple_test simple_test
.
I took a bit of a look at this.
The problem is in the order of destruction between message_filter::Subscriber
and message_filter::Synchronizer
. If you have the Synchronizer
before the Subscriber
in the class members (as you do above), then the Synchronizer
gets constructed before Subscriber
, but the Subscriber
gets destroyed before the Synchronizer
. That ends up causing the problem, because the destruction of the Synchronizer
tries to unsubscribe from everything attached to it. Since the Subscriber
has already been destroyed, you get the use-after-free.
If you reverse the order of the Synchronizer
and the Subscriber
(so that the Synchronizer
comes last in the class member variables), then the problem goes away.
That said, this is pretty subtle and pretty fragile. It would be best if the Synchronizer
took a reference to the Subscribers
that are being attached to it, so that they would not be destroyed while the Synchronizer
still needs them. That would require a change in API from:
Synchronizer::connectInput(F0 & f0, F1 & f1)
to something like:
Synchronizer::connectInput(std::shared_ptr<F0> f0, std::shared_ptr<F1> f1)
Another alternative would be to have Synchronizer
register some kind of "deletion" callback with the Subscriber
class, which the Subscriber
class would call when it is being destroyed. This callback would then delete the Subscriber
right then from the Synchronizer
.
There may be other ways we could go here. @ros2/team thoughts?
The reference counted pointer is likely the cleanest. We just need to make sure it's not too easy to make a loop of dependencies that will block destruction.
Hi @clalancette, thanks for your reply, the heap-use-after-free problem goes away after I reverse the order but a new memory leak problem is generated. Here is what I got:
Direct leak of 56 byte(s) in 1 object(s) allocated from:
#0 0x7faa393cebc8 in malloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10dbc8)
#1 0x7faa390bc892 (/opt/ros/foxy/lib/librcl.so+0x15892)
#2 0x7faa391cfb9a in rclcpp::graph_listener::GraphListener::GraphListener(std::shared_ptr<rclcpp::Context>) (/opt/ros/foxy/lib/librclcpp.so+0xe4b9a)
#3 0x7faa391f3cbb in std::shared_ptr<rclcpp::graph_listener::GraphListener> rclcpp::Context::get_sub_context<rclcpp::graph_listener::GraphListener, std::shared_ptr<rclcpp::Context> >(std::shared_ptr<rclcpp::Context>&&) (/opt/ros/foxy/lib/librclcpp.so+0x108cbb)
And I see your discussion on this related issue, so is that means I need to re-build rclcpp from source on foxy branch to fix this new problem?
I'm actually going to leave this open, since we do need to address the connecting inputs problem.
Hi,
I'm working on ROS2 with Foxy version and in my project I find memory leak in
message_filters::Synchronizer
when testing with Sanitizer tool. So I write a small test case, then find that when shutting down the node, there is 8 bytes heap-use-after-free issue inmessage_filters::Signal1<sensor_msgs::msg::Image_<std::allocator<void> > >::removeCallback
. Here is the code.Regards, Han