ros2 / message_filters

BSD 3-Clause "New" or "Revised" License
76 stars 66 forks source link

Memory leak: message_filters::Synchronizer heap-use-after-free #66

Open hanyin-intel opened 3 years ago

hanyin-intel commented 3 years ago

Hi,

I'm working on ROS2 with Foxy version and in my project I find memory leak in message_filters::Synchronizer when testing with Sanitizer tool. So I write a small test case, then find that when shutting down the node, there is 8 bytes heap-use-after-free issue in message_filters::Signal1<sensor_msgs::msg::Image_<std::allocator<void> > >::removeCallback. Here is the code.

#include <signal.h>
#include <thread>
#include <rclcpp/rclcpp.hpp>
#include <message_filters/subscriber.h>
#include <message_filters/sync_policies/approximate_time.h>
#include <message_filters/time_synchronizer.h>
#include <sensor_msgs/msg/image.hpp>

class SLAMNode: public rclcpp::Node
{
public:
    SLAMNode(): rclcpp::Node("test"), sync_policy_(10),
    image_sync_(sync_policy_) {
        auto subscribe_image_topic = [this](ImageSubscriberFilter &sub, std::string topic, const std::string = "raw") {
        sub.subscribe(this, topic, rclcpp::SensorDataQoS().get_rmw_qos_profile());
    };
        subscribe_image_topic(sub_image_, "image");
        subscribe_image_topic(sub_depth_, "depth");
        image_sync_.connectInput(sub_image_, sub_depth_);
        image_sync_.registerCallback(
            std::bind(&SLAMNode::rgbd_callback, this, std::placeholders::_1, std::placeholders::_2));
    };
    ~SLAMNode() {};

    void rgbd_callback(const sensor_msgs::msg::Image::ConstSharedPtr msgRGB,
                       const sensor_msgs::msg::Image::ConstSharedPtr msgD) {
        std::cout << "Callback " << std::endl;
    }

private:
    typedef message_filters::Subscriber<sensor_msgs::msg::Image> ImageSubscriberFilter;
    typedef message_filters::sync_policies::ApproximateTime<sensor_msgs::msg::Image, sensor_msgs::msg::Image> sync_policy;
    sync_policy sync_policy_;
    message_filters::Synchronizer<sync_policy> image_sync_;
    ImageSubscriberFilter sub_image_, sub_depth_, sub_right_image_, sub_mono_;
};

int main(int argc, char **argv)
{
    rclcpp::init(argc, argv);
    std::shared_ptr<SLAMNode> node = std::make_shared<SLAMNode>();

    rclcpp::spin(node);
    rclcpp::shutdown();

    return EXIT_SUCCESS;
}

Regards, Han

hanyin-intel commented 3 years ago

The OS distribution I'm working on is Ubuntu 20.04 and the GCC version is 9.3.0. You can get all the necessary files from this repo and also get the log output in the repo root directory. The command we use to build the package is colcon build --packages-select simple_test --cmake-args "-DUSE_SANITIZER=ON" and the command-line to run it is ros2 run simple_test simple_test.

clalancette commented 3 years ago

I took a bit of a look at this.

The problem is in the order of destruction between message_filter::Subscriber and message_filter::Synchronizer. If you have the Synchronizer before the Subscriber in the class members (as you do above), then the Synchronizer gets constructed before Subscriber, but the Subscriber gets destroyed before the Synchronizer. That ends up causing the problem, because the destruction of the Synchronizer tries to unsubscribe from everything attached to it. Since the Subscriber has already been destroyed, you get the use-after-free.

If you reverse the order of the Synchronizer and the Subscriber (so that the Synchronizer comes last in the class member variables), then the problem goes away.

That said, this is pretty subtle and pretty fragile. It would be best if the Synchronizer took a reference to the Subscribers that are being attached to it, so that they would not be destroyed while the Synchronizer still needs them. That would require a change in API from:

Synchronizer::connectInput(F0 & f0, F1 & f1)

to something like:

Synchronizer::connectInput(std::shared_ptr<F0> f0, std::shared_ptr<F1> f1)

Another alternative would be to have Synchronizer register some kind of "deletion" callback with the Subscriber class, which the Subscriber class would call when it is being destroyed. This callback would then delete the Subscriber right then from the Synchronizer.

There may be other ways we could go here. @ros2/team thoughts?

tfoote commented 3 years ago

The reference counted pointer is likely the cleanest. We just need to make sure it's not too easy to make a loop of dependencies that will block destruction.

hanyin-intel commented 3 years ago

Hi @clalancette, thanks for your reply, the heap-use-after-free problem goes away after I reverse the order but a new memory leak problem is generated. Here is what I got:

Direct leak of 56 byte(s) in 1 object(s) allocated from:
    #0 0x7faa393cebc8 in malloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10dbc8)
    #1 0x7faa390bc892  (/opt/ros/foxy/lib/librcl.so+0x15892)
    #2 0x7faa391cfb9a in rclcpp::graph_listener::GraphListener::GraphListener(std::shared_ptr<rclcpp::Context>) (/opt/ros/foxy/lib/librclcpp.so+0xe4b9a)
    #3 0x7faa391f3cbb in std::shared_ptr<rclcpp::graph_listener::GraphListener> rclcpp::Context::get_sub_context<rclcpp::graph_listener::GraphListener, std::shared_ptr<rclcpp::Context> >(std::shared_ptr<rclcpp::Context>&&) (/opt/ros/foxy/lib/librclcpp.so+0x108cbb)

And I see your discussion on this related issue, so is that means I need to re-build rclcpp from source on foxy branch to fix this new problem?

clalancette commented 2 years ago

I'm actually going to leave this open, since we do need to address the connecting inputs problem.