ros2 / rclpy

rclpy (ROS Client Library for Python)
Apache License 2.0
268 stars 221 forks source link

Memory leak in ActionServer. ```taken_data = self._handle.take_goal_request``` #1300

Open holesond opened 2 weeks ago

holesond commented 2 weeks ago

Bug report

Required Info:

Steps to reproduce issue

NOTE: The zipped fibonacci_package.zip ROS2 package contains all the files required to reproduce the issue.

Action definition Fibonacci.action is similar to https://docs.ros.org/en/jazzy/Tutorials/Intermediate/Creating-an-Action.html.

int32 order
int32[] data
---
int32[] sequence
---
int32[] partial_sequence

I adapted the action client and server from https://docs.ros.org/en/jazzy/Tutorials/Intermediate/Writing-an-Action-Server-Client/Py.html.

Action client fibonacci_action_client.py:

#!/usr/bin/env python3

import rclpy
from rclpy.action import ActionClient
from rclpy.node import Node

from fibonacci_package.action import Fibonacci

class FibonacciActionClient(Node):

    def __init__(self):
        super().__init__('fibonacci_action_client')
        self._action_client = ActionClient(self, Fibonacci, 'fibonacci')
        self.data_buffer = [int(i) for i in range(1000000)]

    def send_goal(self, order):
        goal_msg = Fibonacci.Goal()
        goal_msg.order = order
        goal_msg.data = self.data_buffer
        self._action_client.wait_for_server()
        return self._action_client.send_goal_async(goal_msg)

def main(args=None):
    rclpy.init(args=args)
    action_client = FibonacciActionClient()
    for i in range(500):
        future = action_client.send_goal(10)
        rclpy.spin_until_future_complete(action_client, future)

if __name__ == '__main__':
    main()

Action server fibonacci_action_server.py:

#!/usr/bin/env python3

import time
import tracemalloc

import rclpy
from rclpy.action import ActionServer
from rclpy.node import Node

from fibonacci_package.action import Fibonacci

class FibonacciActionServer(Node):

    def __init__(self):
        super().__init__('fibonacci_action_server')
        self._action_server = ActionServer(
            self,
            Fibonacci,
            'fibonacci',
            self.execute_callback)

    def execute_callback(self, goal_handle):
        self.get_logger().info('Executing goal...')
        goal_handle.succeed()
        result = Fibonacci.Result()
        return result

def main(args=None):
    tracemalloc.start()

    rclpy.init(args=args)
    fibonacci_action_server = FibonacciActionServer()

    snapshot1 = tracemalloc.take_snapshot()
    t0 = time.time()
    while time.time()-t0 < 60.0:
        rclpy.spin_once(fibonacci_action_server)
    snapshot2 = tracemalloc.take_snapshot()
    top_stats = snapshot2.compare_to(snapshot1, 'lineno')

    print("[ Top 10 differences ]")
    for stat in top_stats[:10]:
        print(stat)

if __name__ == '__main__':
    main()

Start the client before the server, each in a separate terminal:

ros2 run fibonacci_package fibonacci_action_client.py
ros2 run fibonacci_package fibonacci_action_server.py

Expected behavior

Constant memory footprint of both server and client. (Below 100 MiB per process.)

Actual behavior

The client memory footprint is constant (below 78 MiB). However, the server memory footprint grows over time at the rate of ca. 1.3 GiB per minute.

The server prints:

[INFO] [1718631498.683267906] [fibonacci_action_server]: Executing goal...
[INFO] [1718631498.864721171] [fibonacci_action_server]: Executing goal...
(...)
[INFO] [1718631499.046434560] [fibonacci_action_server]: Executing goal...
[INFO] [1718631499.229864275] [fibonacci_action_server]: Executing goal...
[ Top 10 differences ]
/opt/ros/humble/local/lib/python3.10/dist-packages/rclpy/action/server.py:453: size=1325 MiB (+1325 MiB), count=1817 (+1817), average=747 KiB
/opt/ros/humble/local/lib/python3.10/dist-packages/rclpy/action/server.py:68: size=92.7 KiB (+92.7 KiB), count=1479 (+1479), average=64 B
/opt/ros/humble/local/lib/python3.10/dist-packages/rclpy/action/server.py:300: size=47.7 KiB (+47.7 KiB), count=982 (+982), average=50 B
/opt/ros/humble/local/lib/python3.10/dist-packages/rclpy/task.py:32: size=46.4 KiB (+46.4 KiB), count=657 (+657), average=72 B
/opt/ros/humble/local/lib/python3.10/dist-packages/unique_identifier_msgs/msg/_uuid.py:77: size=41.2 KiB (+41.2 KiB), count=977 (+977), average=43 B
/usr/lib/python3.10/posixpath.py:373: size=40.3 KiB (+40.3 KiB), count=349 (+349), average=118 B
/opt/ros/humble/local/lib/python3.10/dist-packages/rclpy/action/server.py:301: size=40.0 KiB (+40.0 KiB), count=655 (+655), average=63 B
/opt/ros/humble/local/lib/python3.10/dist-packages/rclpy/action/server.py:312: size=39.2 KiB (+39.2 KiB), count=645 (+645), average=62 B
/opt/ros/humble/local/lib/python3.10/dist-packages/rclpy/action/server.py:275: size=33.3 KiB (+33.3 KiB), count=581 (+581), average=59 B
/usr/lib/python3.10/linecache.py:137: size=33.2 KiB (+33.2 KiB), count=326 (+326), average=104 B

Additional information

The memory leak seems to happen in /opt/ros/humble/local/lib/python3.10/dist-packages/rclpy/action/server.py, note the allocated 1325 MiB reported by tracemalloc.

Line 453 of /opt/ros/humble/local/lib/python3.10/dist-packages/rclpy/action/server.py is:

taken_data = self._handle.take_goal_request(
    self._action_type.Impl.SendGoalService.Request,
)

which calls https://github.com/ros2/rclpy/blob/humble/rclpy/src/rclpy/action_server.cpp#L127.

Barry-Xu-2018 commented 1 week ago

Please note that the results will be retained on the action server for a while (the default is 900 seconds).

https://github.com/ros2/rclpy/blob/091f7eb64af8a5e52db4c8fbc6818499ec563a0a/rclpy/rclpy/action/server.py#L200

You can modify this parameter to determine if it's causing the issue.

fujitatomoya commented 1 week ago

i think this can be resolved with https://github.com/ros2/rclpy/pull/1171, default value is way too big...

holesond commented 1 week ago

Thank you @Barry-Xu-2018 and @fujitatomoya for the suggestions. Setting result_timeout=1 has fixed the memory leak for me. A ten second long result_timeout may still be too long for larger request or response payloads such as images.

fujitatomoya commented 1 week ago

@holesond you are welcome.

A ten second long result_timeout may still be too long for larger request or response payloads such as images.

https://github.com/ros2/rclpy/pull/1171 changes default timeout, you can configure the timeout as they like.