ros2 / rclpy

rclpy (ROS Client Library for Python)
Apache License 2.0
303 stars 225 forks source link

[Memory leak]: rclpy.spin_until_future_complete(action_client, future) causing program to grow in memory #1371

Closed chanhhoang99 closed 1 day ago

chanhhoang99 commented 1 week ago

Bug report

Required Info:

Steps to reproduce issue

Using below Fibonacci action server and client. Action server:

import time

import rclpy
from rclpy.action import ActionServer
from rclpy.executors import ExternalShutdownException
from rclpy.node import Node

from custom_action_interfaces.action import Fibonacci

class FibonacciActionServer(Node):

    def __init__(self):
        super().__init__('fibonacci_action_server')
        self._action_server = ActionServer(
            self,
            Fibonacci,
            'fibonacci',
            self.execute_callback)

    def execute_callback(self, goal_handle):
        self.get_logger().info('Executing goal...')

        feedback_msg = Fibonacci.Feedback()
        feedback_msg.partial_sequence = [0, 1]

        for i in range(1, goal_handle.request.order):
            feedback_msg.partial_sequence.append(
                feedback_msg.partial_sequence[i] + feedback_msg.partial_sequence[i-1])
            self.get_logger().info('Feedback: {0}'.format(feedback_msg.partial_sequence))
            goal_handle.publish_feedback(feedback_msg)
            time.sleep(0.1)

        goal_handle.succeed()

        result = Fibonacci.Result()
        result.sequence = feedback_msg.partial_sequence
        return result

def main(args=None):
    try:
        with rclpy.init(args=args):
            fibonacci_action_server = FibonacciActionServer()

            rclpy.spin(fibonacci_action_server)
    except (KeyboardInterrupt, ExternalShutdownException):
        pass

if __name__ == '__main__':
    main()

Action client:

import rclpy
from rclpy.action import ActionClient
from rclpy.executors import ExternalShutdownException
from rclpy.node import Node

from custom_action_interfaces.action import Fibonacci
import time

class FibonacciActionClient(Node):

    def __init__(self):
        super().__init__('fibonacci_action_client')
        self._action_client = ActionClient(self, Fibonacci, 'fibonacci')

    def feedback(self, msg):
        print(msg)

    def send_goal(self, order):
        goal_msg = Fibonacci.Goal()
        goal_msg.order = order

        self._action_client.wait_for_server()

        return self._action_client.send_goal_async(goal_msg,self.feedback)

def main(args=None):
    try:
        with rclpy.init(args=args):
            action_client = FibonacciActionClient()
            while rclpy.ok:

                future = action_client.send_goal(1)
                # suspicious function that make the client node memory grow
                rclpy.spin_until_future_complete(action_client, future)
                time.sleep(0.05)
    except (KeyboardInterrupt, ExternalShutdownException):
        pass

if __name__ == '__main__':
    main()

Expected behavior

Memory consumption from the client should not increase overtime

Actual behavior

Memory consumption from the client increases overtime

Additional information

Below is btop screen shot from the client. Screenshot from 2024-10-09 21-14-33 Screenshot from 2024-10-09 16-10-25 Screenshot from 2024-10-09 14-29-02

Regards the function rclpy.spin_until_future_complete(action_client, future), below is the tracemalloc output:

[ Top 10 memory consumers ]
/opt/ros/rolling/lib/python3.12/site-packages/rclpy/executors.py:218: size=24.8 MiB, count=295565, average=88 B
/opt/ros/rolling/lib/python3.12/site-packages/rclpy/executors.py:837: size=22.5 MiB, count=147780, average=160 B
/opt/ros/rolling/lib/python3.12/site-packages/rclpy/task.py:153: size=15.7 MiB, count=294601, average=56 B
/opt/ros/rolling/lib/python3.12/site-packages/rclpy/task.py:50: size=12.4 MiB, count=295570, average=44 B
/opt/ros/rolling/lib/python3.12/site-packages/rclpy/task.py:231: size=12.4 MiB, count=295566, average=44 B
/opt/ros/rolling/lib/python3.12/site-packages/rclpy/executors.py:220: size=10.2 MiB, count=147782, average=73 B
/opt/ros/rolling/lib/python3.12/site-packages/rclpy/task.py:48: size=8082 KiB, count=147777, average=56 B
/opt/ros/rolling/lib/python3.12/site-packages/rclpy/executors.py:324: size=5773 KiB, count=147780, average=40 B
/opt/ros/rolling/lib/python3.12/site-packages/rclpy/executors.py:573: size=1155 KiB, count=2, average=577 KiB
/opt/ros/rolling/lib/python3.12/site-packages/rclpy/action/client.py:272: size=11.0 KiB, count=241, average=47 B

[ Detailed traceback for the top memory consumer ]
  File "/opt/ros/rolling/lib/python3.12/site-packages/rclpy/executors.py", line 218
    task = Task(callback, args, kwargs, executor=self)

I think the executor._tasks increases overtime. Is it expected to be growing in the example action client and server above ? FYI SteveMacenski

Barry-Xu-2018 commented 1 week ago

@chanhhoang99

I try to reproduce this issue.
One question, I have no custom_action_interfaces.action.
Are the definitions for custom_action_interfaces.action and example_interfaces.action the same?

chanhhoang99 commented 1 week ago

Hi @Barry-Xu-2018 , Could you check this https://docs.ros.org/en/foxy/Tutorials/Intermediate/Creating-an-Action.html and follow the steps. It's the same with this tutorial

fujitatomoya commented 1 week ago

@Barry-Xu-2018 i am not sure if this is the bug in rclpy yet, but i will go ahead to assign this to take a look.

Barry-Xu-2018 commented 1 week ago

Could you check this https://docs.ros.org/en/foxy/Tutorials/Intermediate/Creating-an-Action.html and follow the steps. It's the same with this tutorial

Thanks. Action message is almost the same.

My test code (use example_interfaces.action) is located at https://github.com/Barry-Xu-2018/bug_reproduce/commit/b52e6baef862e3871164faa7d6b56bfca261ca9f.

I ran it for over 5 hours (My environment is based on the latest code of rolling), and the memory usage changed as follows: Screenshot from 2024-10-15 09-58-24 After more than 5 hours Screenshot from 2024-10-15 15-20-54

For client, resident memory is changed from 92M to 194M.
For server, Resident memory hasn't changed much (86M to 87M).

As reported, the memory usage on the client side is increasing significantly. So there's definitely an issue here with the client. I'll investigate further. @fujitatomoya

Barry-Xu-2018 commented 1 week ago

After investigating, I found the issue is related to the repeated calls to spin_until_future_complete.

When spin_until_future_complete returns, there may be other events that remain unprocessed.
The next time spin_until_future_complete is executed, there will be more unprocessed events again. This way, the pending events keep accumulating.

I find self._tasks keeps getting bigger.
https://github.com/ros2/rclpy/blob/a09a0312cf9e2bbe0f6dc4467c72060a2fb74ce5/rclpy/rclpy/executors.py#L184C9-L184C20

A not-so-good way to verify (It can't ensure that all events are processed.).
For client code, you can do

...
                rclpy.spin_until_future_complete(action_client, future)
                rclpy.spin_once(action_client, timeout_sec=0.01)
                rclpy.spin_once(action_client, timeout_sec=0.01)
                rclpy.spin_once(action_client, timeout_sec=0.01)
                rclpy.spin_once(action_client, timeout_sec=0.01)
                time.sleep(0.05)
...

In my explanation above, the word "event" should be more accurately referred to as "task".

chanhhoang99 commented 1 week ago

Do you mean that the API is working as expect but the memory growth is due to user calling the API repeatedly ?

Barry-Xu-2018 commented 1 week ago

Do you mean that the API is working as expect but the memory growth is due to user calling the API repeatedly ?

Yes. This function only waits for the specified event to happen (it returns once it occurs). Other waiting events are not processed. The next time spin_until_future_complete is called, the unprocessed events get added to self._tasks, causing self._tasks to grow larger. Therefore, the memory usage we observe increases. But this isn't the memory leak. Following the demo example is a better way to handle it: https://github.com/ros2/demos/blob/ea3661152a87bc48e3f277ca8131c85a1a23d661/action_tutorials/action_tutorials_py/action_tutorials_py/fibonacci_action_client.py

chanhhoang99 commented 1 week ago

Thanks alot. Let me try your suggestion.

fujitatomoya commented 1 week ago

@Barry-Xu-2018 thanks for checking this.

sorry i did not allocate the time to check the details but IMO this needs to be fixed in rclpy, because i think that the following application code does nothing wrong? after all, my take here is API is not implemented as designed, unless user application needs to live with memory leak???

            while rclpy.ok:
                future = action_client.send_goal(xxx)
                rclpy.spin_until_future_complete(action_client, future)
                // do something with result...
Barry-Xu-2018 commented 1 week ago

@fujitatomoya

Agree with your thought.
After further checking, I found a bug which causes the number of unfinished tasks is increasing.
I will create a PR to fix it and explain the cause in that PR.

Barry-Xu-2018 commented 1 week ago

@chanhhoang99 I discovered a bug that increases memory usage. You can test this fix (#1374) with your code. In my environment, the client memory still increases a bit, but it's much less than before.