Memory consumption grow when using Nav2 Simple Command API

chanhhoang99 commented 1 day ago

Bug report

Required Info:

Operating System:
- Ubuntu 24.04.1 LTS
ROS2 Version:
- Rolling
Version or commit hash:
- 771eca41bb7b6ad26856c434dd79e48a399c4601
DDS implementation:
- Fast-RTPS

Steps to reproduce issue

Using def goToPose(self, pose, behavior_tree='') to move the robot to a goal pose.
Repeat calling goToPose(self, pose, behavior_tree='') function many times to move around and let it run for a long time.

Expected behavior

Memory consumption should not increase overtime.

Actual behavior

Memory consumption increases overtime.

Additional information

I have tried using tracemalloc to find out top memory consumer. Below is the result after 142 goals:

[ Top 10 memory consumers ]
/opt/ros/rolling/lib/python3.12/site-packages/rclpy/executors.py:218: size=24.8 MiB, count=295565, average=88 B
/opt/ros/rolling/lib/python3.12/site-packages/rclpy/executors.py:837: size=22.5 MiB, count=147780, average=160 B
/opt/ros/rolling/lib/python3.12/site-packages/rclpy/task.py:153: size=15.7 MiB, count=294601, average=56 B
/opt/ros/rolling/lib/python3.12/site-packages/rclpy/task.py:50: size=12.4 MiB, count=295570, average=44 B
/opt/ros/rolling/lib/python3.12/site-packages/rclpy/task.py:231: size=12.4 MiB, count=295566, average=44 B
/opt/ros/rolling/lib/python3.12/site-packages/rclpy/executors.py:220: size=10.2 MiB, count=147782, average=73 B
/opt/ros/rolling/lib/python3.12/site-packages/rclpy/task.py:48: size=8082 KiB, count=147777, average=56 B
/opt/ros/rolling/lib/python3.12/site-packages/rclpy/executors.py:324: size=5773 KiB, count=147780, average=40 B
/opt/ros/rolling/lib/python3.12/site-packages/rclpy/executors.py:573: size=1155 KiB, count=2, average=577 KiB
/opt/ros/rolling/lib/python3.12/site-packages/rclpy/action/client.py:272: size=11.0 KiB, count=241, average=47 B

[ Detailed traceback for the top memory consumer ]
  File "/opt/ros/rolling/lib/python3.12/site-packages/rclpy/executors.py", line 218
    task = Task(callback, args, kwargs, executor=self)

Overall, the memory consumption grow from 100MiB to 700MiB after 142 goals pose executed.

chanhhoang99 commented 1 day ago

Seem like this issue points to the rclpy library. self._tasks in the /opt/ros/rolling/lib/python3.12/site-packages/rclpy/executors.py is getting bigger overtime.

SteveMacenski commented 1 day ago

Perhaps this should be filed with rclpy if its not from the application itself? What's your program look like?

chanhhoang99 commented 1 day ago

Hi @SteveMacenski , Below is my python script

#!/usr/bin/env python3

import rclpy
from rclpy.node import Node
from nav2_simple_commander.robot_navigator import BasicNavigator, TaskResult
from geometry_msgs.msg import PoseStamped
import time
import json

class GoToPoseNode(Node):
    def __init__(self):
        super().__init__('go_to_pose_node')

        # Initialize the BasicNavigator
        self.navigator = BasicNavigator()
        self.goalDataFile = '/home/hoc3hc/taste_ws/src/tasteRobot2/goal_database/goal_list.json'
        self.goalData = self.loadGoalData(self.goalDataFile)

        # Wait for Nav2 to start
        self.get_logger().info('Waiting for Nav2 to be fully active...')
        self.navigator.waitUntilNav2Active()

        # Define a pose goal (example: x = 2.0, y = 2.0, orientation = 0)
        counter_goal = 0
        while True:
            for item in self.goalData:
                goal = self.goalData[item]
                goal_pose = PoseStamped()
                goal_pose.header.frame_id = 'map'
                goal_pose.header.stamp = self.get_clock().now().to_msg()

                # Map goal data to PoseStamped fields
                goal_pose.pose.position.x = goal['x']
                goal_pose.pose.position.y = goal['y']
                goal_pose.pose.position.z = 0.0  # Assuming 2D navigation, z is set to 0.0

                goal_pose.pose.orientation.x = 0.0
                goal_pose.pose.orientation.y = 0.0
                goal_pose.pose.orientation.z = goal['z']
                goal_pose.pose.orientation.w = goal['w']

                # Send the goal to Nav2
                self.get_logger().info(f'Going to goal: ({goal_pose.pose.position.x}, {goal_pose.pose.position.y})')
                self.navigator.goToPose(goal_pose)

                # Monitor the navigation status
                self.monitor_navigation()
                counter_goal = counter_goal + 1
                self.get_logger().info(f'Execute goal : {counter_goal}')

    def monitor_navigation(self):
        while not self.navigator.isTaskComplete():
            feedback = self.navigator.getFeedback()

            if feedback:
                self.get_logger().info(f'Distance remaining: {feedback.distance_remaining:.2f} meters')

            time.sleep(1.0)  # Wait for a second before checking again

        # Check result of the navigation task
        result = self.navigator.getResult()

        if result == TaskResult.SUCCEEDED:
            self.get_logger().info('Reached the goal!')
        elif result == TaskResult.CANCELED:
            self.get_logger().info('The goal was canceled.')
        elif result == TaskResult.FAILED:
            self.get_logger().info('Failed to reach the goal.')

        # Shut down the node after reaching the goal
        # self.navigator.lifecycleShutdown()

    def loadGoalData(self, file_path):
        try:
            with open(file_path, 'r') as f:
                data = json.load(f)

            # Check if the data is an object or list and handle accordingly
            if isinstance(data, dict):
                # Handle case where the JSON is an object
                if 'goals' in data and isinstance(data['goals'], list):
                    goal_dict = {goal['goal_id']: goal for goal in data['goals']}
                else:
                    self.get_logger().warn(f"Invalid format in JSON file: {file_path}. Expected 'goals' key with a list value.")
                    goal_dict = {}
            elif isinstance(data, list):
                # Handle case where the JSON is a list
                goal_dict = {goal['goal_id']: goal for goal in data}
            else:
                self.get_logger().warn(f"Unexpected format in JSON file: {file_path}. Expected an object or list.")
                goal_dict = {}

            return goal_dict

        except FileNotFoundError:
            self.get_logger().warn(f"File not found: {file_path}")
            return {}
        except json.JSONDecodeError:
            self.get_logger().warn(f"Error decoding JSON file, file is empty or malformed: {file_path}")
            return {}

def main(args=None):
    rclpy.init(args=args)

    node = GoToPoseNode()

    rclpy.spin(node)

    node.destroy_node()
    rclpy.shutdown()

if __name__ == '__main__':
    main()

I am able to see the growth of memory by each call to goToPose function by tool like btop.

SteveMacenski commented 1 day ago

I think this is worth filing with rclpy, especially if you can show a minimal test case where calling repeated actions has this effect using some basic dummy server like finanocci used in the tutorials. This seems like its either caused by something we do (which I don't see) or the executor / action python APIs are continuing to store feedback / action goals even though they're not relevant anymore. I would think if its something in the Simple Commander, we'd see some simple commander file in that memory list. So, at worst case, it seems like it could be that some way that we're using the rclpy API is asking rclpy to store old data, which I don't see looking at the API docs. So, I think its a rclpy action bug perhaps.

You could show that to be the case with a simple action being called more directly without using the simple commander. That would be a the reproduction case that whomever on the rclpy side would ask for anyway to make this same point, so its dual purpose to (1) validate its not something in the Simple Commander we can fix and (2) give the rclpy maintainers what they need to investigate.

chanhhoang99 commented 1 day ago

Thanks for your information @SteveMacenski I will try with a minimal example like you mentioned and then raise an issue for rclpy maintainers. I also find a workaround for this by clearing the self._tasks list in the executor.

def spin_until_future_complete(
    node: 'Node',
    future: Future,
    executor: Optional['Executor'] = None,
    timeout_sec: Optional[float] = None
) -> None:
    """
    Execute work until the future is complete.

    Callbacks and other work will be executed by the provided executor until ``future.done()``
    returns ``True`` or the context associated with the executor is shutdown.

    :param node: A node to add to the executor to check for work.
    :param future: The future object to wait on.
    :param executor: The executor to use, or the global executor if ``None``.
    :param timeout_sec: Seconds to wait. Block until the future is complete
        if ``None`` or negative. Don't wait if 0.
    """
    executor = get_global_executor() if executor is None else executor
    try:
        executor.add_node(node)
        executor.spin_until_future_complete(future, timeout_sec)
    finally:
        executor.remove_node(node)
        # Workaround
        if executor._tasks:
            executor._tasks.clear()

Yet, i am not fully sure and aware if there is any impact of doing so.

SteveMacenski commented 1 day ago

I'm sure there's something, but that's what a thraad with the maintainers would be good for. When you open that ticket, tag me so I can track it as well

chanhhoang99 commented 14 hours ago

Hi @SteveMacenski , I think I am able to replicate minimal example of Fibonacci action server and client. The behavior is same with Nav2 simple command API goToPose function. Below is the action server:

import time

import rclpy
from rclpy.action import ActionServer
from rclpy.executors import ExternalShutdownException
from rclpy.node import Node

from custom_action_interfaces.action import Fibonacci

class FibonacciActionServer(Node):

    def __init__(self):
        super().__init__('fibonacci_action_server')
        self._action_server = ActionServer(
            self,
            Fibonacci,
            'fibonacci',
            self.execute_callback)

    def execute_callback(self, goal_handle):
        self.get_logger().info('Executing goal...')

        feedback_msg = Fibonacci.Feedback()
        feedback_msg.partial_sequence = [0, 1]

        for i in range(1, goal_handle.request.order):
            feedback_msg.partial_sequence.append(
                feedback_msg.partial_sequence[i] + feedback_msg.partial_sequence[i-1])
            self.get_logger().info('Feedback: {0}'.format(feedback_msg.partial_sequence))
            goal_handle.publish_feedback(feedback_msg)
            time.sleep(0.1)

        goal_handle.succeed()

        result = Fibonacci.Result()
        result.sequence = feedback_msg.partial_sequence
        return result

def main(args=None):
    try:
        with rclpy.init(args=args):
            fibonacci_action_server = FibonacciActionServer()

            rclpy.spin(fibonacci_action_server)
    except (KeyboardInterrupt, ExternalShutdownException):
        pass

if __name__ == '__main__':
    main()

Below is the action client:

import rclpy
from rclpy.action import ActionClient
from rclpy.executors import ExternalShutdownException
from rclpy.node import Node

from custom_action_interfaces.action import Fibonacci
import time

class FibonacciActionClient(Node):

    def __init__(self):
        super().__init__('fibonacci_action_client')
        self._action_client = ActionClient(self, Fibonacci, 'fibonacci')

    def feedback(self, msg):
        print(msg)

    def send_goal(self, order):
        goal_msg = Fibonacci.Goal()
        goal_msg.order = order

        self._action_client.wait_for_server()

        return self._action_client.send_goal_async(goal_msg,self.feedback)

def main(args=None):
    try:
        with rclpy.init(args=args):
            action_client = FibonacciActionClient()
            while rclpy.ok:

                future = action_client.send_goal(1)
                # suspicious function that make the client node memory grow
                rclpy.spin_until_future_complete(action_client, future)
                time.sleep(0.05)
    except (KeyboardInterrupt, ExternalShutdownException):
        pass

if __name__ == '__main__':
    main()

Picture that I caputre from btop monitoring: Screenshot from 2024-10-09 21-14-33 Screenshot from 2024-10-09 16-10-25 Screenshot from 2024-10-09 14-29-02 By the time I am writting this comment, it is now 203MiB and continue to grow. What is your though on this ? Is it clear enough to report this to rclpy maintainers with this minimal example ?

SteveMacenski commented 13 hours ago

Yes, I believe so! Please file it and link to this ticket / tag me to keep the issue trail!

ros-navigation / navigation2