Open chanhhoang99 opened 1 day ago
Seem like this issue points to the rclpy library. self._tasks in the /opt/ros/rolling/lib/python3.12/site-packages/rclpy/executors.py is getting bigger overtime.
Perhaps this should be filed with rclpy if its not from the application itself? What's your program look like?
Hi @SteveMacenski , Below is my python script
#!/usr/bin/env python3
import rclpy
from rclpy.node import Node
from nav2_simple_commander.robot_navigator import BasicNavigator, TaskResult
from geometry_msgs.msg import PoseStamped
import time
import json
class GoToPoseNode(Node):
def __init__(self):
super().__init__('go_to_pose_node')
# Initialize the BasicNavigator
self.navigator = BasicNavigator()
self.goalDataFile = '/home/hoc3hc/taste_ws/src/tasteRobot2/goal_database/goal_list.json'
self.goalData = self.loadGoalData(self.goalDataFile)
# Wait for Nav2 to start
self.get_logger().info('Waiting for Nav2 to be fully active...')
self.navigator.waitUntilNav2Active()
# Define a pose goal (example: x = 2.0, y = 2.0, orientation = 0)
counter_goal = 0
while True:
for item in self.goalData:
goal = self.goalData[item]
goal_pose = PoseStamped()
goal_pose.header.frame_id = 'map'
goal_pose.header.stamp = self.get_clock().now().to_msg()
# Map goal data to PoseStamped fields
goal_pose.pose.position.x = goal['x']
goal_pose.pose.position.y = goal['y']
goal_pose.pose.position.z = 0.0 # Assuming 2D navigation, z is set to 0.0
goal_pose.pose.orientation.x = 0.0
goal_pose.pose.orientation.y = 0.0
goal_pose.pose.orientation.z = goal['z']
goal_pose.pose.orientation.w = goal['w']
# Send the goal to Nav2
self.get_logger().info(f'Going to goal: ({goal_pose.pose.position.x}, {goal_pose.pose.position.y})')
self.navigator.goToPose(goal_pose)
# Monitor the navigation status
self.monitor_navigation()
counter_goal = counter_goal + 1
self.get_logger().info(f'Execute goal : {counter_goal}')
def monitor_navigation(self):
while not self.navigator.isTaskComplete():
feedback = self.navigator.getFeedback()
if feedback:
self.get_logger().info(f'Distance remaining: {feedback.distance_remaining:.2f} meters')
time.sleep(1.0) # Wait for a second before checking again
# Check result of the navigation task
result = self.navigator.getResult()
if result == TaskResult.SUCCEEDED:
self.get_logger().info('Reached the goal!')
elif result == TaskResult.CANCELED:
self.get_logger().info('The goal was canceled.')
elif result == TaskResult.FAILED:
self.get_logger().info('Failed to reach the goal.')
# Shut down the node after reaching the goal
# self.navigator.lifecycleShutdown()
def loadGoalData(self, file_path):
try:
with open(file_path, 'r') as f:
data = json.load(f)
# Check if the data is an object or list and handle accordingly
if isinstance(data, dict):
# Handle case where the JSON is an object
if 'goals' in data and isinstance(data['goals'], list):
goal_dict = {goal['goal_id']: goal for goal in data['goals']}
else:
self.get_logger().warn(f"Invalid format in JSON file: {file_path}. Expected 'goals' key with a list value.")
goal_dict = {}
elif isinstance(data, list):
# Handle case where the JSON is a list
goal_dict = {goal['goal_id']: goal for goal in data}
else:
self.get_logger().warn(f"Unexpected format in JSON file: {file_path}. Expected an object or list.")
goal_dict = {}
return goal_dict
except FileNotFoundError:
self.get_logger().warn(f"File not found: {file_path}")
return {}
except json.JSONDecodeError:
self.get_logger().warn(f"Error decoding JSON file, file is empty or malformed: {file_path}")
return {}
def main(args=None):
rclpy.init(args=args)
node = GoToPoseNode()
rclpy.spin(node)
node.destroy_node()
rclpy.shutdown()
if __name__ == '__main__':
main()
I am able to see the growth of memory by each call to goToPose function by tool like btop.
I think this is worth filing with rclpy, especially if you can show a minimal test case where calling repeated actions has this effect using some basic dummy server like finanocci used in the tutorials. This seems like its either caused by something we do (which I don't see) or the executor / action python APIs are continuing to store feedback / action goals even though they're not relevant anymore. I would think if its something in the Simple Commander, we'd see some simple commander file in that memory list. So, at worst case, it seems like it could be that some way that we're using the rclpy API is asking rclpy to store old data, which I don't see looking at the API docs. So, I think its a rclpy action bug perhaps.
You could show that to be the case with a simple action being called more directly without using the simple commander. That would be a the reproduction case that whomever on the rclpy side would ask for anyway to make this same point, so its dual purpose to (1) validate its not something in the Simple Commander we can fix and (2) give the rclpy maintainers what they need to investigate.
Thanks for your information @SteveMacenski I will try with a minimal example like you mentioned and then raise an issue for rclpy maintainers. I also find a workaround for this by clearing the self._tasks list in the executor.
def spin_until_future_complete(
node: 'Node',
future: Future,
executor: Optional['Executor'] = None,
timeout_sec: Optional[float] = None
) -> None:
"""
Execute work until the future is complete.
Callbacks and other work will be executed by the provided executor until ``future.done()``
returns ``True`` or the context associated with the executor is shutdown.
:param node: A node to add to the executor to check for work.
:param future: The future object to wait on.
:param executor: The executor to use, or the global executor if ``None``.
:param timeout_sec: Seconds to wait. Block until the future is complete
if ``None`` or negative. Don't wait if 0.
"""
executor = get_global_executor() if executor is None else executor
try:
executor.add_node(node)
executor.spin_until_future_complete(future, timeout_sec)
finally:
executor.remove_node(node)
# Workaround
if executor._tasks:
executor._tasks.clear()
Yet, i am not fully sure and aware if there is any impact of doing so.
I'm sure there's something, but that's what a thraad with the maintainers would be good for. When you open that ticket, tag me so I can track it as well
Hi @SteveMacenski , I think I am able to replicate minimal example of Fibonacci action server and client. The behavior is same with Nav2 simple command API goToPose function. Below is the action server:
import time
import rclpy
from rclpy.action import ActionServer
from rclpy.executors import ExternalShutdownException
from rclpy.node import Node
from custom_action_interfaces.action import Fibonacci
class FibonacciActionServer(Node):
def __init__(self):
super().__init__('fibonacci_action_server')
self._action_server = ActionServer(
self,
Fibonacci,
'fibonacci',
self.execute_callback)
def execute_callback(self, goal_handle):
self.get_logger().info('Executing goal...')
feedback_msg = Fibonacci.Feedback()
feedback_msg.partial_sequence = [0, 1]
for i in range(1, goal_handle.request.order):
feedback_msg.partial_sequence.append(
feedback_msg.partial_sequence[i] + feedback_msg.partial_sequence[i-1])
self.get_logger().info('Feedback: {0}'.format(feedback_msg.partial_sequence))
goal_handle.publish_feedback(feedback_msg)
time.sleep(0.1)
goal_handle.succeed()
result = Fibonacci.Result()
result.sequence = feedback_msg.partial_sequence
return result
def main(args=None):
try:
with rclpy.init(args=args):
fibonacci_action_server = FibonacciActionServer()
rclpy.spin(fibonacci_action_server)
except (KeyboardInterrupt, ExternalShutdownException):
pass
if __name__ == '__main__':
main()
Below is the action client:
import rclpy
from rclpy.action import ActionClient
from rclpy.executors import ExternalShutdownException
from rclpy.node import Node
from custom_action_interfaces.action import Fibonacci
import time
class FibonacciActionClient(Node):
def __init__(self):
super().__init__('fibonacci_action_client')
self._action_client = ActionClient(self, Fibonacci, 'fibonacci')
def feedback(self, msg):
print(msg)
def send_goal(self, order):
goal_msg = Fibonacci.Goal()
goal_msg.order = order
self._action_client.wait_for_server()
return self._action_client.send_goal_async(goal_msg,self.feedback)
def main(args=None):
try:
with rclpy.init(args=args):
action_client = FibonacciActionClient()
while rclpy.ok:
future = action_client.send_goal(1)
# suspicious function that make the client node memory grow
rclpy.spin_until_future_complete(action_client, future)
time.sleep(0.05)
except (KeyboardInterrupt, ExternalShutdownException):
pass
if __name__ == '__main__':
main()
Picture that I caputre from btop monitoring: By the time I am writting this comment, it is now 203MiB and continue to grow. What is your though on this ? Is it clear enough to report this to rclpy maintainers with this minimal example ?
Yes, I believe so! Please file it and link to this ticket / tag me to keep the issue trail!
Bug report
Required Info:
Steps to reproduce issue
Expected behavior
Memory consumption should not increase overtime.
Actual behavior
Memory consumption increases overtime.
Additional information
I have tried using tracemalloc to find out top memory consumer. Below is the result after 142 goals:
Overall, the memory consumption grow from 100MiB to 700MiB after 142 goals pose executed.