ros2 / rclpy

rclpy (ROS Client Library for Python)
Apache License 2.0
285 stars 224 forks source link

ROS2 Iron: Action Servers Become Unresponsive After Modifying Collision Checking #1337

Open Infraviored opened 3 weeks ago

Infraviored commented 3 weeks ago

I'm working on a ROS2 Iron project and have encountered a perplexing issue with my action servers. I've implemented collision checking modification without using MoveIt Task Constructor (which isn't available for Iron). The core functionality works, but I'm facing a critical problem:

The Issue: After executing a collision checking modification, ALL action servers in my main node become unresponsive. Here's what I've observed:

  1. The modify_collision_checking function executes successfully and completes.
  2. I receive a success message: [INFO] Completed modify_collision_checking: test_box, ['r1_gripper'], True, telling me the funciton executed all the way.
  3. After this, no action servers in the main node respond to messages.

What I've Tried:

The Code: Here's a simplified version of my CollisionObjectHandler class:

class CollisionObjectHandler:
   def __init__(self, node: Node):
       self.node = node

   def modify_collision_checking(
       self, object_id: str, target_names: list, allow: bool
   ):
       try:
           get_planning_scene = self.node.get_planning_scene
           while not get_planning_scene.wait_for_service(timeout_sec=1.0):
               self.node.logger.info(
                   "get_planning_scene service not available, waiting..."
               )

           req = GetPlanningScene.Request()
           req.components.components = PlanningSceneComponents.ALLOWED_COLLISION_MATRIX

           future = get_planning_scene.call_async(req)
           rclpy.spin_until_future_complete(self.node, future)
           response = future.result()

           if response is None:
               self.node.logger.error("Failed to get current planning scene")
               return

           current_scene = response.scene

           if current_scene.allowed_collision_matrix.entry_names:
               acm = current_scene.allowed_collision_matrix
           else:
               acm = AllowedCollisionMatrix()

           self.node.logger.debug(f"Current ACM entries: {acm.entry_names}")

           for name in [object_id] + target_names:
               if name not in acm.entry_names:
                   acm.entry_names.append(name)
                   new_entry = AllowedCollisionEntry()
                   new_entry.enabled = [False] * len(acm.entry_names)
                   acm.entry_values.append(new_entry)

           for entry in acm.entry_values:
               while len(entry.enabled) < len(acm.entry_names):
                   entry.enabled.append(False)

           obj_index = acm.entry_names.index(object_id)
           for target_name in target_names:
               target_index = acm.entry_names.index(target_name)
               acm.entry_values[obj_index].enabled[target_index] = allow
               acm.entry_values[target_index].enabled[obj_index] = allow
               self.node.logger.debug(
                   f"Modified collision between {object_id} and {target_name}: {allow}"
               )

           planning_scene_msg = PlanningScene()
           planning_scene_msg.is_diff = True
           planning_scene_msg.allowed_collision_matrix = acm

           self.node.scene_pub.publish(planning_scene_msg)

           action = "Allowed" if allow else "Disallowed"
           self.node.logger.debug(
               f"{action} collisions between {object_id} and {target_names}"
           )

           self.node.logger.debug("Updated ACM:")
           for i, entry_name in enumerate(acm.entry_names):
               self.node.logger.debug(f"{entry_name}: {acm.entry_values[i].enabled}")

       except Exception as e:
           self.node.logger.error(f"Error modifying collision checking: {str(e)}")

       self.node.logger.info(
           f"Completed modify_collision_checking: {object_id}, {target_names}, {allow}"
       )

   def change_object_collision_callback(self, goal_handle):
       self.node.logger.info(
           f"Received new change_object_collision request: {goal_handle.request}"
       )
       object_id = goal_handle.request.object_id
       allow_collision = goal_handle.request.allow
       gripper_name = "r1_gripper"

       if allow_collision:
           self.modify_collision_checking(object_id, [gripper_name], True)
       else:
           self.modify_collision_checking(object_id, [gripper_name], False)

       success = True
       message = f"{'Allowed' if allow_collision else 'Disallowed'} collisions between {object_id} and gripper"
       goal_handle.succeed()
       return ChangeObjectCollision.Result(success=success, message=message)

And I'm setting up the action server like this in my main node (which imports CollisionObjectHandler:

from rclpy.action import ActionServer
from rclpy.node import Node
from script1 import CollisionObjectHandler

class PlanningAndSceneManagerServer(Node):
    """
    A ROS2 node for managing the scene, including mesh acquisition and object placement.
    """

    def __init__(self):
        super().__init__("scene_manager_server")
        self.collision_object_handler = CollisionObjectHandler(self)
        self.scene_pub = self.create_publisher(
            PlanningScene, "/planning_scene", qos_profile
        )
        self.get_planning_scene = self.create_client(
            GetPlanningScene, "/get_planning_scene"
        )

        self._change_collision_server = ActionServer(
            self,
            ChangeObjectCollision,
            "change_object_collision",
            self
        self._change_attachment_server = ActionServer(
            self,...

The Question: What could be causing all action servers to become unresponsive after modify_collision_checking executes? Is there something in this function that might be interfering with the ROS2 communication infrastructure?

Again, the function itself does not get stuck. It terminates successfully and the ActionServer sends a Success message. Only afterwards, all action servers (also these fully independent of CollistionObjectHandler) of the main script no longer receive.

Any insights or suggestions for debugging this issue would be greatly appreciated. I'm open to alternative approaches or workarounds if necessary.

fujitatomoya commented 2 weeks ago

Thanks for posting issue.

To be sure, I would want to see the complete application but guessing like below.

So i am expecting that modify_collision_checking is called as ActionServer.execute_callback, and modify_collision_checking calls service request get_planning_scene.call_async(req) and waiting on the result via rclpy.spin_until_future_complete in the execute_callback in the executer.

This is not recommended design, that calling the blocking call in the callback of executor.

What happens here is after rclpy.spin_until_future_complete completed, self.node is removed from the executor. That said, modify_collision_checking function can be completed as you mentioned, but no more executable entity cannot be taken or executed in this executor anymore since the node has been removed from the executer.

This behavior is originally bug, reported https://github.com/ros2/rclpy/issues/1313 and fixed with https://github.com/ros2/rclpy/pull/1316. downstream distro backports are still under review, so your distribution iron does not have this fix yet.

besides above issue and bug fix, i would like you to try async & await design for modify_collision_checking. the followings are pseudo code, but i believe it should fix your blocking problem in the application.

   async def modify_collision_checking(
       self, object_id: str, target_names: list, allow: bool
   ):
       try:
           get_planning_scene = self.node.get_planning_scene
           while not get_planning_scene.wait_for_service(timeout_sec=1.0):
               self.node.logger.info(
                   "get_planning_scene service not available, waiting..."
               )

           req = GetPlanningScene.Request()
           req.components.components = PlanningSceneComponents.ALLOWED_COLLISION_MATRIX

           future = get_planning_scene.call_async(req)
           # REMOVE THIS: rclpy.spin_until_future_complete(self.node, future)
           result = await future

           response = result