ros2 / geometry2

A set of ROS packages for keeping track of coordinate transforms.
BSD 3-Clause "New" or "Revised" License
116 stars 193 forks source link

rclpy BufferClient lookup_transform not working, timeout #438

Open charlielito opened 3 years ago

charlielito commented 3 years ago

Bug report

Required Info:

Steps to reproduce issue

Calling BufferClient.lookup_transform always timeouts although if I call the buffer server via ros2 cli it works.

Create a node like the following:

#!/usr/bin/env python3
import rclpy
from rclpy.node import Node
from tf2_ros.buffer_client import BufferClient
from rclpy.duration import Duration
from std_msgs.msg import Empty

class TestNode(Node):
    def __init__(
        self,
    ):
        super().__init__("test_node")
        self.cb_group = None
        self.buffer = BufferClient(
            self,
            "/tf2_buffer_server",
            timeout_padding=1.0,
            callback_group=self.cb_group,
        )
        self.sub = self.create_subscription(
            msg_type=Empty,
            topic="/test_buffer_client",
            callback=self.buffer_cb,
            qos_profile=1,
        )

    def buffer_cb(self, msg):
        _ = self.buffer.lookup_transform(
            "map",
            "odom",
            rclpy.time.Time(seconds=0, nanoseconds=0),
            timeout=Duration(seconds=1.0),
        )
        self.get_logger().info("Success")

def main(args=None):
    rclpy.init(args=args)
    node = TestNode()
    executor = None
    rclpy.spin(node, executor)

if __name__ == "__main__":
    main()

Make sure the buffer server is already running and there is a valid transform between map->odom. Then publish to the topic that calls BufferClient.lookup_transform.

ros2 topic pub -1 /test_buffer_client std_msgs/msg/Empty

The traceback is:

Traceback (most recent call last):
  File "/workspace/rover/ros2/install/navigation/lib/navigation/navigation_test", line 33, in <module>
    sys.exit(load_entry_point('navigation', 'console_scripts', 'navigation_test')())
  File "/workspace/rover/ros2/build/navigation/navigation/node_test.py", line 53, in main
    rclpy.spin(node, executor)
  File "/opt/ros/foxy/lib/python3.8/site-packages/rclpy/__init__.py", line 191, in spin
    executor.spin_once()
  File "/opt/ros/foxy/lib/python3.8/site-packages/rclpy/executors.py", line 711, in spin_once
    raise handler.exception()
  File "/opt/ros/foxy/lib/python3.8/site-packages/rclpy/task.py", line 239, in __call__
    self._handler.send(None)
  File "/opt/ros/foxy/lib/python3.8/site-packages/rclpy/executors.py", line 426, in handler
    await call_coroutine(entity, arg)
  File "/opt/ros/foxy/lib/python3.8/site-packages/rclpy/executors.py", line 351, in _execute_subscription
    await await_or_execute(sub.callback, msg)
  File "/opt/ros/foxy/lib/python3.8/site-packages/rclpy/executors.py", line 118, in await_or_execute
    return callback(*args)
  File "/workspace/rover/ros2/build/navigation/navigation/node_test.py", line 33, in buffer_cb
    _ = self.buffer.lookup_transform(
  File "/opt/ros/foxy/lib/python3.8/site-packages/tf2_ros/buffer_client.py", line 87, in lookup_transform
    return self.__process_goal(goal)
  File "/opt/ros/foxy/lib/python3.8/site-packages/tf2_ros/buffer_client.py", line 201, in __process_goal
    raise tf2.TimeoutException("The LookupTransform goal sent to the BufferServer did not come back in the specified time. Something is likely wrong with the server")
tf2.TimeoutException: The LookupTransform goal sent to the BufferServer did not come back in the specified time. Something is likely wrong with the server

Expected behavior

It shouldn't crash since it works with the ros2 cli. If you use the following it works:

ros2 action send_goal /tf2_buffer_server tf2_msgs/action/LookupTransform '{target_frame: map, source_frame: odom}'

Actual behavior

It timeouts every time.

Additional notes

I modified https://github.com/ros2/geometry2/blob/ros2/tf2_ros_py/tf2_ros/buffer_client.py#L61 to accept also a callback group that is passed to the ActionClient. Then, in the TestNode example above I set the self.cb_group=ReentrantCallbackGroup() and the executor to be a MultiThreadedExecutor. Only that way it works.

I am missing something?

charlielito commented 3 years ago

I am also having the same issue in rclpp, when using this inside a callback the BufferClient.lookupTransform timeouts. @clalancette I see there are some tests for this but not for testing it inside a callback function.

karl-schulz commented 2 years ago

I have the same problem - I think this feature is actually not ready to be used?

clalancette commented 2 years ago

I am also having the same issue in rclpp, when using this inside a callback the BufferClient.lookupTransform timeouts. @clalancette I see there are some tests for this but not for testing it inside a callback function.

For better or worse, that is by design. The problem with doing this is that you are blocking the executor, but the executor needs to run in order to get the transform.

The solution here is to make sure to never run blocking functions like lookupTransform from within a callback, or you use the MultiThreadedExecutor, or you instantiate the TransformListener with spin_thread = True.

charlielito commented 2 years ago

I am also having the same issue in rclpp, when using this inside a callback the BufferClient.lookupTransform timeouts. @clalancette I see there are some tests for this but not for testing it inside a callback function.

For better or worse, that is by design. The problem with doing this is that you are blocking the executor, but the executor needs to run in order to get the transform.

The solution here is to make sure to never run blocking functions like lookupTransform from within a callback, or you use the MultiThreadedExecutor, or you instantiate the TransformListener with spin_thread = True.

BufferClient doesn't have any TransformListener. Also using MultiThreadedExecutor does not work, you have to pass a ReentrantCallbackGroup to the client inside the BufferClient class -which currently is not exposed- as already mentioned in the issue.

Flova commented 2 years ago

This could be fixed by spinning a separate callback group in a separate thread. Independent execution of callback groups is currently only supported in rclcpp, but this pr aimed at implementing them https://github.com/ros2/rclpy/pull/909. TF2 also features a not so clean fix. You can pass the TransformListener a newly created node object. This new node is used to gather the transforms in parallel if you also set spin_thread to True.

charlielito commented 2 years ago

I think there is a misunderstanding here. The issue described in this thread refers to the BufferClient class which has nothing to do with a TransformListener. The BufferClient is just wrapping a rclpy.action.client.ActionClient to call a tf2_ros buffer_server using that client.

Flova commented 2 years ago

Oh nvm. I missread this.

tigelbri commented 1 year ago

I do have the same problem right now. Do I really need to change the BufferClient Interface or is there an other way of handling this issue?

bobbleballs commented 1 year ago

Hi all, thanks very much for your input. I ran into this issue and I've found a solution that doesn't require changing the BufferClient interface, though I do think this should be changed. @clalancette is there any progress on this? A quick and dirty way is to change the Node's default callback group to a ReentrantCallbackGroup() then using a multithreaded excecutor to run the node. This way when you create the BufferClient it will use a ReentrantCallbackGroup by default. Some example code below:

class ExampleActionServer(Node):
      def __init__(self):
          super().__init__('example_action_server')
          # Set the nodes default callback group to ReentrantCallbackGroup()
          self._default_callback_group = ReentrantCallbackGroup()
          self.tf_buffer = BufferClient(self, "/tf_action")
          self.tf_buffer.action_client.wait_for_server(5.0)

          self._action_server = ActionServer(
              self,
              ExampleAction,
              'example_action',
              self.execute_callback)

def main():
    rclpy.init()
    excecutor = rclpy.executors.MultiThreadedExecutor()

    example_action_server = ExampleActionServer()

    excecutor.add_node(example_action_server)
    excecutor.spin()

I've found that the buffer client does work from a callback this way.

tonynajjar commented 1 year ago

This blog is quite relevant for this issue to better understand it; https://karelics.fi/deadlocks-in-rclpy/

I would propose to the maintainers to put the service client of the BufferClient in a separate callback group, like is done for the subscribers of the transform listener

reinzor commented 5 months ago

For people running into this problem:

Setting spin_thread to true did not fix my problem of doing transform lookups within a callback. Adding a rclpy.executors.MultiThreadedExecutor did.

Flova commented 4 months ago

Also, for people that have timeout/blocking issues in ROS2 in general, the multithreaded executor helps. But keep in mind that the default callback group is mutually exclusive and you need to select a reentrant callback group (or multiple mutually exclusive ones) for the involved callbacks to use multiple threads in the multithreaded executor.