ros2 / rclpy

rclpy (ROS Client Library for Python)
Apache License 2.0

Thread-Safety issue in rclpy executor leading to InvalidHandle exception #1355

Open Hytac opened 2 months ago

Hytac commented 2 months ago

Bug report

rclpy's executor accesses each node's entity lists in a non-thread-safe manner, which can surface as an InvalidHandle exception while spinning. I've prepared a reproducible example here.

Expected Error

At some point, NodeClient.py will raise an exception similar to the following:

Traceback (most recent call last):
  File "/home/myLocalSuperUser/ros/Issue/NodeClient.py", line 75, in <module>
    loop.run_until_complete(main())
  File "/home/myLocalSuperUser/ros/Issue/NodeClient.py", line 67, in main
    executor.spin()
  File "/opt/ros/humble/local/lib/python3.10/dist-packages/rclpy/executors.py", line 294, in spin
    self.spin_once()
  File "/opt/ros/humble/local/lib/python3.10/dist-packages/rclpy/executors.py", line 795, in spin_once
    self._spin_once_impl(timeout_sec)
  File "/opt/ros/humble/local/lib/python3.10/dist-packages/rclpy/executors.py", line 792, in _spin_once_impl
    future.result()
  File "/opt/ros/humble/local/lib/python3.10/dist-packages/rclpy/task.py", line 94, in result
    raise self.exception()
  File "/opt/ros/humble/local/lib/python3.10/dist-packages/rclpy/task.py", line 239, in __call__
    self._handler.send(None)
  File "/opt/ros/humble/local/lib/python3.10/dist-packages/rclpy/executors.py", line 430, in handler
    arg = take_from_wait_list(entity)
  File "/opt/ros/humble/local/lib/python3.10/dist-packages/rclpy/executors.py", line 365, in _take_client
    with client.handle:
rclpy._rclpy_pybind11.InvalidHandle: cannot use Destroyable because destruction was requested

Root Cause

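The issue arises because rclpy's Executor accesses node lists in a non-thread-safe manner. If another thread destroys an entity (here, a client) after the executor has collected it but before the executor takes from it, `with client.handle:` raises InvalidHandle, as in the traceback above. The problem can be seen in the following code snippets from rclpy/executors.py: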

Example 1:

for node in nodes_to_use:
  subscriptions.extend(filter(self.can_execute, node.subscriptions))
  timers.extend(filter(self.can_execute, node.timers))
  clients.extend(filter(self.can_execute, node.clients))
  services.extend(filter(self.can_execute, node.services))
  node_guards = filter(self.can_execute, node.guards)
  waitables.extend(filter(self.can_execute, node.waitables))

Example 2:

for client in node.clients:
  if client.handle.pointer in clients_ready:
    if client.callback_group.can_execute(client):
      handler = self._make_handler(client, node, self._take_client, self._execute_client)
      yielded_work = True
      yield handler, client, node
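
To make the window between "collect" and "take" concrete, here is a minimal sketch that reproduces the same failure mode without ROS. Everything in it is a hypothetical stand-in: `FakeHandle`, `FakeClient`, the local `InvalidHandle` class, `take`, `take_or_skip` and `run_scenario` are not rclpy APIs, and skipping a destroyed entity is only one possible mitigation, not necessarily how rclpy does or should handle it.

```python
import threading


class InvalidHandle(Exception):
    """Hypothetical stand-in for rclpy._rclpy_pybind11.InvalidHandle."""


class FakeHandle:
    """Hypothetical stand-in for a Destroyable handle used as a context manager."""

    def __init__(self):
        self._lock = threading.Lock()
        self._destroy_requested = False

    def destroy(self):
        with self._lock:
            self._destroy_requested = True

    def __enter__(self):
        with self._lock:
            if self._destroy_requested:
                raise InvalidHandle(
                    'cannot use Destroyable because destruction was requested')
        return self

    def __exit__(self, exc_type, exc, tb):
        return False


class FakeClient:
    """Hypothetical stand-in for a client entity owned by a node."""

    def __init__(self, name):
        self.name = name
        self.handle = FakeHandle()


def take(client):
    # Mirrors the shape of `with client.handle:` in the traceback.
    with client.handle:
        return f'response for {client.name}'


def take_or_skip(client):
    # Assumed mitigation: treat a destroyed entity as "nothing to take".
    try:
        return take(client)
    except InvalidHandle:
        return None


def run_scenario(take_fn):
    clients = [FakeClient('a'), FakeClient('b')]
    # Step 1: the "executor" collects the entities it believes are ready.
    ready = list(clients)
    # Step 2: another thread destroys one of them before it is taken.
    destroyer = threading.Thread(target=clients[0].handle.destroy)
    destroyer.start()
    destroyer.join()
    # Step 3: the "executor" takes from its now-stale list.
    return [take_fn(c) for c in ready]
```

Calling `run_scenario(take)` raises `InvalidHandle`, while `run_scenario(take_or_skip)` skips the destroyed client and still serves the other one. The thread is joined before the take only to make the interleaving deterministic; in the real executor the destroy can land at any point in that window.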

leander2189 commented 2 months ago

Probably related: https://github.com/ros2/rclpy/issues/1206

sillkjc commented 2 months ago

+1 we are also experiencing this :(