ros2 / rclpy

rclpy (ROS Client Library for Python)
Apache License 2.0
289 stars 224 forks source link

Executors mishandle invalid waitables #1284

Closed mhidalgo-bdai closed 4 months ago

mhidalgo-bdai commented 4 months ago

Bug report

Required Info:

Steps to reproduce issue

Take the following snippet:

from rclpy.task import Future

def wait_for_message_async(node, msg_type, topic_name, qos_profile): 
    future = Future()

    def callback(msg) -> None:
        if not future.done():
            future.set_result(msg)

    sub = node.create_subscription(msg_type, topic_name, callback, qos_profile)
    future.add_done_callback(lambda future: node.destroy_subscription(sub))
    return future

And call it repeatedly in a thread other than that of the executor associated to the node.

Expected behavior

No issues whatsoever.

Actual behavior

Eventually, an InvalidHandle exception is raised in the executor thread from within subscription event handlers.

Additional information

This issue is much harder to reproduce than it is to explain. When the executor collects node entities, it does not exclude waitables with invalid handles from the wait set like it does for every other entity. So if you are unlucky enough to have a waitable destroyed in between the time it is collected and the time it is added to the wait set, you will get an InvalidHandle exception. The fix is equally simple: prune the list of waitables, excluding those that raised an InvalidHandle when pushed to the context stack i.e. replace https://github.com/ros2/rclpy/blob/99f1ce933ddd1677fda8fa602af3684b93f814eb/rclpy/rclpy/executors.py#L652-L657

with

for waitable in list(waitables):
    try:
        context_stack.enter_context(waitable)
        entity_count += waitable.get_num_entities()
    except InvalidHandle:
        waitables.remove(waitable)
Barry-Xu-2018 commented 4 months ago

When the executor collects node entities, it does not exclude waitables with invalid handles from the wait set like it does for every other entity. So if you are unlucky enough to have a waitable destroyed in between the time it is collected and the time it is added to the wait set, you will get an InvalidHandle exception.

Yes. The invalid waitable should be removed here.

for waitable in list(waitables):
    try:
        context_stack.enter_context(waitable)
        entity_count += waitable.get_num_entities()
    except InvalidHandle:
        waitables.remove(waitable)

From what I understand, calling list.remove() while iterating through the list is unsafe. How about

valid_waitables = []
for waitable in list(waitables):
    try:
        context_stack.enter_context(waitable)
        entity_count += waitable.get_num_entities()
        valid_waitables.append(waitable)
    except InvalidHandle:
        pass
waitables = valid_waitables
MatthijsBurgh commented 4 months ago

First solution already creates a new list, so it is not removing items from the list is iterating on. So either go for:

for waitable in list(waitables):
    try:
        context_stack.enter_context(waitable)
        entity_count += waitable.get_num_entities()
    except InvalidHandle:
        waitables.remove(waitable)

or

valid_waitables = []
for waitable in waitables:
    try:
        context_stack.enter_context(waitable)
        entity_count += waitable.get_num_entities()
        valid_waitables.append(waitable)
    except InvalidHandle:
        pass
#  Use valid_waitables later on in the code

I have no clue, which one has the best performance.

Barry-Xu-2018 commented 4 months ago

First solution already creates a new list, so it is not removing items from the list is iterating on.

Yes. I was mistaken. list(waitables) created a new a list.

mhidalgo-bdai commented 4 months ago

I have no clue, which one has the best performance.

Probably the latter, though the difference is likely negligible for the usual number of waitables a single executor may deal with (<100 I'd think). It's three O(N) operations (recreate, iterate, remove) plus (maybe) garbage collection vs one O(N) operation (iterate).