ros2 / rclcpp

rclcpp (ROS Client Library for C++)
Apache License 2.0
514 stars 410 forks source link

Data race fixes #2494

Closed jmachowinski closed 2 months ago

jmachowinski commented 2 months ago

@mjcarroll @wjwwood

Make the executors more stable with our stack.

mjcarroll commented 2 months ago
mjcarroll commented 2 months ago

All test failures here are related to https://github.com/ros2/rosidl_typesupport_fastrtps/pull/117

wjwwood commented 2 months ago

Can you explain the changes?

jmachowinski commented 2 months ago

@wjwwood did you look at the commit messages of the single commits ?

jmachowinski commented 2 months ago

https://github.com/ros2/rclcpp/pull/2494/commits/7d0b532d44262b67b0b523d4b7b059c392105b25

Any rebuild request coming from a second thread may get lost, if issued, while the execution thread is in Executor::collect_entities(). Changes the clearing of the request to an atomic operation to fix this.

https://github.com/ros2/rclcpp/pull/2494/commits/a968a3c1e511775f8274bcd786b7cdc702e37d2c

The waitable did not rearm its guard condition if not processed yet. I think In the long run we should have a better API for this, as it is a common pitfall.

https://github.com/ros2/rclcpp/pull/2494/commits/5d99e3b594dd9330540327aa74c7b28212f35466

After adding all of the timer_index++ in the outer code the code was basically same as the other next_XYZ, therefore I just replaced it, to a a more uniform code.

https://github.com/ros2/rclcpp/pull/2494/commits/39e0d94076224f6e34ac555c32b0a063a5f543d1

This fixes a possible data race, as the on_trigger_callback might rely on the facts, that the executor wakes up, after it was executed.

mjcarroll commented 2 months ago
jmachowinski commented 2 months ago

The unstable tests on windows seems to be non related, see https://github.com/ros2/rmw_connextdds/issues/136

jmachowinski commented 2 months ago

closed in favor for #2500 2500

wjwwood commented 2 months ago

On my machine, the test TestAddCallbackGroupsToExecutor/StaticSingleThreadedExecutor.add_callback_groups_after_add_node_to_executor (for all rmw implementations) is timing out.

The following tests FAILED:
     28 - test_add_callback_groups_to_executor__rmw_cyclonedds_cpp (Timeout)
     29 - test_add_callback_groups_to_executor__rmw_fastrtps_cpp (Timeout)
     30 - test_add_callback_groups_to_executor__rmw_fastrtps_dynamic_cpp (Timeout)

I'll run CI just to confirm:

I wasn't able to see any obvious issue, the CI is with my changes in https://github.com/ros2/rclcpp/pull/2500, but locally even this pr's changes have that problem.