ros2 / design

Design documentation for ROS 2.0 effort
http://design.ros2.org/
Apache License 2.0

Improvements to rmw for deterministic execution #259

Closed iluetkeb closed 4 years ago

iluetkeb commented 5 years ago

Background

Casini et al. (2019) show that callbacks can be executed in a different order than the order in which their messages arrived. We also have an example that demonstrates this. This is usually not what is expected, and it happens only by accident.

Goal

We want to execute messages in importance order, which -- in the absence of other priorities -- is usually message arrival order.

Problem

To execute in message-arrival order, the executor needs ordering information. The rmw API currently cannot provide it, because the rmw_wait_set only carries binary information: either data is available on a topic or it is not. We don't know how old that data is. Moreover, a topic may have more than one message waiting, and some or all of them may be older than a message on a different topic. These additional messages are currently skipped until the next invocation of rmw_wait.
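To make the problem concrete, here is a minimal sketch of a wait set that reports only binary readiness. It is a toy model, not the rmw API: the queue layout and the names `wait` and `spin_binary` are hypothetical and exist only for illustration.

```python
from collections import deque


def wait(queues):
    """Mimic a binary wait set: report only which topics have data."""
    return [name for name, q in queues.items() if q]


def spin_binary(queues):
    """Take one message per ready topic per wait, in wait-set order."""
    executed = []
    while True:
        ready = wait(queues)
        if not ready:
            return executed
        for name in ready:
            # Arrival time is stored in the queue, but this loop never sees it.
            _, payload = queues[name].popleft()
            executed.append(payload)


# Hypothetical data: (arrival_time, payload) per topic, oldest first.
queues = {
    "topic_b": deque([(3, "b1")]),
    "topic_a": deque([(1, "a1"), (2, "a2")]),
}
print(spin_binary(queues))  # ['b1', 'a1', 'a2'], although arrival order was a1, a2, b1
```

Two effects show up here: `b1` runs first only because `topic_b` happens to come first in the wait set, and `a2` is deferred to the next wait even though it is older than `b1`.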

Options

I see two general approaches to address this:

1) We ask the middleware for timestamps on all waiting messages and perform ordering on the executor level.
2) We ask the middleware "which object(s), of the ones in the wait_set, should we handle next?" where "next" is typically decided by "has the oldest data".

Q: Can anybody think of different options?

Discussion

Option 1) keeps the current rmw design, but adds more data. This appears more straightforward at first, but since there may be multiple messages waiting for each object, the data size is unpredictable. Also, it is not trivial to obtain this information from the middleware. The naive implementation has to deserialize all messages to get the SampleInfo. Alternatively, we could keep a listener attached at all times, and use it to determine arrival time. Or, we could modify the middleware to maintain this information for us without having to deserialize.
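As a rough sketch of what Option 1 means for the executor, assume the middleware could hand over an arrival timestamp for every waiting message. The function name `spin_by_timestamp` and the queue layout are hypothetical; nothing here reflects an existing rmw call.

```python
import heapq
from collections import deque


def spin_by_timestamp(queues):
    """Return all pending payloads in global arrival-time order.

    `queues` maps a topic name to a FIFO deque of (arrival_time, payload)
    tuples, i.e. the per-message timestamps Option 1 asks the middleware for.
    """
    # Each per-topic queue is already sorted by arrival time, so a k-way
    # merge over all queues yields the global arrival order.
    merged = heapq.merge(*queues.values())
    return [payload for _, payload in merged]


# Hypothetical data: (arrival_time, payload) per topic, oldest first.
queues = {
    "topic_b": deque([(3, "b1")]),
    "topic_a": deque([(1, "a1"), (2, "a2")]),
}
print(spin_by_timestamp(queues))  # ['a1', 'a2', 'b1']
```

Note that the executor has to see one timestamp per waiting message, so the amount of data it receives grows with the queue depth of every topic. That is the unpredictable data size mentioned above.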

Option 2) either changes rmw_wait or adds a new function with these new semantics. This will likely require more modifications in the rmw implementations, but it should also give them more room to optimize how they obtain this information. It would also limit the data size, and could even make use of QoS information on the middleware layer.
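Option 2 could look roughly like the sketch below, where the executor repeatedly asks a single question: which object should be handled next? The function `next_ready` stands in for the new or changed middleware call and is purely hypothetical, as is the queue layout.

```python
from collections import deque


def next_ready(queues):
    """Hypothetical middleware query: which topic holds the oldest data?

    Only the head of each queue is inspected, so the answer has a fixed
    size no matter how many messages are waiting.
    """
    heads = [(q[0][0], name) for name, q in queues.items() if q]
    return min(heads)[1] if heads else None


def spin_oldest_first(queues):
    """Handle one message at a time, always from the topic returned by next_ready."""
    executed = []
    while (name := next_ready(queues)) is not None:
        _, payload = queues[name].popleft()
        executed.append(payload)
    return executed


# Hypothetical data: (arrival_time, payload) per topic, oldest first.
queues = {
    "topic_b": deque([(3, "b1")]),
    "topic_a": deque([(1, "a1"), (2, "a2")]),
}
print(spin_oldest_first(queues))  # ['a1', 'a2', 'b1']
```

Here the "oldest data" rule lives entirely inside `next_ready`, so a middleware could replace it with a different policy, for example one informed by QoS settings, without changing the executor loop.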

iluetkeb commented 4 years ago

Alright, the work set out in this ticket is largely done: we've merged API changes for acquiring timestamps for both topics and services, and we already have at least two implementations of those (at varying degrees of completeness, but both sufficient for now).

This has been quite a bit more work than expected, but I'm happy we've got a good solution now. A big "thank you" to everybody who participated, particularly @wjwwood, @rotu, @eboasson and @dirk-thomas for sticking with it, and getting it into Foxy. @rotu and @eboasson, thanks for your objections and explanations -- I think it has made a big difference to the result. Hopefully for the better, but of course, as they say, all remaining errors are mine ;-)

To those participating in the earlier part of the discussion: Thanks for the reference information on your respective solutions to this. The work on the executor to actually make use of all this will keep us busy for a while.

I'm keeping this ticket open for the moment -- I would like to merge some documentation at least, summarizing all that we have discussed here. Since I had to push other work away to complete this, it will take a moment until I can return, but feel free to ping me if you think it takes too long ;-)

ros-discourse commented 4 years ago

This issue has been mentioned on ROS Discourse. There might be relevant details there:

https://discourse.ros.org/t/ros2-middleware-change-proposal/15863/18

clalancette commented 4 years ago

@iluetkeb friendly ping; what's the status of this ticket?