ros2 / design

Design documentation for ROS 2.0 effort
http://design.ros2.org/
Apache License 2.0

Improvements to rmw for deterministic execution #259

Closed iluetkeb closed 4 years ago

iluetkeb commented 5 years ago

Background

In Casini et al., 2019, it is shown that callbacks can be executed in a different order than the corresponding messages arrive; we also have an example demonstrating this. This is usually not what is expected, and when execution does match arrival order, it does so only by accident.

Goal

We want to execute messages in importance order, which -- in the absence of other priorities -- is usually message arrival order.

Problem

To execute in message-arrival order, the executor needs ordering information. The rmw API does not currently provide this, because the rmw_wait_set only carries binary information: either data is available on a topic or not. We don't know how old that data is. Moreover, a topic may have more than one message waiting, some or all of which may be older than a message on a different topic. These additional messages are currently skipped until the next invocation of rmw_wait.
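
For reference, this is roughly the shape of the current rmw_wait entry point (abbreviated from rmw/rmw.h; consult the header for the authoritative signature). Entries that are not ready are nulled out on return, so the caller learns only which entities have data, not how old that data is:

```cpp
// Abbreviated sketch of the existing rmw_wait signature. On return, array
// entries for entities that are NOT ready are set to NULL: a purely binary
// "ready / not ready" signal, with no age or ordering information.
rmw_ret_t
rmw_wait(
  rmw_subscriptions_t * subscriptions,
  rmw_guard_conditions_t * guard_conditions,
  rmw_services_t * services,
  rmw_clients_t * clients,
  rmw_events_t * events,
  rmw_wait_set_t * wait_set,
  const rmw_time_t * wait_timeout);
```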

Options

I see two general approaches to address this:

1. We ask the middleware for timestamps on all waiting messages and perform ordering on the executor level.
2. We ask the middleware "which object(s), of the ones in the wait_set, should we handle next?", where "next" is typically decided by "has the oldest data".

Q: Can anybody think of different options?

Discussion

Option 1) keeps the current rmw design, but adds more data. This appears more straightforward at first, but since there may be multiple messages waiting for each object, the data size is unpredictable. Also, it is not trivial to obtain this information from the middleware. The naive implementation has to deserialize all messages to get the SampleInfo. Alternatively, we could keep a listener attached at all times, and use it to determine arrival time. Or, we could modify the middleware to maintain this information for us without having to deserialize.

Option 2) either changes rmw_wait, or adds a new function with these new semantics. This will likely require more modifications in the rmw implementations, but it would also give those implementations better opportunities to optimize how they obtain this information. It would also limit the data size, and could even make use of QoS information on the middleware layer.
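
A sketch of what option 2 could look like as an API. The function name and types here are hypothetical, purely to make the discussion concrete; nothing like this exists in rmw today:

```cpp
// Hypothetical option-2 API (not part of rmw): rather than flagging every
// ready entity, the middleware hands back the single entity that should be
// handled next, typically the one holding the oldest waiting data, but
// potentially reordered by QoS settings inside the middleware.
enum class rmw_entity_kind_t { subscription, guard_condition, service, client, event };

struct rmw_next_entity_t
{
  rmw_entity_kind_t kind;  // which kind of entity is ready
  void * entity;           // pointer to the ready entity itself
};

rmw_ret_t
rmw_wait_next(
  rmw_wait_set_t * wait_set,        // entities assumed attached beforehand
  const rmw_time_t * wait_timeout,  // NULL means block indefinitely
  rmw_next_entity_t * next);        // out: what to handle next
```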

iluetkeb commented 5 years ago

Tagging @wjwwood @mjcarroll @nburek for feedback.

nburek commented 5 years ago

@iluetkeb Thanks for starting the discussion on making the scheduling more deterministic and for the details in the paper.

One clarifying question: is the execution of callbacks out of order for a single subscription on a topic, or are they just out of order across multiple topics/subscriptions?

To me it makes more sense for the RMW layer to provide the information needed for the rcl/rclcpp layers to make the scheduling decision, as opposed to allowing the rmw to fully dictate which items have the highest priority. If you're entrusting the rmw layer to make decisions about priority instead of just providing metadata, then it becomes harder to make behavior guarantees in rcl/rclcpp, because it depends on the implementation. And if you're going to dictate that every rmw layer implement something the same way so that it is consistent, then it makes sense to me to just move that up the stack.

Do you have a separate design issue related to the changes that would be made in rcl/rclcpp to enable the deterministic scheduling? Have you considered alternatives to arrival-priority scheduling? It's not unreasonable to me to imagine that ROS will eventually support two or more schedulers, one more fair and one more hard real-time that would starve some topics to guarantee others. As the paper points out, the priority inversion caused by the way the current design polls for information and then handles all the events before polling for more would also need to be solved to be more deterministic in other scenarios. I bring all this up because I think it's worth thinking now about the long-term requirements needed for rcl/rclcpp to have different schedulers and how to enable all these future use cases. That may be as simple as providing additional metadata, like arrival time, when you poll for messages, or it may require some other changes to how events are polled.

tobiasblass commented 5 years ago

@iluetkeb Thanks for starting the discussion on making the scheduling more deterministic and for the details in the paper.

One clarifying question: is the execution of callbacks out of order for a single subscription on a topic, or are they just out of order across multiple topics/subscriptions?

The paper discusses the ordering of callbacks across multiple topics/services. Within a single topic, the paper assumes that messages are processed in arrival order (which, to the best of my knowledge, is the default setting for the DDS middleware).

To me it makes more sense for the RMW layer to provide the information needed for the rcl/rclcpp layers to make the scheduling decision, as opposed to allowing the rmw to fully dictate which items have the highest priority. If you're entrusting the rmw layer to make decisions about priority instead of just providing metadata, then it becomes harder to make behavior guarantees in rcl/rclcpp, because it depends on the implementation. And if you're going to dictate that every rmw layer implement something the same way so that it is consistent, then it makes sense to me to just move that up the stack.

I definitely agree. Generally, allowing more experimentation and domain-specific adjustments is a good thing. The main question is how expensive this is in terms of performance, and how this interacts with DDS QoS policies. The most straightforward way to implement rcl-level scheduling policies, for example, is to move all messages from the DDS to the rcl layer as soon as possible. However, this means that the messages are considered to be successfully delivered by the DDS middleware, which breaks some DDS QoS policies (e.g., deadline enforcement). On the other hand, if we leave all the messages at the DDS layer and copy them over for scheduling, we have doubled the memory consumption. The holy grail is to find some middle ground between these two extreme options, where we get enough information about the available messages without breaking DDS QoS options.

Do you have a separate design issue related to the changes that would be made in rcl/rclcpp to enable the deterministic scheduling? Have you considered alternatives to arrival-priority scheduling? It's not unreasonable to me to imagine that ROS will eventually support two or more schedulers, one more fair and one more hard real-time that would starve some topics to guarantee others. As the paper points out, the priority inversion caused by the way the current design polls for information and then handles all the events before polling for more would also need to be solved to be more deterministic in other scenarios. I bring all this up because I think it's worth thinking now about the long-term requirements needed for rcl/rclcpp to have different schedulers and how to enable all these future use cases. That may be as simple as providing additional metadata, like arrival time, when you poll for messages, or it may require some other changes to how events are polled.

It is definitely possible that we end up with multiple schedulers, although I still hope that we are able to implement an efficient earliest-deadline-first scheduler (which would give us a single scheduler that at the same time provides strong real-time guarantees for users who take the time to configure it while behaving just like FIFO in the default state). Independent of whether that works out or not, I'm pretty sure we're going to want some kind of arrival-based scheduler if at all possible, just because they are easy to understand, are what people probably expect from the outset, and are starvation-free. I think it is very valuable to have at least a prototype implementation of that as soon as possible, even if the long-term requirements are not clear yet.

iluetkeb commented 5 years ago

As a preface let me say that, all other things being equal, I generally prefer to put functionality in the place where the necessary information is.

In this case, the middleware has all the information, so it would be fairly natural to do that. btw, please note that this includes more than just timestamp: The middleware also has QoS settings and it has information about where a message came from.

However, I get the feeling that you guys think that the executor is something we have under control, but that we don't have the middleware under control, and that is why you want to put it into the executor? Is that right? If yes, please read on below.

To me it makes more sense for the RMW layer to provide the information needed for the rcl/rclcpp layers to make the scheduling decision, as opposed to allowing the rmw to fully dictate which items have the highest priority. If you're entrusting the rmw layer to make decisions about priority instead of just providing metadata, then it becomes harder to make behavior guarantees in rcl/rclcpp, because it depends on the implementation. And if you're going to dictate that every rmw layer implement something the same way so that it is consistent, then it makes sense to me to just move that up the stack.

I think I see your point, but the middleware can always make decisions in a way we cannot override, through QoS settings. It will drop messages when buffers are full, it will prioritize messages when QoS says so, etc.

One of my goals is to keep DDS QoS mechanisms working. If that wasn't a goal, then the easiest way to get a deterministic executor would be to have a thread that just polls the middleware as fast as it can and pushes everything into a shared queue, for a single thread to pop from. That is basically how execution works in ROS 1.
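
For illustration, a minimal sketch of that ROS 1-style pattern, using only the standard library; take_all_available and execute_callback are placeholders standing in for middleware interaction. The point is precisely that draining the middleware eagerly like this bypasses its QoS handling:

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <vector>

struct Message { int payload; };  // stand-in for a taken ROS message

// Placeholders, not real API: assume these talk to the middleware.
std::vector<Message> take_all_available() { return {}; }
void execute_callback(const Message &) {}

std::queue<Message> shared_queue;
std::mutex mtx;
std::condition_variable cv;

// Polling thread: drains the middleware as fast as it can into one FIFO,
// so all QoS history/ordering decisions are taken out of its hands.
void poller()
{
  for (;;) {
    std::vector<Message> batch = take_all_available();
    std::lock_guard<std::mutex> lock(mtx);
    for (const auto & m : batch) shared_queue.push(m);
    cv.notify_one();
  }
}

// Single worker: pops in strict arrival order, so execution is FIFO.
void worker()
{
  for (;;) {
    std::unique_lock<std::mutex> lock(mtx);
    cv.wait(lock, [] { return !shared_queue.empty(); });
    Message m = shared_queue.front();
    shared_queue.pop();
    lock.unlock();
    execute_callback(m);
  }
}
```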

If we accept this in general, then it might not be such a bad idea to have the middleware do the prioritization, because we can influence how it does it through QoS settings. Also, please note that we can always choose to ignore the ordering suggested by the middleware.

Of course, it could be that QoS settings are too complex, or not powerful enough to do what we need. At the moment, I can't really judge that, because I have not looked at them in detail and, even more importantly, I don't have all the use cases in mind. If we want to do arrival ordering, it would be trivial, of course. If you have other things in mind, please let me know.

What I have looked at in detail is rmw_fastrtps and the relevant bits of Fast-RTPS, and they make me lean towards option 2.

In any case, I would think that the executor can still achieve some measure of control by waiting on just some of the communication objects, instead of all of them.

Do you have a separate design issue related to the changes that would be made in rcl/rclcpp to enable the deterministic scheduling?

Not yet. However, I think it can be fairly easily implemented by having a new memory_strategy with the current executor, and by treating timers differently. The main part seems to be getting the info.

Have you considered alternatives to arrival-priority scheduling? It's not unreasonable to me to imagine that ROS will eventually support two or more schedulers, one more fair and one more hard real-time that would starve some topics to guarantee others.

Well, using topics with differing priorities in the same executor could probably be called a mixed-criticality setting (@tobiasblass correct me if I get the terms wrong...)

This is in general hard to get right. For the moment, I would be happy to have real-time-safe nodes at all, assuming that they only consume information of (roughly) equal importance. This would basically partition the system into real-time and non-real-time parts.

btw, some reasons why mixed criticality is hard:

you also need to ensure that delivery and reception of important messages is prioritized over less important messages. Otherwise it could happen (and does happen) that your important message is just getting stuck in some buffer while the unimportant bulk data transfer is hogging the line...

we might even need pre-emption, because when an important message comes in while we are busy processing some other data, we might want to execute it immediately.

As the paper points out, the priority inversion caused by the way the current design polls for information and then handles all the events before polling for more would also need to be solved to be more deterministic in other scenarios.

Absolutely agreed. At the moment, the only thing I can think of to address this is that we have to ask the middleware for an update after every callback.

Unfortunately, in the current implementation of both the executor and the rmw-implementation, this approach is prohibitively expensive, because the overhead is so huge.

My ideal approach would be that the rmw-implementation does not have to completely re-create the wait_set on every call to rmw_wait, but that it could just update it incrementally with things that have changed since the last call.
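
As a sketch, such an incremental interface could look like the following. All names are hypothetical; the point is only that attach/detach become separate, rarely called operations, so the wait call itself no longer rebuilds anything:

```cpp
// Hypothetical incremental wait-set API (nothing like this exists in rmw
// today). Entities are attached once and stay attached across waits.
rmw_ret_t
rmw_wait_set_attach_subscription(
  rmw_wait_set_t * wait_set,
  const rmw_subscription_t * subscription);

rmw_ret_t
rmw_wait_set_detach_subscription(
  rmw_wait_set_t * wait_set,
  const rmw_subscription_t * subscription);

// Waits on whatever is currently attached; the implementation only has to
// process the diff since the previous call, not rebuild the whole set.
rmw_ret_t
rmw_wait_attached(
  rmw_wait_set_t * wait_set,
  const rmw_time_t * wait_timeout);
```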

tobiasblass commented 5 years ago

However, I get the feeling that you guys think that the executor is something we have under control, but that we don't have the middleware under control, and that is why you want to put it into the executor? Is that right? If yes, please read on below.

I think the main issue is that ROS 2 supports many different middlewares, so it's hard to make assumptions about what the middleware supports. Anything that is part of the executor is the same across all ROS 2 configurations.

To me it makes more sense for the RMW layer to provide the information needed for the rcl/rclcpp layers to make the scheduling decision, as opposed to allowing the rmw to fully dictate which items have the highest priority. If you're entrusting the rmw layer to make decisions about priority instead of just providing metadata, then it becomes harder to make behavior guarantees in rcl/rclcpp, because it depends on the implementation. And if you're going to dictate that every rmw layer implement something the same way so that it is consistent, then it makes sense to me to just move that up the stack.

I think I see your point, but the middleware can always make decisions in a way we cannot override, through QoS settings. It will drop messages when buffers are full, it will prioritize messages when QoS says so, etc.

I agree with you, but let me point out that this is not necessarily about QoS settings but about QoS settings that are provided by the user without ROS knowing about it. It is at least conceivable that a deterministic executor could pick the appropriate middleware QoS settings, which would not suffer from this problem. Of course, if we want to allow users to pick any QoS settings they like, I fully agree with your argument.

Have you considered alternatives to arrival-priority scheduling? It's not unreasonable to me to imagine that ROS will eventually support two or more schedulers, one more fair and one more hard real-time that would starve some topics to guarantee others.

Well, using topics with differing priorities in the same executor could probably be called a mixed-criticality setting (@tobiasblass correct me if I get the terms wrong...)

This is in general hard to get right. For the moment, I would be happy to have real-time-safe nodes at all, assuming that they only consume information of (roughly) equal importance. This would basically partition the system into real-time and non-real-time parts.

At least in the real-time community, mixed-criticality is a fixed term for something different (roughly speaking, this is about changing your workload and scheduling parameters to a safer state with fewer features if you notice that, e.g., some task overruns its expected execution time). What you are describing would just be called real-time scheduling on the executor level. However, I think the point that @nburek was making is that you could choose one of these schedulers per executor, so you would still have the separation. Please correct me if I'm wrong, though :-).

btw, some reasons why [real-time scheduling] is hard:

you also need to ensure that delivery and reception of important messages is prioritized over less important messages. Otherwise it could happen (and does happen) that your important message is just getting stuck in some buffer while the unimportant bulk data transfer is hogging the line...

True, although this problem does not appear on the executor but on the process level. As far as I know, the inbound and outbound queues are shared between all executors in the same process. One way to deal with that would be to give the middleware communication threads the highest priority, or pin them to a different CPU core than your ROS executors. This is definitely out of scope for this ticket, though :-).
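
For concreteness, this is the kind of OS-level arrangement meant here, sketched for Linux (the affinity call requires _GNU_SOURCE, and SCHED_FIFO requires CAP_SYS_NICE or root):

```cpp
#include <pthread.h>
#include <sched.h>

// Pin a middleware I/O thread to its own core and give it a real-time
// priority so inbound packets are drained promptly, independently of the
// executor threads. Error handling elided for brevity.
void isolate_middleware_thread(pthread_t thread, int core, int rt_priority)
{
  cpu_set_t cpus;
  CPU_ZERO(&cpus);
  CPU_SET(core, &cpus);
  pthread_setaffinity_np(thread, sizeof(cpus), &cpus);

  sched_param params{};
  params.sched_priority = rt_priority;  // e.g. 80, above the executors
  pthread_setschedparam(thread, SCHED_FIFO, &params);
}
```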

we might even need pre-emption, because when an important message comes in while we are busy processing some other data, we might want to execute it immediately.

I think that is a separate decision. It is in some sense more difficult than in the preemptive case, but people have successfully built real-time systems based on non-preemptive schedulers. The main difficulty is that your longest-running callback determines the reaction time of everything else, but if you account for that this works just fine.

iluetkeb commented 5 years ago

I think the main issue is that ROS 2 supports many different middlewares, so it's hard to make assumptions about what the middleware supports. Anything that is part of the executor is the same across all ROS 2 configurations.

Sure. But expecting the middleware to be able to sort messages in arrival order is not asking much, I would say.

but let me point out that this is not necessarily about QoS settings but about QoS settings that are provided by the user without ROS knowing about it.

Well, I think I mean something slightly different. What I mean is that we're already using the middleware to make some decisions for us. For example, when configuring queue sizes -- that decision has an impact on scheduling. The middleware is the layer that implements this choice.

At least in the real-time community, mixed-criticality is a fixed term for something different (roughly speaking, this is about changing your workload and scheduling parameters to a safer state with fewer features if you notice that, e.g., some task overruns its expected execution time).

I'll check that.

In any case, I would certainly like to consider this case, but it goes far beyond what we can achieve simply by looking at the rmw API.

It is in some sense more difficult than in the preemptive case, but people have successfully built real-time systems based on non-preemptive schedulers.

This requires guarantees on all callbacks, though, right? If we have a system where messages on some topics have deadlines, but messages on other topics don't, it might be that we cannot give this guarantee either. In that case, I would suggest partitioning differently.

nburek commented 5 years ago

As a preface let me say that, all other things being equal, I generally prefer to put functionality in the place where the necessary information is.

In this case, the middleware has all the information, so it would be fairly natural to do that. btw, please note that this includes more than just timestamp: The middleware also has QoS settings and it has information about where a message came from.

I would agree with you in the case where it makes sense for the component with the information to be making those evaluations. In this case, it feels like we are pushing too much of the business logic of scheduling execution down into the messaging middleware layer.

However, I get the feeling that you guys think that the executor is something we have under control, but that we don't have the middleware under control, and that is why you want to put it into the executor? Is that right? If yes, please read on below.

From the perspective of someone writing a node that will be used by other people (think MoveIt), the only guarantees that you have are those provided by the rcl/rclcpp layer that you're programming against (and, by extension, the guarantees provided to them by the rmw API). Right now the scheduling is handled in rclcpp such that they get some level of guarantees about behavior regardless of which rmw implementation is loaded by the end user at runtime. Right now nothing in the rmw API enforces a contract for how anything is scheduled. So if we do go down the route of allowing the rmw layer to dictate scheduling priorities, I personally would want to see it defined in the API contract. However, doing that starts to add a lot of burden on the rmw maintainers to implement all the supported priority management. It will also lead to a lot of duplication of logic in every rmw implementation (of which I think there are at least 4-5 that people actively worked on in the last 6 months, two of which aren't DDS based).

The reason I would push toward having the business logic of the scheduling in the rcl/rclcpp layer would be because it would continue to give some level of guarantee to the node authors while not pushing a lot of additional burden down into the rmw layers. In reality, the final solution might be some combination of the two.

Unfortunately, in the current implementation of both the executor and the rmw-implementation, this approach is prohibitively expensive, because the overhead is so huge.

My ideal approach would be that the rmw-implementation does not have to completely re-create the wait_set on every call to rmw_wait, but that it could just update it incrementally with things that have changed since the last call.

The reason I was asking if you had thought about any alternative scheduling policies besides just first arrival of all messages was because I was wondering if we can make a case for improving the rmw interface. If we have a set of requirements for schedulers in general and the current rmw layer isn't meeting them or is too expensive, then I think we can evaluate if there are changes that would not only enable arrival-based scheduling, which I totally agree is a very valuable first step, but also make it possible to write other scheduling policies. It's hard to know we're making the right decision on changing the rmw layer with future proofing in mind if we don't think a little more broadly.

However, I think the point that @nburek was making is that you could choose one of these schedulers per executor, so you would still have the separation. Please correct me if I'm wrong, though :-).

Yes, I was thinking about different executors in different processes (or entire systems) wanting different types of scheduling, but I suppose it would also be possible to have a mixture of executors with different policies in the same process. Though I don't know if it's a real use case to have multiple executors working on a shared object, so I don't think we need to worry about those complexities.

Sure. But expecting the middleware to be able to sort messages in arrival order is not asking much, I would say.

I don't think it is, but nothing in the existing API contract enforces it and so you will start to have greater disparity between RMW layers.

iluetkeb commented 5 years ago

@nburek I see your points, and I generally agree, particularly about the need for explicit contracts.

What might be a relevant observation here, is that just providing timestamps to the upper layers is already non-trivial, and also puts expectations on the middleware. At least that's the impression I got from reading through rmw_fastrtps and Fast-RTPS source code.

Firstly, it requires reading more data than is currently done before taking the message. Secondly, what we really want is a source timestamp, which DDS has, but not all middlewares do. Therefore, telling the middleware "give me N messages in arrival order" might actually be easier for the rmw implementors than having them provide timestamps for all currently available messages. It would also be a very explicit contract.

But I agree that it is a very specific contract as well, and that we would do well to consider other use cases before embarking on changing the rmw interface.

The reason I was asking if you had thought about any alternative scheduling policies besides just first arrival of all messages was because I was wondering if we can make a case for improving the rmw interface. If we have a set of requirements for schedulers in general and the current rmw layer isn't meeting them or is too expensive, then I think we can evaluate if there are changes that would not only enable arrival-based scheduling, which I totally agree is a very valuable first step, but also make it possible to write other scheduling policies.

As @tobiasblass said, a major scheduling approach we are looking at is "earliest deadline first" (EDF) scheduling.

btw, note that DDS also has a concept of a deadline, which can be set in both the Writer and the Reader. At the moment I'm not sure whether this is actually used for prioritization, or only to remove data that has expired, but it is certainly useful to consider information provided by the sender as well as the reader.

Anyway, a simple approach would be to define the deadline as "source timestamp + acceptable delay" (which may differ per topic, thus giving us prioritization). In case we don't have a source timestamp, we would substitute arrival time, and then end up with arrival-based ordering.
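
A small sketch of that rule, using only the standard library (all names are illustrative, not proposed API). With equal budgets on all topics it degenerates to plain timestamp, i.e. arrival, ordering:

```cpp
#include <chrono>
#include <queue>
#include <vector>

using Time = std::chrono::steady_clock::time_point;
using Duration = std::chrono::nanoseconds;

struct ReadyItem
{
  Time stamp;        // source timestamp, or arrival time as the fallback
  Duration budget;   // per-topic "acceptable delay"
  // callback handle elided
  Time deadline() const { return stamp + budget; }
};

// Order the ready queue so the earliest effective deadline comes out first.
struct ByDeadline
{
  bool operator()(const ReadyItem & a, const ReadyItem & b) const
  {
    return a.deadline() > b.deadline();  // inverted: priority_queue is a max-heap
  }
};

using ReadyQueue =
  std::priority_queue<ReadyItem, std::vector<ReadyItem>, ByDeadline>;
```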

Of course, incoming data is not the only source of work. We also need to look at periodic or explicitly invoked callbacks from within the process. This should fit into that approach as well.

The second major approach, which we have already implemented, is static scheduling. That's simple, and places very few requirements on the rmw interface.

Regarding the rmw API in general: I think that right now, the rmw_wait call is very inefficiently implemented, at least by rmw_fastrtps, and so are the interface-related parts of the executor and its memory_strategies. It is probably possible to improve upon that within the constraints of the current API, but I also have some ideas for a better information exchange interface. That is future work, though.

iluetkeb commented 5 years ago

btw, I asked Arne about mixed-criticality, and he explained that it's about a mixture of Safety Integrity Levels on the same device. This has some connection to scheduling, of course, but its origin is elsewhere.

gbiggs commented 5 years ago

Mixed-criticality is fun because you have to prove that the lower SIL aspects are not going to interfere with the higher SIL aspects. Things like not having priority inversions, not allowing a lower SIL process to suck up all the CPU time or memory, no deadlocks, etc.

Hypervisors and partitioning are often used for this sort of thing.

I think in ROS you would be unlikely to put different safety levels in a single node. It would make your safety-critical aspects harder to prove correct. My feeling is that you would want to separate out your safety-critical architecture from the rest of your application. However you still need a way to interact between the two so having priorities within a single node may still be necessary.

deeplearningrobotics commented 5 years ago

We want to execute messages in importance order, which -- in the absence of other priorities -- is usually message arrival order.

Priorities should be configurable; furthermore, currently executing callbacks need to be interrupted should a higher-priority task come in.

We ask the middleware for timestamps on all waiting messages and perform ordering on the executor level.

With DDS read and take this is already provided on the middleware layer.

DDS allows ordering by reception, sender, or custom timestamp (destination order QoS), where a custom timestamp would usually be the sensor data timestamp or similar. Note that having a custom timestamp is the only deterministic option that leads to the same outcome on the same data set in successive runs of an application.
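
In the ISO C++ DDS API that choice is a one-line reader QoS setting; a sketch (the umbrella header and the extent of vendor support vary between DDS implementations):

```cpp
#include <dds/dds.hpp>  // ISO C++ DDS PSM umbrella header (vendor-provided)

// Order delivered samples by the sender-side timestamp instead of the
// reception time, which is the deterministic option described above.
void use_source_timestamp_order(dds::sub::qos::DataReaderQos & qos)
{
  qos << dds::core::policy::DestinationOrder::SourceTimestamp();
}
```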

On the other hand, if we leave all the messages at the DDS layer and copy them over for scheduling, we have doubled the memory consumption.

I think copying messages is a no-go for most modern robotics applications such as autonomous driving.

This is in general hard to get right. For the moment, I would be happy to have real-time-safe nodes at all, assuming that they only consume information of (roughly) equal importance. This would basically partition the system into real-time and non-real-time parts.

We implement real-time nodes using read and take, which is also currently being discussed and will hopefully be added to the next ROS 2 release (#256).

See the following diagram on how we do this in detail:

[architecture diagram omitted]

The above architecture allows:

I think implementing a true real-time executor in C++ currently fails on the fact that functions cannot be suspended externally (even C++20 coroutines can only yield from inside the function itself, AFAIK).

Unfortunately, in the current implementation of both the executor and the RMW implementation, this approach is prohibitively expensive, because the overhead is so huge.

Fully agree!

My ideal approach would be that the RMW implementation does not have to completely re-create the wait_set on every call to rmw_wait, but that it could just update it incrementally with things that have changed since the last call.

In our internal RMW implementation, we do exactly that and only apply diffs to our waitset. Still, the current design of RMW, especially the waitset, forces such an inefficient implementation that we are compelled to sidestep the RMW abstraction to achieve satisfying performance.

iluetkeb commented 5 years ago

@deeplearningrobotics sorry for the late reply, we've had a holiday here in Germany.

Priorities should be configurable; furthermore, currently executing callbacks need to be interrupted should a higher-priority task come in.

I see two answers to this: One is, as @tobiasblass mentioned above, that you can give RT guarantees without preemption. They might not be as tight, but still. The other answer is that you can already have pre-emption by using multiple executors.

However, in both cases, we have a problem, and that is that we don't have deadline information right now. RMW's current API doesn't give that information to us. That's mainly what this issue is about.

We ask the middleware for timestamps on all waiting messages and perform ordering on the executor level.

With DDS read and take this is already provided on the middleware layer.

Yeah, but not by RMW. In any case, if we use the middleware functionality for this (which I like), that would be option 2 of my original list. If we pass on deadlines to the higher layers, that would be option 1.

We're currently trying to decide which option is better. Any input on that would be much appreciated.

We implement real-time nodes using read and take, which is also currently being discussed and will hopefully be added to the next ROS 2 release (#256).

I wasn't aware of #256 so far, thanks for the link. However, as far as I can discern, that mainly targets zero-copy. It does not seem to address how to figure out whether, and if so how much, data is available. This is what rmw_wait is currently used for, but its API is problematic.

In our internal RMW implementation, we do exactly that and only apply diffs to our waitset. Still, the current design of RMW, especially the waitset, forces such an inefficient implementation that we are compelled to sidestep the RMW abstraction to achieve satisfying performance.

Exactly! That's what we're trying to improve here.

Since you seem to have done this already: Do you have performance data on the current approach vs. yours that you'd be willing to share?

wjwwood commented 5 years ago

Hey guys, sorry for not replying here, but I have been following this with interest. I hope to discuss a lot of it with you at ROSCon and then follow up here, but in the meantime I'll just make a few comments on what I re-read today.


Sure. But expecting the middleware to be able to sort messages in arrival order is not asking much, I would say.

I actually think this is non-trivial. Even in DDS, ordering messages across multiple subscriptions will require reading the data for the sample infos and manually tracking which subscription you should take from next. Unless you use a Listener, but that's more or less equivalent to an Executor, in that you don't have much control over ordering among multiple data readers. But I could be wrong.

At the moment I'm not sure whether this is actually used for prioritization, or only to remove data that has expired, but it is certainly useful to consider information provided by the sender as well as the reader.

I think it is only used to notify the user when data was not received on time. The QoS "lifespan" is used to remove data that is "too old" usually. Neither control presentation order as far as I know, but that is a separate QoS in DDS.

Priorities should be configurable; furthermore, currently executing callbacks need to be interrupted should a higher-priority task come in.

I think this was addressed, but I'll just re-iterate that there's no way to do this without the code to be preempted offering preemption points periodically. (for example see jthread where they use the phrase cooperatively interruptible: https://medium.com/@vgasparyan1995/a-new-thread-in-c-20-jthread-ebd121ae8906)

mm318 commented 5 years ago

It looks like, from the discussion so far, the callback execution order would generally be scheduled based on a timestamp and an explicit priority of a topic/subscription that would be specified by the user. We shouldn't forget that the callbacks of timers (rcl_timer_t) and QoS events (rmw_events_t) should also be part of the execution order calculation. Therefore, the data required for determining callback execution order comes from all the different layers: timestamps for messages and QoS events come from the middleware/rmw_implementation, timestamps for timers come from rcl, and the explicit priorities come from rclcpp, since they are user specified.

Given this, I think it makes more sense to bubble the timestamps up to the rclcpp layer (option 1, echoing what @nburek was saying) than to pass everything needed down to the rmw layer for rmw_wait() to make a scheduling decision (option 2), because, for one, the rmw layer is completely unaware of timers (unless we introduce an rmw_timer_t).

Regarding the discussion for option 1, I think the data size for keeping track of message timestamps would be bounded by the depth of the history QoS policy (which could mean unbounded). rmw_wait() can be modified to indicate the number of times rcl_take() can be called on each member of the rmw_wait_set_t (let's call them Waitables ;) ), and the output of each rcl_take() call will provide timestamp information to the rclcpp layer. In effect, this is starting to sound like option 1 with a sprinkle of option 2.

mm318 commented 5 years ago

I thought preemption is a bit out of scope for this issue, but anyway...

There is probably no way to interrupt a callback that is currently executing, but we can at least rearrange the callback execution order while a callback is executing, and update the next callback to be executed before the current callback finishes.

iluetkeb commented 5 years ago

It looks like, from the discussion so far, the callback execution order would generally be scheduled based on a timestamp and an explicit priority of a topic/subscription that would be specified by the user.

While this is a common approach, it is not as easy to get right as one might think. Therefore, we're favoring a deadline-based approach instead of a priority-based one. btw, I just learned at ROSCon that our notion of a deadline is different from the DDS one. We consider a deadline to be the latest time by which the data has to be handled by a user callback.

Nevertheless, the deadline can also be user-specified information, so I agree with your general argument that the ordering will have to take information from various sources into account.

That said, I am increasingly coming to the view that my original question raised in this ticket cannot be answered without a more fine-grained design for both options, where we have considered all the implications fully.

mm318 commented 5 years ago

Therefore, we're favoring a deadline-based approach instead of a priority-based one.

Okay, I see. That just means that the callback execution will be ordered based on a timestamp + user_deadline criteria (instead of being based on some function f(timestamp, user_priority) that may be unintuitive), right? Or is there some special treatment for when a deadline cannot be met?

btw, I just learned at ROSCon that our notion of a deadline is different from the DDS one.

Right. Your notion of deadline sounds more like the latency_budget DDS QoS policy, but according to the DDS spec it is also just a hint to the middleware and doesn't sound like there's any guarantee it will even have any effect at all. (Or like the lifespan QoS policy, if the special treatment for when a callback execution deadline cannot be met is to drop the execution altogether.)

iluetkeb commented 5 years ago

That just means that the callback execution will be ordered based on a timestamp + user_deadline criteria

In principle, it should be timestamp + deadline - worst-case-execution-time, such that the reaction is available by the deadline. For example, a message arriving at t = 100 ms with an acceptable delay of 50 ms and a WCET of 10 ms must start executing by t = 140 ms. Note that WCET is almost impossible to know exactly, so many people use the average execution time instead.

Your notion of deadline sounds more like the latency_budget DDS QoS policy, but according to the DDS spec it is also just a hint to the middleware and doesn't sound like there's any guarantee it will even have any effect at all.

Yes, the definition of that sounds right. btw, I would expect that a real-time capable middleware implementation, when running on a real-time capable transport, could use this information to configure the transport appropriately.

ralph-lange commented 4 years ago

See @iluetkeb's PRs https://github.com/ros2/rcl/pull/591 and https://github.com/ros2/rmw/pull/200/

rotu commented 4 years ago

I strongly object to the changes in https://github.com/ros2/rmw/pull/200, https://github.com/ros2/rcl/pull/591

  1. This attempts to rewrite parts of the DDS DCPS standard to fix our misinterpretation of it. (1) The SampleInfo already contains a source timestamp, which we completely ignore. (2) Waitsets do not maintain the timestamp of certain events and this introduces ambiguity. If the status changes but still matches the StatusCondition mask, what is the correct timestamp? (3) If it is important to run an action when an event happens as opposed to waiting, there is already a mechanism to do so in DCPS: listeners.

  2. This doesn't even solve the problem: (1) Recreating waitsets every time the user calls rmw_wait is a measurably large source of overhead, largely due to the amount of dynamic memory management. This overhead contributes to the need for sequencing. Adding several new heap-allocated dynamic-length collections is going to make it slower. (2) A compliant implementation will still have a useless timestamp: the ReadCondition should stay true while samples are available (not become true on each incoming sample), so the timestamp wouldn't even allow proper sequencing of messages.

  3. There is a more obvious solution available. We've already added a destination timestamp to rmw_message_info_t, but the code is commented out: https://github.com/ros2/rmw/blob/2f954d77a102042fa5c13540e2db300d549d7ad3/rmw/include/rmw/types.h#L335. If that's uncommented, it is a simple matter to use in rcl/rclcpp: push available messages onto a priority queue and have the executor process them in first-to-last order. Plus we don't need to spuriously create multiple wait sets when multiple messages come in fast succession.

iluetkeb commented 4 years ago

Hi @rotu, thank you for your comments. I'm sorry you strongly object, but I hope I can address your concerns, since we have actually considered many of the same points in our design discussions.

This attempts to rewrite parts of the DDS DCPS standard to fix our misinterpretation of it.

I'm not sure what this refers to, but not reinventing what is provided by DDS already has been a major concern during the design.

(1) The SampleInfo already contains a source timestamp, which we completely ignore.

We do not ignore it. In fact the SampleInfo is used by my reference implementation, https://github.com/ros2/rmw_fastrtps/pull/359, for all types (subscriptions, clients and services) and we can switch to that as soon as Fast-RTPS 1.10 is included.

Until then, the version for Fast-RTPS 1.9.x, https://github.com/ros2/rmw_fastrtps/pull/363, uses SampleInfo for clients and services. It does not yet use it for subscriptions, because a method is lacking that allows us to retrieve SampleInfo without read or take (using either of those would change semantics). For clients and services, they are always taken immediately, so we can avoid this.

btw, the version for 1.10 does not use the source timestamp anymore, but the newly introduced receive timestamp, also from the SampleInfo. We have discussed which timestamp is more appropriate and currently believe that the receive timestamp is more appropriate. If you object, we can discuss this, of course.

(2) Waitsets do not maintain the timestamp of certain events and this introduces ambiguity. If the status changes but still matches the StatusCondition mask, what is the correct timestamp?

I'm not sure which ambiguity you refer to. In any case, the timestamp we use is always the one that matches the time the message has been received originally. As said above, this is retrieved from the SampleInfo.

(3) If it is important to run an action when an event happens as opposed to waiting, there is already a mechanism to do so in DCPS: listeners.

Agreed. The intention is not to run an action as soon as an event happens. What we aim for is that, if you have messages from multiple sources available at the same time, their arrival order is available for consideration by the executor.

2. This doesn't even solve the problem: (1) Recreating waitsets every time the user calls rmw_wait is a measurably large source of overhead, largely due to the amount of dynamic memory management.

I share the view that the current execution pipeline has many inefficiencies, and am actively involved in efforts to change this.

However, I believe this PR does not make the situation worse. Yes, it allocates more memory to hold the timestamps, but this memory can be allocated once, just like the rest of the wait_set. Nothing in the rcl_wait API requires that you dynamically re-allocate it, unless the size of the wait-set changes (which does not happen often in most applications). It is possible that some executors do this, but that can (and should) be changed independently. Lastly, in comparison to how much memory we spend in other places, I think the list of timestamps is small fry.

(2) A compliant implementation will still have a useless timestamp: the ReadCondition should stay true while samples are available (not become true on each incoming sample), so the timestamp wouldn't even allow proper sequencing of messages.

I don't understand this part. Maybe you can look at what I wrote above, regarding our use of the SampleInfo and the reference implementation, and tell me if this comment is still valid?

  3. There is a more obvious solution available. We've already added a destination timestamp to rmw_message_info_t, but the code is commented out

This information is only available after you take the message. However, we want to take messages only when we actually intend to process them immediately, because otherwise we would interfere with the history QoS. This is one of those things where we have not opted for the "simple" solution, precisely because we don't want to reimplement what DDS already provides.

That means, we need the timestamp first, and then we can decide what to take. That is important because it may well be that by the time we're done processing the selected message, the one that would have been second has already been replaced by new data.

rotu commented 4 years ago

Hi @rotu, thank you for your comments. I'm sorry you strongly object, but I hope I can address your concerns, since we have actually considered many of the same points in our design discussions.

This attempts to rewrite parts of the DDS DCPS standard to fix our misinterpretation of it.

I'm not sure what this refers to, but not reinventing what is provided by DDS already has been a major concern during the design.

DDS already has a concept of waitset which the rmw_wait_set_t is getting further away from with this design. In DDS the wait set has the following actions: attach_condition, detach_condition, wait, and get_conditions. By adding a timestamp, you force some weird design decisions on the RMW.

An entity in a DDS wait set becomes "ready" when its status matches some given mask. This change means that, when processing the wait set, the application no longer simply waits for the status flag to be true; it has to look into the sample and expose information that should be available through other means. Worse, if there is any message history, this forces you to call rmw_wait on the waitset multiple times to get the associated timestamps.

(1) The SampleInfo already contains a source timestamp, which we completely ignore.

We do not ignore it. In fact the SampleInfo is used by my reference implementation, ros2/rmw_fastrtps#359, for all types (subscriptions, clients and services) and we can switch to that as soon as Fast-RTPS 1.10 is included.

Until then, the version for Fast-RTPS 1.9.x, ros2/rmw_fastrtps#363, uses SampleInfo for clients and services. It does not yet use it for subscriptions, because a method is lacking that allows us to retrieve SampleInfo without read or take (using either of those would change semantics). For clients and services, they are always taken immediately, so we can avoid this.

btw, the version for 1.10 does not use the source timestamp anymore, but the newly introduced receive timestamp, also from the SampleInfo. We have discussed which timestamp is more appropriate and currently believe that the receive timestamp is more appropriate. If you object, we can discuss this, of course.

I mean we don't expose the DDS SampleInfo source_timestamp, which is part of the DDS spec. The reception timestamp totally makes sense too as an extension of the spec, and I agree with using it.

In the DDS spec, read should already expose this sample info. So it would be much more compliant with the spec to keep it with the sample instead of stuffing it into the waitset.

(2) Waitsets do not maintain the timestamp of certain events and this introduces ambiguity. If the status changes but still matches the StatusCondition mask, what is the correct timestamp?

I'm not sure which ambiguity you refer to. In any case, the timestamp we use is always the one that matches the time the message has been received originally. As said above, this is retrieved from the SampleInfo.

The ambiguity I'm referring to is that the condition in the wait set is broader than what the timestamp represents. A subscription should be ready when the ReadCondition becomes true, but this means that the timestamp should now change when a sample is received or taken, even if the ReadCondition is unchanged.

(3) If it is important to run an action when an event happens as opposed to waiting, there is already a mechanism to do so in DCPS: listeners.

Agreed. The intention is not to run an action as soon as an event happens. What we aim for is that, if you have messages from multiple sources available at the same time, their arrival order is available for consideration by the executor.

2. This doesn't even solve the problem:

(1) Recreating waitsets every time the user calls rmw_wait is a measurably large source of overhead, largely due to the amount of dynamic memory management.

I share the view that the current execution pipeline has many inefficiencies, and am actively involved in efforts to change this. However, I believe this PR does not make the situation worse. Yes, it allocates more memory to hold the timestamps, but this memory can be allocated once, just like the rest of the wait_set. Nothing in the rcl_wait API requires that you dynamically re-allocate it, unless the size of the wait-set changes (which does not happen often in most applications). It is possible that some executors do this, but that can (and should) be changed independently. Lastly, in comparison to how much memory we spend in other places, I think the list of timestamps is small fry.

It can be made faster, but it's disingenuous to expect it will be. Executor::wait_for_work clears, resizes, and reallocates all dynamic data in the wait set every time. So in the short term, this is adding lots of memory allocation in a very active code path. This allocation/deallocation hit is doubled by this change.

(2) A compliant implementation will still have a useless timestamp: the ReadCondition should stay true while samples are available (not become true on each incoming sample), so the timestamp wouldn't even allow proper sequencing of messages.

I don't understand this part. Maybe you can look at what I wrote above, regarding our use of the SampleInfo and the reference implementation, and tell me if this comment is still valid?

Yes, still valid. If the timestamp is supposed to refer to when the ReadCondition in the waitset becomes true, it is the wrong timestamp. If it doesn't, the waitset is the wrong level of abstraction for it.

  3. There is a more obvious solution available. We've already added a destination timestamp to rmw_message_info_t, but the code is commented out

This information is only available after you take the message. However, we want to take messages only when we actually intend to process them immediately, because otherwise we would interfere with the history QoS. This is one of those things where we have not opted for the "simple" solution, precisely because we don't want to reimplement what DDS already provides.

That means, we need the timestamp first, and then we can decide what to take. That is important because it may well be that by the time we're done processing the selected message, the one that would have been second has already been replaced by new data.

If single ticks of the executor are routinely picking up enough work to stall the system long enough to evict history significantly, then no amount of prioritizing here will fix that. It might even be desirable to read data rather than drop it on the floor; otherwise, topics with shorter histories might never get work done if backlogs are serious.

If you need to borrow the data temporarily before deciding whether you want to actually deserialize and use it, you can use a DDS read or take operation. These operations don't require deserialization at the same time.

iluetkeb commented 4 years ago

Okay, how would you do it, then? I mean concretely, how would you modify/extend the rcl API to expose the arrival timestamp, without having to call rcl_take first?

rotu commented 4 years ago

I think you have to call something like rcl_take if you want this to be consistent. You can use the existing rmw_take_serialized_message_with_info or rmw_take_loaned_message_with_info. To wrap this in an API, you would have to add a new function like rcl_take_many, which would take a list of rcl_subscription_ts, take all available messages from those subscriptions, put those messages in a list, sort that list by receive time, and return the list.
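
Roughly, such a function could be sketched like this. rcl_take_many is hypothetical, rcl_take's signature is approximated, and the received_timestamp field assumes the commented-out member of rmw_message_info_t mentioned earlier has been enabled; allocation, typing, and error handling are elided:

```cpp
#include <algorithm>
#include <vector>

#include <rcl/rcl.h>

struct TakenSample
{
  const rcl_subscription_t * subscription;
  void * message;           // type-erased, already-taken message
  rmw_message_info_t info;  // assumed to carry received_timestamp
};

// Placeholder: allocate a message of the right type for this subscription.
void * allocate_message_for(const rcl_subscription_t * sub);

// Hypothetical rcl_take_many: drain every subscription, then sort by
// receive time so the executor can run callbacks in arrival order.
void rcl_take_many(
  const std::vector<const rcl_subscription_t *> & subs,
  std::vector<TakenSample> & out)
{
  for (const auto * sub : subs) {
    void * msg = allocate_message_for(sub);
    rmw_message_info_t info;
    while (rcl_take(sub, msg, &info, nullptr) == RCL_RET_OK) {
      out.push_back({sub, msg, info});
      msg = allocate_message_for(sub);
    }
  }
  std::sort(out.begin(), out.end(),
    [](const TakenSample & a, const TakenSample & b) {
      return a.info.received_timestamp < b.info.received_timestamp;
    });
}
```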

If you don't take, copy or otherwise take out some "lease" on the message from the RMW (through some yet undefined mechanism), then you run into issues like "this subscriber had the first message, but since then its history overflowed so now we're not reading that message". Or "this topic has a very stringent lifespan, so it is never the oldest message since any messages that old have been purged"


Architectural musings:

A waitset is a condition variable. rmw_wait has already forced the middleware away from this abstraction by not only waiting on a waitset, but by returning a collection (by entity type) of collections (by entity) of condition variables. What you want is a message queue. Instead of feeding the monster that is the current design, it would do well to change that design to fit the problem at hand: responding to events in some order. That message queue can live in the middleware (I like this idea) or in rcl and/or rclcpp.

dirk-thomas commented 4 years ago

Instead of feeding the monster that is the current design

@rotu please be careful how you communicate and try to contribute in a more constructive way.

rotu commented 4 years ago

Instead of feeding the monster that is the current design

@rotu please be careful how you communicate and try to contribute in a more constructive way.

I'm sorry if that was insensitive, @dirk-thomas. Here's what I should have said instead: The current design of rmw_wait has grown by accumulation instead of design and houses lots of technical debt which should not be piled upon.

First it took just subscriptions and guard conditions. That's not bad - it's a fairly minimal function. https://github.com/ros2/rmw/commit/e4f0ead775bad7017a20610b3f5114014451f21a

Then it acquired services and clients. This is where we should have started thinking about using some abstraction for an "awaitable thing", rather than the increasingly uncomfortable signature which now had 4 collections in it https://github.com/ros2/rmw/commit/59ee0a05bf29a38be27b8cf314d31715af0d8fd3

Here's where we introduced waitsets, but not in a way that uses them as a meaningful abstraction, since subscriptions, guard conditions, etc. are still passed alongside the waitset instead of being members of it. https://github.com/ros2/rmw/commit/9afe2a5d55800ea9c4ccc01c40bbaefecfb0e849

This was where things got bad and it became obvious that we're not using wait sets as wait sets anymore. Requiring the middleware to not just respect the arrays passed in but also populate them as output parameters makes rmw_wait no longer compatible with the DDS WaitSet abstraction. https://github.com/ros2/rmw/commit/5d8b54cdf08e5ce16b1d5965730bedf35e5751bf

The intent of rmw_wait was to "wait until an actionable event happens". Instead it has become "return all sources of actionable events across many types of entity objects", and maybe soon "return all sources of actionable events together with a timestamp which the middleware needs to keep track of for things that may previously have not had one". We have thrust too much responsibility on this poor function and it's time to rethink it, not add yet more technical debt.

eboasson commented 4 years ago

Hi @iluetkeb and @rotu,

I can't stay silent, though perhaps I should ... I do see the argument for processing the "events" in a sensible order, but like @rotu, the proposed approach strikes me as going in the wrong direction. While I mostly agree with his objections to the proposed approach, "mostly agreeing" doesn't mean I really share the perspective in the details, so let me try to give mine.

The current rmw_wait operation is indeed an abuse of DDS, and any proposal that builds upon it furthers that abuse. DDS waitsets are designed to efficiently wait for any of a rarely changing set of conditions. And so all DDS implementations (excluding Fast-RTPS, as it isn't so much a DDS implementation as an RTPS implementation) optimise the "wait" operation at the expense of the "attach" and "detach" operations. The design of the rmw_wait interface is just about the worst imaginable for this. Requiring it to return timestamps alongside the triggering entities is a continuation of this abuse.

The current abuse adds a lot of overhead to the operation, but at least it doesn't require any fundamental changes to how DDS operates. Returning timestamps, however, is a change that goes deeper, because the DDS interface doesn't have those available at that point (and implementations don't keep them). Providing that information in the result of rmw_wait therefore requires a DDS-based RMW layer to jump through hoops to make it work, or it requires forcing a change in the DDS interface.

The only source of those timestamps is, as mentioned, the information in the "sample info" returned by read/take. The existing DDS interfaces are not geared towards retrieving the sample info without also getting the data. And so, to stay within the existing interface, you will need to either do an extra "read" of the data (which touches flags, but those don't matter for how ROS2 operates today) or "take" it and cache it.

Today, where the ROS2 communication model is to just have a FIFO for each subscription, the data will indeed come out in order of reception. If you try to look into the future, key fields may well find their way into ROS2, as may the "by source timestamp" option. Then, the order in which data comes out of subsequent read/take operations on a reader becomes much less predictable. Indeed, two subsequent "read" operations without any new samples arriving in between need not even return the same one sample — and don't in some implementations, and for good reasons, too. (The tricks the "lifespan" QoS might play on you are also interesting but probably not that big an issue in practice.)

Another problem is that all you can count on today in DDS is the source timestamp. One might wonder why a reception timestamp is not generally available, and I think there are two reasons:

  1. Often the time at which an event occurred is more relevant than the time at which the event is received. It requires synchronised clocks, but PTP generally does that to a sufficient degree. If timing is really critical, that's a reasonable condition.
  2. You have to pick a meaning of "reception timestamp": the time of reception of the packet by the middleware (or perhaps the kernel?) that carried the first fragment, time of processing of that fragment, or rather the final fragment, or perhaps when it is received in full, or made available for insertion in the reader history, or insertion in the reader history itself. All these are perfectly valid but different notions of reception time. Clearly the source timestamp is a much better defined notion.

It seems to me that what you're really looking for is not the extension of the waitset interface with time stamps and the addition of a reception timestamp in the sample info. Rather, because you want the reception timestamp, it seems to me you would like to simply process data in the order of arrival (or something closely related to that). That suggests you'd be much better served with a (priority) queue that you can drain.

There is an interface for getting the samples out in order in DDS (with some limitations, but those are surmountable in the context of the current ROS2 feature set), and that is by setting the presentation QoS to "ordered access" with "group" access scope. It is an optional feature and not all implementations support it, but it is perfectly reasonable to rely on it for certain features and expect DDS implementors that care about ROS2 to support it. This will get you the reception order, but not the reception timestamps.

Equally, there is another fully supported way, if you're ok with using the time of invocation of the DATA_AVAILABLE listener as the "reception timestamp": use that listener to simply enqueue the reader id and the timestamp for each arriving sample. That builds on spec'd features of DDS that are universally available and give you both order and timestamp.
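A rough sketch of that listener-based approach (again the ISO C++ DDS API, with the caveat that DATA_AVAILABLE can coalesce bursts, so one notification may stand for more than one sample):

```cpp
#include <chrono>
#include <deque>
#include <mutex>

#include <dds/dds.hpp>  // ISO C++ DDS PSM; header layout varies by vendor

// One entry per DATA_AVAILABLE notification: which reader fired, and when.
struct Arrival
{
  void * reader_id;                             // opaque identity of the reader
  std::chrono::steady_clock::time_point stamp;  // listener invocation time
};

// Hedged sketch: a listener that records arrival order and time into a shared
// queue, which the executor drains instead of re-scanning a wait set. Since
// DATA_AVAILABLE may coalesce bursts, one entry can stand for several samples;
// a take loop on dispatch handles that.
template <typename Msg>
class ArrivalListener : public dds::sub::NoOpDataReaderListener<Msg>
{
public:
  ArrivalListener(std::deque<Arrival> & queue, std::mutex & mutex)
  : queue_(queue), mutex_(mutex) {}

  void on_data_available(dds::sub::DataReader<Msg> & reader) override
  {
    std::lock_guard<std::mutex> lock(mutex_);
    queue_.push_back(Arrival{&reader, std::chrono::steady_clock::now()});
  }

private:
  std::deque<Arrival> & queue_;
  std::mutex & mutex_;
};
```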

This can all be done in the context of the proposed interface, but I contend it would make more sense to overhaul the ROS2 interaction model and get a better fit with both DDS and the application need. Increasing the distance between the interaction models, as this proposal does (in my view driven to a large extent by choices made in years past), brings with it an increasing cost at run-time and in development. That seems contrary to the long-term goals.

What then for the RMW layer? Fully implementing the current interface for a non-DDS middleware is already a tricky task (e.g., which other middleware has these crazy request-offered QoS semantics, or the exact same behaviour of deadlines and lifespan QoS?). Making it trickier for DDS implementations, too, weakens ROS2 by not having a really good fit with any middleware. In the long run, it would probably make more sense to assume a subset of DDS and build RCL directly on top of that, using the standard C++11 API that most DDS implementations support.

Where extensions to the API or to the interaction model are required, I'm sure that a convincing argument will cause the options to be studied carefully by the OMG DDS working group. I know the people that would be doing that studying and if anything, they take that work seriously. But I do imagine there might be some pushback, given that DDS has been doing just fine in control systems for a long, long time. There, typically, rather than looking at timestamps to decide in what order to do things, priorities are assigned to data and processing, and threads/processes are then used to ensure the highest priority always goes first. And I don't (yet) understand why that model doesn't work for robotics.

iluetkeb commented 4 years ago

Thanks to everybody for contributing. I do share some of @rotu's and others' pain about the existing interface, and I am also not fully comfortable with my current extensions. The comments are somewhat unfortunately timed, but better late than never, and I do hope we can use some of the extended time due to Corona to finish this in a better way. You should know that for mapping to the middleware, I only investigated Fast-RTPS, and there, what we wanted wasn't very hard to do. The rmw_fastrtps patch is very small.

btw, for all of those coming in late, please read through the beginning of this issue a little. Timestamps are just a means to an end, the end is deterministic execution.

To realize that, priorities etc. as mentioned by @eboasson certainly play a role, but they are already available on the application side, so we don't need to add them here. Event occurrence order also matters, and currently isn't available, so that's what we want to add.

It has been discussed whether we could leave ordering to the middleware, and people were not comfortable with that, for good reason I think. Thus, we ended up trying to communicate timestamps upwards in some way.

I'm also very willing to join a discussion on the other aspects, because I share the view that the current wait_set is difficult to work with efficiently. However, I'm afraid we cannot address those issues in the time left, so I propose postponing them. I can tell you that performance ranks quite highly for me and my employer, so I'm pretty sure I'll get the time to work on that.

Regarding the whole "abuse" or "monster" discussion: That there is a difference in how wait_sets are defined in ROS 2 vs. DDS is, in itself, not a problem. What I and others are looking for is an interface that tells us the state of the communications layer. Currently, that interface is called rmw_wait. If (!) the interface is consistent and works well, but is badly named, let's change the name, not the interface.

So, information that the ROS 2 interface is hard to implement is very important. Information that it is different from DDS is informative, but in itself not decisive. Differences between ROS 2 and DDS are not always, but often on purpose.

Therefore, what does matter to me is whether the interface is a good one. And here, @rotu raises a very important issue: Starvation.

If you don't take, copy or otherwise take out some "lease" on the message from the RMW (through some yet undefined mechanism), then you run into issues like "this subscriber had the first message, but since then its history overflowed so now we're not reading that message". [...]

So far, we had been assuming a processing model inspired by the way ROS 1 does it:

ROS 1 put everything into an arrival-sorted queue, but it did not put the messages themselves in the queue; it put the subscriptions into the queue. So, if you had messages for topics A, B, and C, the queue would be ABC. Within "A", "B", and "C" there would be another queue, of the history length for that topic. When a topic came up for execution, the first message from its queue would be taken and acted upon. This means that in the situation where B has queue size 1, and B receives a new message while A is being processed, this new message will be processed next, despite having arrived after the message that is in the queue for C.
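To make that behaviour concrete, here is a small hedged sketch of the queue-of-subscriptions model (hypothetical types for illustration only; this is not actual ROS 1 code):

```cpp
#include <cstddef>
#include <deque>
#include <utility>

// Hypothetical types for illustration only.
struct Message { /* payload */ };

struct Subscription
{
  std::deque<Message> history;  // bounded by this topic's queue size
  size_t queue_size;

  void push(Message m)
  {
    if (history.size() == queue_size) {
      history.pop_front();      // overflow: silently drop the oldest message
    }
    history.push_back(std::move(m));
  }
};

// The callback queue holds *subscriptions* in arrival order, not messages.
std::deque<Subscription *> callback_queue;

void spin_once()
{
  if (callback_queue.empty()) {
    return;
  }
  Subscription * sub = callback_queue.front();
  callback_queue.pop_front();
  if (!sub->history.empty()) {
    Message m = std::move(sub->history.front());
    sub->history.pop_front();
    // dispatch(m): if a newer message replaced the one that originally
    // enqueued `sub` (queue size 1), the newer message is processed here,
    // even though it arrived after messages still queued for other topics.
  }
}
```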

btw, I think very few people were aware of this. At least, whenever I explained it, people were surprised. This doesn't mean it's bad, but it's certainly counter-intuitive.

However, the good aspect of it is that it both avoided starvation and maintained the guarantee that, for queue size 1, the most recent data would be used. For this, it sacrificed strict time-order.

If we interpret the wait_set, as currently proposed by https://github.com/ros2/rcl/pull/591 as a queue of subscriptions, and process it completely (instead of calling rcl_wait again after taking the first item), we get the same behavior as on ROS 1. This assumption has been the basis of the current design.

Now, if I understand @rotu correctly, he would like to solve the starvation problem in a different way, by essentially "snapshotting" the current state, and then executing the snapshot. In that approach, the older message would be used, despite newer data being available, and despite the fact that the user specified a small queue size.

On the positive side, that approach would be a strict time-ordered FIFO implementation, and thus would not surprise people expecting that. In exchange, it would give up the guarantee that queue size 1 always leads to the newest message being used.

From this it should be clear that we will have to disappoint somebody. We cannot fulfil both expectations at the same time and still avoid starvation. The question is, which expectation is more important?

Having thought about this for a while, I tend to agree with @rotu on this one. Strict time-ordering should be chosen because it is actually achievable. In contrast, we can never fully guarantee that the latest data is going to be used, at least not when working reactively, because it could still be in transit.

When doing such "snapshotting", you would take all the messages, and while doing so, you would receive all their SampleInfos, containing timestamps, which you could use for ordering. We can think about optimizing this approach further by taking multiple messages at once, but in principle, for this part, you wouldn't need my changes at all.
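As a hedged sketch of what that snapshot loop could look like (the types and the `take_all`/`exec` callables are hypothetical stand-ins for the actual take and dispatch plumbing):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical glue type: a taken message paired with the timestamp pulled
// from its SampleInfo while taking.
struct TakenMessage
{
  int64_t timestamp_ns;
  // ... message payload and the callback it belongs to ...
};

// Drain every ready subscription completely, then execute the union of all
// taken messages in time order across topics.
template <typename SubscriptionRange, typename TakeAllFn, typename ExecFn>
void execute_snapshot(SubscriptionRange & ready_subs, TakeAllFn take_all, ExecFn exec)
{
  std::vector<TakenMessage> snapshot;
  for (auto & sub : ready_subs) {
    take_all(sub, snapshot);  // append every waiting message with its timestamp
  }
  std::stable_sort(
    snapshot.begin(), snapshot.end(),
    [](const TakenMessage & a, const TakenMessage & b) {
      return a.timestamp_ns < b.timestamp_ns;
    });
  for (const TakenMessage & msg : snapshot) {
    exec(msg);  // strict time order; the messages were already taken, so a
                // later history overflow cannot change what runs
  }
}
```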

Agree so far?

Well, unfortunately, there is a second problem: Some topics may have multiple messages waiting, and they may be in between (in time-order) the messages from the other topics.

This is also a problem with having just a single timestamp and processing the wait_set completely. The only solution, I think, is to take all the waiting messages. This is a challenge to implement efficiently, but if we only pass references, it should be doable. Something like @rotu's "take_many" suggestion above.

Let me ask you guys here: Would you be willing to contribute to an implementation next week? Because now would be the chance. If the OSRF gives us the time, it should be doable.

rotu commented 4 years ago

For just subscriptions, yes, I'd be willing and able to contribute an implementation next week if you provide the rmw function signature to implement. You may want to coordinate with https://github.com/ros2/rmw/pull/212 / https://github.com/ros2/rmw_cyclonedds/pull/148, which looks to solve a similar problem (though you would need a collection of subscriptions and the messages would need to be associated with the subscriptions so callbacks could be appropriately dispatched). If it has to sequence other event types like service requests, lifespan events, and others, I don't think I can do it in a week.

wjwwood commented 4 years ago

Let me start by summarizing what I think we should do and what things that were suggested that I think we should not do right now, as the tl;dr version, and then I'll try to comment on points through the discussion where I agree or maybe think it's not quite right.

First what I think we should do (for foxy):

I think the above gives users the tools needed to use the timestamp information to meet their needs. They may either use the timestamps provided by the wait set, or instead query the information about each entity they are curious about separately, e.g. using the message info in take.

So, then the things I think we may want to do, but probably don't have time for in foxy (please don't let that stop you from trying, I'm just trying to be realistic):

It's a bit late so I'll save my additional commentary for later. Please let me know if you would like to do something other than what I have outlined above. I hope that making the additional information optional (in time and memory use), and the fact that we're trying to reduce or eliminate reallocation of the wait set by default, will make these already in-flight changes more acceptable for you @rotu, even if it doesn't produce an API that you're fond of.

rotu commented 4 years ago

I suggest retargeting this redesign to Foxy+1. This design clearly doesn't make the principled consistency guarantees it wants to (reacting to incoming messages in sequential order). It's adding to an already overfull Foxy docket, and I'd rather do it right than have to clean up a wrong, rushed solution.

wjwwood commented 4 years ago

Ok, just to reiterate to check my understanding, the issue is that after getting the timestamps and selecting a subscription to take from, the state of the histories could change (disposing the oldest message we were about to take) and therefore we might take a newer message and therefore potentially take out of order, right?

With that in mind, is there any use case where having the timestamps is useful? My point above (which I did not elaborate on at the time) was that if the usage pattern was "wait on wait set, take oldest timestamp from all ready things, then decide what to take", then returning the timestamps with the wait set result could save a lot of shared library calls and perhaps could be optimized. You see similar things with "select variants" like epoll or kqueue, where they aggregate metadata for you and return it with the result of selecting.

Anyways, I don't mind whether we take that part or not, but what about the source and "received" timestamps in the message info? That would be useful even to implement your "snapshot" or "batched" approach, right?

@iluetkeb what do you want to do?

rotu commented 4 years ago

@wjwwood That would allow an even simpler API than rmw_wait: the function would just have to return a single entity instead of all the ready entities (though it still might be the case that the entity has no work ready, depending on the QoS).

wjwwood commented 4 years ago

That would allow an even simpler API than rmw_wait

Sorry, what is "that" in this context? I don't know which thing you are referring to.

rotu commented 4 years ago

Sorry: "that" was implementing "wait on wait set, take oldest timestamp from all ready things, then decide what to take"; if we're only interested in the oldest timestamp of all ready things, then the API should pass the wait set and ask for just the next thing (which ideally would be the next sample, not the subscriber itself).

wjwwood commented 4 years ago

Ok, I understand now. I was thinking each of those steps were happening in the user code or in rclcpp. Putting it into the rmw function would be selecting one scheduling strategy and implementing it there. Whereas if you give the tools to implement that and then do it in rclcpp, it allows others to come up with similar but slightly different scheduling algorithms for their needs. So I think that's preferable to having an rmw function that returns "next", where next is based on some implicit schedule choices, but it might be necessary to do that for the sake of performance or to guarantee atomic behavior between deciding and taking.

iluetkeb commented 4 years ago

Ok, just to reiterate to check my understanding, the issue is that after getting the timestamps and selecting a subscription to take from, the state of the histories could change (disposing the oldest message we were about to take) and therefore we might take a newer message and therefore potentially take out of order, right?

Exactly.

Anyways, I don't mind whether we take that part or not, but what about the source and "received" timestamps in the message info? That would be useful even to implement your "snapshot" or "batched" approach, right?

Absolutely. Those timestamps would be required to make the take-based approach work, and I would suggest adding both source and received timestamps (where available; set to 0 otherwise).
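For reference, a hedged sketch of the fields this adds (the names here are illustrative; see rmw/types.h for the authoritative definition once merged):

```cpp
#include <cstdint>

// Hedged sketch, not the actual rmw header. A value of 0 means "this
// middleware could not provide the timestamp".
typedef int64_t time_point_value_t;  // nanoseconds since the epoch

struct message_info_sketch_t
{
  time_point_value_t source_timestamp;    // stamped by the publisher
  time_point_value_t received_timestamp;  // stamped on the subscription side
  // ... plus the existing fields (publisher GID, intra-process flag, ...)
};
```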

We also need to take all messages, as explained above. Since we currently do not have an unread count available, one option would be to call rcl_take until it returns an error, assuming it is non-blocking. Another option would be to add an unread count to the wait_set. I don't have an opinion here -- what do the others think?
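A hedged sketch of the drain-until-error option, assuming rcl_take is non-blocking and reports an empty queue via RCL_RET_SUBSCRIPTION_TAKE_FAILED (message allocation and typesupport plumbing elided):

```cpp
#include <vector>

#include <rcl/subscription.h>  // assuming the rcl C API is available
#include <rmw/types.h>

// Drain one subscription by calling rcl_take until nothing is taken,
// collecting the per-message info (and hence timestamps) along the way.
size_t drain_subscription(
  const rcl_subscription_t * sub,
  void * msg_storage,  // pre-allocated ROS message, reused per iteration
  std::vector<rmw_message_info_t> & infos)
{
  size_t taken = 0;
  while (true) {
    rmw_message_info_t info;
    rcl_ret_t ret = rcl_take(sub, msg_storage, &info, nullptr);
    if (ret == RCL_RET_SUBSCRIPTION_TAKE_FAILED) {
      break;  // queue empty, nothing more to take
    }
    if (ret != RCL_RET_OK) {
      break;  // a real error; report it in real code
    }
    infos.push_back(info);  // keep the timestamps for later ordering
    ++taken;
    // in real code, hand off or copy msg_storage before reusing it
  }
  return taken;
}
```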

btw, a note on efficiency: If we want to get this information reliably, DDS offers read (which is non-destructive, i.e. keeps the message in the queue) and take (which is destructive, i.e. removes the message from the queue). Of these, only take is currently (AFAICT) exposed through rmw and rcl. So, an executor currently would have to take all available messages to come to a scheduling decision, but that's okay, I would say.

However, in the current implementations, even with take_serialized, this copies all message data. take_loaned would avoid this, but is not currently implemented in any public implementation other than rmw_iceoryx.

For something like the SingleThreadedExecutor, which is going to execute everything anyway, the main effect would be to delay executing the first message until all messages have been copied.

Ideally, though, take_loaned would be required to make this really efficient. I cannot currently judge how much effort it would be for DDS implementations to support that mechanism. Could @JaimeMartin @miguelcompany @rotu @eboasson please comment? This effort should also be weighed in our design decisions.

And, just for reference, I do not think https://github.com/ros2/rmw/pull/212 is going to matter much for this, because it deserializes messages (at least that's the way I read its API description), which we do not want at this stage.

Whereas if you give the tools to implement that and then do it in rclcpp, it allows others to come up with similar but slightly different scheduling algorithms for their needs

Agreed, that's a major motivation for us as well.

wjwwood commented 4 years ago

Absolutely. Those timestamps would be required to make the take-based approach work, and I would suggest adding both source and received timestamps (where available; set to 0 otherwise).

Ok, unless someone objects I'll aim to get that part of it landed instead.

We also need to take all messages, as explained above. Since we currently do not have an unread count available, one option would be to call rcl_take until it returns an error, assuming it is non-blocking.

We're adding a take sequence method, so that could be used:

https://github.com/ros2/rmw/pull/212
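For context, a hedged sketch of how such a take-sequence call could be used to drain a subscription in one library call (the signature and sequence fields here are reproduced from memory of ros2/rmw#212 and should be checked against rmw/rmw.h; treat them as assumptions):

```cpp
#include <rmw/rmw.h>  // assumed to provide rmw_take_sequence and the
                      // message/info sequence types from ros2/rmw#212

// Take up to the sequence's capacity in one call, getting a matching
// message-info entry (with timestamps) per taken message.
void take_batch(
  const rmw_subscription_t * sub,
  rmw_message_sequence_t * msgs,        // pre-initialized with capacity N
  rmw_message_info_sequence_t * infos)  // pre-initialized with capacity N
{
  size_t taken = 0;
  rmw_ret_t ret = rmw_take_sequence(sub, msgs->capacity, msgs, infos, &taken, nullptr);
  if (ret != RMW_RET_OK) {
    return;  // report the error in real code
  }
  for (size_t i = 0; i < taken; ++i) {
    // infos->data[i] carries the per-message timestamps, which can be used
    // to merge this batch into a global time order across subscriptions.
  }
}
```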

Another option would be to add an unread count to the wait_set.

I don't think DDS provides that feature, though you could always read() and count the results, but that would be very expensive. We might be able to extract this information anyway from Fast-RTPS and Cyclone, but IIRC from the last time we looked at doing it, there's no way to do that efficiently with Connext.

Of these, only take is currently (AFAICT) exposed through rmw and rcl. So, an executor currently would have to take all available messages to come to a scheduling decision, but that's okay, I would say.

We had planned to offer a read via rmw as well, but we prioritized take sequence instead.

wjwwood commented 4 years ago

And, just for reference, I do not think ros2/rmw#212 is going to matter much for this, because it deserializes messages (at least that's the way I read its API description), which we do not want at this stage.

It will be a copy because it needs to return the ROS type, but if the middleware can store only the serialized data and the ROS type (not the DDS type) as a sequence, then it could be loan-like behavior.

Either way, it wouldn't be any more or less efficient than what take does now, except it only needs to be called once.

iluetkeb commented 4 years ago

Another option would be to add an unread count to the wait_set.

I don't think DDS provides that feature, though you could always read() and count those, but that would be very expensive.

Fast-RTPS has that already in the current listeners. But it would really suffer from the same problem as the timestamps, because things like deadline or lifespan could evict messages from the queue before being read.

Apropos deadline: at some point, it may be necessary to add those as well to the message-info, but let's get some experience with the current timestamps first before we take that on.

iluetkeb commented 4 years ago

https://github.com/ros2/rmw/pull/214, https://github.com/ros2/rcl/pull/619 and https://github.com/ros2/rmw_fastrtps/pull/368 together are an implementation of the new design.

Tests are currently missing (but I ran the existing tests, nothing breaks), and rclpy is missing as well. I'll get to those after dinner.

iluetkeb commented 4 years ago

OK, https://github.com/ros2/rclpy/pull/542 has the first attempt at Python support.

iluetkeb commented 4 years ago

@rotu are you going to be looking at rmw_cyclonedds for this, or do you know if somebody else is doing that?

rotu commented 4 years ago

I’m on it

iluetkeb commented 4 years ago

@rotu @wjwwood I just pushed some more changes related to clients and services. For that, I also added timestamps to rmw_request_id_t.

While doing so, I noticed that the timestamp which is currently copied over from the sample-info in rmw_fastrtps isn't quite what I expected. That's why the message_info test in rcl is currently failing. I am not currently sure why that is -- maybe the sample-info has a different notion of what the timestamp should contain, or maybe I'm doing the copying wrong. I'll work on that tomorrow.

@rotu Please be aware that this is currently enabled for fast-rtps only. If you want to check it on cyclone-dds, please modify the condition in test/CMakeLists.txt for test_message_info.

iluetkeb commented 4 years ago

Okay, here we go for services:

This is the service part of https://github.com/ros2/design/issues/259

Update: Added rmw_cyclonedds.
Update 2: Added rmw_connext.
Update 3: Added rclpy.

And here is a repos file to get all of those at once.

iluetkeb commented 4 years ago

And the message-info changes are in

Edit: I've now provided my own PR for rmw_cyclonedds which works with the new timestamp storage.
Edit 2: And here's the repos file: https://gist.github.com/iluetkeb/b3c701725a1cc6cb845a6cbf9fefaae1

iluetkeb commented 4 years ago

Well. The message-info changes should be ready.

The service info changes are mostly complete, but they lack a dedicated test. Such a test could easily be added to test_service and test_client in rcl, in the same way that I added the timestamp test to test_subscription. However, it's 1am here and I think if I continue, I will make more of a mess than help. @wjwwood if you want to pick this up, it would be very much appreciated!

wjwwood commented 4 years ago

I'll do my best.

iluetkeb commented 4 years ago

Alright! First, @wjwwood, thanks for merging the message info changes! Happy we got that in!

@jacobperron the service timestamp changes are ready from my side. Hope CI is okay.

Since the PRs come from an organizational fork, maintainers cannot directly edit them, but if there is a need for further modifications, @Karsten1987 has write permissions on all the repos.