ros2 / design

Design documentation for ROS 2.0 effort
http://design.ros2.org/
Apache License 2.0
218 stars 193 forks source link

Adds design document for Deadline, Liveliness, and Lifespan. #212

Closed nburek closed 4 years ago

nburek commented 5 years ago

This commit adds a new design document that explores how support should be added to ROS for Deadline, Liveliness, and Lifespan QoS policies.

Design doc for task https://github.com/ros2/rclcpp/issues/572

dirk-thomas commented 5 years ago

Please see the developer guide regarding the style of markdown files.

dirk-thomas commented 5 years ago

I think there should be a section considering how this proposal affects RMW implementations which are not DDS based.

dirk-thomas commented 5 years ago

@ros2/rmw_implementations FYI

nburek commented 5 years ago

We had an offline review about this as well with a few people. The following are the notes from that meeting. Responses and updates to the open questions and comments above coming in the next day.

- Main question is how does this impact rmw implementations that don’t support these QoS natively (non-DDS).

- We should not mandate behavior of an rmw when it doesn't support a policy, but we should have recommendations on behavior and a means for them to make this information available to the application.  

- We need to explore an explicit api for checking what capabilities the rmw layer provides as well as implicit behavior and make a decision if we just want to do one or the other or both.  

- Would be useful to review the flushed out API changes a little down the line when they’re ready.

- If we expect Lifespan to be used in a lot of cases then we do want to add native support for it in ROS for the cases when the RMW implementation is not DDS.  

- Not providing full stack support for the deadline feels odd. If we aren’t getting the full impact of the system then we need to be clear about what the deadline is actually a measurement of.

- Look at using multiple QOS structs for services or a unique struct for services. Compare and make a recommendation. 

- Actions and how they handle these new QoS settings can be separated out into its own design doc. 

- Similarly, if we decide an explicit API for checking rmw capabilities would be useful then we can separate that out into its own design review. 
gbiggs commented 5 years ago

I don't agree with treating actions separately for QoS. While it may be like that now, I think the design should be unified. Either that or treat services in the same way.

Having an API for checking RMW capabilities would be useful, and I agree that it's a separate discussion.

spiderkeys commented 5 years ago

A few minor notes:

All three of the policies mentioned apply on a per-instance basis in DDS. In the not too distant future, ROS2 will support IDL 4.2, which will allow users to annotate fields as keys. Even though there are not currently any plans (that I know of) to support a concept of keyed fields in RMW, this may still result in the underlying DDS implementation using the keys to create instances. This could lead to some unintended consequences if the implementation in RMW does not take instances into account.

One example is that in Connext, there is no guaranteed way to be notified of all instances within a datareader's (or writer's) cache that have violated a deadline:

I've spoken with some folks at RTI, and they agree that there should probably be a way to query or be alerted of all outstanding violations. I think it amounted to this aspect of the deadline API not being explicitly specified in the standard. I think you may also end up running into this issue with Liveliness unless you use a ReadCondition to handle the detection, as LivelinessChangedStatus similarly only contains a handle to the last instance handle that triggered it.

All of this to say that there should probably be some footnotes somewhere in the docs if these features are implemented with only single instances in mind! I often wonder if for features like this it might just be better to have a DDS-specific interface if they may not be generally applicable across the entire ROS2 ecosystem. As someone who is working on multiple projects which use native DDS in production and have some hopes of moving completely to ROS2, I would find this very useful and be willing to help out. I think this discussion around RMW "capabilities" could be a good starting point.

nburek commented 5 years ago

I don't agree with treating actions separately for QoS. While it may be like that now, I think the design should be unified. Either that or treat services in the same way.

I believe that there was a design discussion that occurred around this topic during the design of Actions. I'm basing my design of not handling Actions explicitly like we do Topics and Services based on the interface that Actions currently implement that explicitly exposes the QoS policy for the underlying topics and also the lack of direct support for Actions in the rmw layer.

I've spoken with some folks at RTI, and they agree that there should probably be a way to query or be alerted of all outstanding violations. I think it amounted to this aspect of the deadline API not being explicitly specified in the standard. I think you may also end up running into this issue with Liveliness unless you use a ReadCondition to handle the detection, as LivelinessChangedStatus similarly only contains a handle to the last instance handle that triggered it.

I've updated the interface slightly to allow for multiple violations instead of only the latest, but noted that it is dependent on the RMW layers implementation.

Even though there are not currently any plans (that I know of) to support a concept of keyed fields in RMW, this may still result in the underlying DDS implementation using the keys to create instances.

This seems like it would be a bigger problem outside even this design. Even if the RMW layer doesn't expose keyed fields it shouldn't break anything if the underlying DDS implementation needs to start using them. That seems like something that will need to be solved in the migration to IDL 4.2

I often wonder if for features like this it might just be better to have a DDS-specific interface... I can see the usefulness of having that, but I'm not sure if it's something that should be encoruaged. Adding a DDS specific interface would tie any Node using it to a DDS implementation of the RMW. That reduces the portability of ROS nodes that end up relying on it. That was part of the reason why these QoS settings are being explicitly defined in ROS terms instead of just entirely leaning on passing them through to the rmw dds implementations.

nburek commented 5 years ago

Are there any other outstanding concerns with this design? Do I need to schedule another virtual meeting to go over it with interested parties or are we good to merge it in?

gbiggs commented 5 years ago

I didn't know there was a first virtual meeting. I'd certainly like to have another to talk over any latent concerns and clarify points from the comments if necessary.

nburek commented 5 years ago

Let's plan to have another virtual meeting to discuss the remaining concerns. Please let me know if you can't attend and I can find another time. I'll post on discourse as well, but I want to make sure everyone already involved in this review can attend if they want.

Currently I'm thinking Wednesday March 6th at 9:00am PST (UTC-08:00)

You have been invited to an online meeting, powered by Amazon Chime.

  1. Click to join the meeting:

https://chime.aws/8643562590

Meeting ID: 8643 56 2590

  1. You can use your computer’s microphone and speakers, however, a headset is recommended. Or, call in using your phone:

United States Toll-Free: +1 855-552-4463 Meeting PIN: 8643 56 2590

One-click Mobile Dial-in (United States (1)): +1 206-462-5569,,8643562590#

United States (1): +1 206-462-5569 Spain Toll-Free (1): +34 900 813 473 United Kingdom Toll-Free (1): +44 800 085 5175 International: https://chime.aws/dialinnumbers/

gbiggs commented 5 years ago

That time doesn't work for me, unfortunately. Due to family commitments, realistically the latest I can start is midnight and the earliest I can start is 7AM (Tokyo time). If you can find a time outside those, I'll be grateful. If not, then please record the meeting so I can listen to it later.

dejanpan commented 5 years ago

That time doesn't work for me, unfortunately. Due to family commitments, realistically the latest I can start is midnight and the earliest I can start is 7AM (Tokyo time). If you can find a time outside those, I'll be grateful. If not, then please record the meeting so I can listen to it later.

@nburek since all participants on this thread are either from PST or JST time zone, a time between 3PM-9PM JST would work well: https://savvytime.com/converter/jst-to-pst.

nburek commented 5 years ago

@gbiggs @dejanpan @wjwwood

Since it sounds like most interest is from PST and JST time zones then how about the following time instead: Wednesday, March 6th @ 3:00pm PST(UTC-08:00) which is Thursday, March 7th @ 8:00am JST(UTC+09:00)

You have been invited to an online meeting, powered by Amazon Chime.

Click to join the meeting: https://chime.aws/8643562590

Meeting ID: 8643 56 2590

You can use your computer’s microphone and speakers, however, a headset is recommended. Or, call in using your phone: United States Toll-Free: +1 855-552-4463 Meeting PIN: 8643 56 2590

One-click Mobile Dial-in (United States (1)): +1 206-462-5569,,8643562590#

United States (1): +1 206-462-5569 Spain Toll-Free (1): +34 900 813 473 United Kingdom Toll-Free (1): +44 800 085 5175 Japan Toll-Free (1): +81 800-919-0416 Japan: +81 3-4540-5951 International: https://chime.aws/dialinnumbers/

dejanpan commented 5 years ago

@nburek meeting March 6th @ 3:00pm works for @deeplearningrobotics and myself (both from Apex).

gbiggs commented 5 years ago

Works for me.

JaimeMartin commented 5 years ago

@nburek , we will also attend.

rutvih commented 5 years ago

Capturing next steps based on the design review meeting we had over chime.

Our team is working on updating the design document and also preparing PRs (currently linked and in draft mode) for interface and API changes.

lbegani commented 5 years ago

I was wondering if there is a reason to not have more generic design in order to support any type of Qos Policy -

  1. App queries the rmw layer for set of supported Qos.
  2. RMW returns the list of supported Qos (List may represent the content of dds-xml-qos-definitions file)
  3. App parses the list and applies QoS setting appropriate to the use-case.
gbiggs commented 5 years ago

I think that would be a recipe for an explosion in complexity. It would become impossible to test all the possible combinations that may exist. How does an application know what a particular QoS does? What if different rmw implementations use different names for the same concept, or even worse, the same name for subtly different concepts? We need to have exactness and clarity in what each supported QoS means when the application uses it. This goes against allowing "any" QoS setting to be used.

emersonknapp commented 5 years ago

Do you mean "different names for the same concept" in the context of having non-DDS rmw implementations? I would expect we could assume that all DDS implementations will have the same meaning from the DDS spec behind their QoS types

lbegani commented 5 years ago

I think that would be a recipe for an explosion in complexity. It would become impossible to test all the possible combinations that may exist. How does an application know what a particular QoS does? What if different rmw implementations use different names for the same concept, or even worse, the same name for subtly different concepts? We need to have exactness and clarity in what each supported QoS means when the application uses it. This goes against allowing "any" QoS setting to be used.

Agree that this will add complexity. But it will also give app more control over DDS. In the current scheme, a need to support a new QoS will require code changes across the stack. However if we know that ROS2 apps will hardly need any QoS other than those defined in rmw layer, the complexity is not worth the effort.

tfoote commented 5 years ago

Agree that this will add complexity. But it will also give app more control over DDS. In the current scheme, a need to support a new QoS will require code changes across the stack. However if we know that ROS2 apps will hardly need any QoS other than those defined in rmw layer, the complexity is not worth the effort.

Part of the design is to provide the most common QoS settings through the ROS api and if you need full control of the DDS settings you can use go around the ROS abstraction and use the raw DDS API to access and do things with the underlying layer. In particular for QoS settings most DDS implementation support loading the full QoS settings from xml at startup time that gives you much more fine grained control of the DDS settings than the ROS 2 API.

nburek commented 5 years ago

I've made updates to the design document that we discussed at the virtual meeting a couple weeks back. The biggest update was to move Services to future work. I think the last big open question was if we needed to support buffering events of the same type so that the application wouldn't miss an instance of an event. After a little more research we decided not to add a buffer but to coalesce multiple events for the same instance as that appears to be what the DDS api is tailored towards at the moment.

prajakta-gokhale commented 5 years ago

@wjwwood @gbiggs @tfoote

Could you please take a look at this again? Thanks!

nburek commented 5 years ago

I've updated the design doc to match the details of the implementation. It is ready for a final review before being merged.

adam-dunc commented 4 years ago

@tfoote Can we please get this merged? This has been released in Dashing and per @nburek the design has been updated to reflect what was implemented.

wjwwood commented 4 years ago

Sorry for letting it linger, but after looking once more I see a few small things we should tie up before merging.

nburek commented 4 years ago

I've opened an issue to track enhancing the design of liveliness and deadline to support instanced topics with keys. See: https://github.com/ros2/design/issues/252

prajakta-gokhale commented 4 years ago

@wjwwood could you please review the latest changes?

wjwwood commented 4 years ago

Yeah, I'll try to get to them soon. Sorry for the delay.

wjwwood commented 4 years ago

Link to the outstanding thread, since I couldn't find it easily before:

https://github.com/ros2/design/pull/212#discussion_r323900762

nburek commented 4 years ago

lgtm, I think we just have the one point to clear up which I commented on yesterday. Once we decide what to do there I think this will be ready for merge.

@wjwwood Thanks for giving it another look. Sorry for the delay in my response as I've been on vacation for the past few weeks. I've made an update for the particular section in question. Take a look and let me know if you think that addresses the concern.

wjwwood commented 4 years ago

No worries @nburek, thanks for touching it up.