Partitions, topics, and ROS namespacing

ros2 / design

Design documentation for ROS 2.0 effort

http://design.ros2.org/

Apache License 2.0

Partitions, topics, and ROS namespacing #132

Open spiderkeys opened 7 years ago

spiderkeys commented 7 years ago

I've been reading up on the topic and service mapping article, having seen that the partition PR landed back in March: https://github.com/ros2/design/blob/gh-pages/articles/140_topic_and_service_name_mapping.md

I wanted to feel out how "locked in" this design decision is, as we are building a product on top of RTI DDS with the hopes of running full blown ROS 2.0 once it gets out of beta. To that end, we are trying to make sure that we make decisions around partitioning and topic names that are forward thinking and in line with what you are doing. The concepts settled upon seem very sound to me and I can't find any red flags or deal breakers.

Any expectations of there being major changes to this part of the architecture beyond this point?

wjwwood commented 7 years ago

@Karsten1987 may have more to say on this, but we're already implementing this in https://github.com/ros2/ros2/issues/327 and we've iterated on the design once with the community and once after figuring out that partitions might be a more useful way to handle the namespaces for topics.

There was some discussion about it here:

https://discourse.ros.org/t/ros2-and-dds-messaging/1556/5

I wanted to reply to that thread but have not had the time. Unless something comes up there, I don't foresee any serious changes that need to be made.

The only place where we may need to adjust what we're doing is to support publishing and subscribing to non-ROS DDS topics, but I remain unsure as to how commonplace this will be and so I'm not sure how much demand there will be to figure this case out. But if people are interested in it, then that may prompt us to make changes to support that. However, as the design article now states, I think that the way partitions are being used by our conventions right now, when paired with an option to skip the ROS specific prefix, will allow many DDS topics to be subscribed to and published to from ROS. If anything needs to change or adapt due to this it will probably be the way we generate and use IDL files.

Other than that case, I think this part of the system is relatively stable design wise.

Karsten1987 commented 7 years ago

As for the partitions, we are in the final stretch of getting the namespacing-related pull requests reviewed and merged. During development, we didn't encounter any technical challenges using partitions on either of the supported DDS implementations, i.e. Fast-RTPS and RTI Connext, so I am confident that partitions were the right decision for namespaces, and we are sticking to that decision.

I am happy to hear you are looking forward to using ROS2. I hope you have a chance to test out the existing beta or master version of it already. Any critical feedback is helpful.

dirk-thomas commented 7 years ago

@spiderkeys Please be aware of the use case mentioned in the design doc about being able (or actually currently not being able) to communicate with "native" DDS topics (see http://design.ros2.org/articles/topic_and_service_names.html#communicating-with-non-ros-topics). If you plan to switch from RTI DDS to ROS 2 in a single step (all or nothing) that shouldn't be a problem. But if you ever want to mix native DDS participants and ROS 2 nodes there are currently two blockers:

The idl type names generated by ROS use a specific pattern (<pkg_name>::msg::dds_::<msg_name>) which can't be changed. Therefore it is currently not possible to match an existing DDS type (you would need to tweak the idl file to match the naming scheme of ROS).
The referenced design adds ROS specific prefixes to every topic. While the use case of supporting non-prefixed topic names is mentioned that feature is not implemented at the moment (and isn't planned to be implemented in the near future to my knowledge) (you would need to change your DDS topic names to use the ROS specific topic prefixes).
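To make the two blockers concrete, here is a minimal sketch of the naming rules a native DDS application would have to reproduce by hand. The type-name pattern is the one quoted above. Everything about the topic mapping is an assumption for illustration only: the prefix value `rt`, the rule that the namespace goes into the partition, and the helper names are hypothetical and should be checked against the design article.

```python
from typing import Tuple

# Hypothetical prefix value; the design article defines the real one.
ROS_TOPIC_PREFIX = "rt"


def ros_type_to_dds_type(pkg_name: str, msg_name: str) -> str:
    """Build a DDS type name following <pkg_name>::msg::dds_::<msg_name>."""
    return f"{pkg_name}::msg::dds_::{msg_name}"


def ros_topic_to_dds(ros_topic: str, prefix: str = ROS_TOPIC_PREFIX) -> Tuple[str, str]:
    """Split a fully qualified ROS topic into (DDS partition, DDS topic).

    Assumption: the namespace goes into the partition behind the prefix,
    and the base name becomes the DDS topic name.
    """
    if not ros_topic.startswith("/"):
        raise ValueError("expected a fully qualified name starting with '/'")
    namespace, _, base_name = ros_topic.rpartition("/")
    if not base_name:
        raise ValueError("topic name must not end with '/'")
    # namespace is "" for a top-level topic such as "/chatter"
    return prefix + namespace, base_name


if __name__ == "__main__":
    print(ros_type_to_dds_type("sensor_msgs", "Image"))
    print(ros_topic_to_dds("/camera/left/image"))
```

A native DDS participant that wants to match a ROS 2 endpoint would need to use exactly these strings for its type name, partition, and topic, which is why the fixed patterns are a blocker for existing DDS types and topics.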

I have tried to raise the concern about creating such a "ROS island" within the DDS world in the past (when the ROS specific prefixes were introduced). Maybe you can provide feedback from a user point of view on whether e.g. mixing a "native" DDS publisher with a ROS subscriber is a use case you would be interested in?

spiderkeys commented 7 years ago

@wjwwood @Karsten1987 @dirk-thomas

Thanks for the info. Hearing a lot of this makes me wish I had involved myself in the DDS middleware work early on, rather than being passive and waiting for it to get implemented. I believe that integrating with native/plain DDS is going to be important in the continued success/adoption of ROS2 and I'll explain my personal context in this matter.

When I initially saw that ROS2 was going to be using DDS, I was thrilled because, to me, it meant that all of the work I had been doing was going to be able to integrate with ROS2. This was about 2+ years ago when I was working on a distributed robotics framework at NASA Langley for a project called the Autonomy Incubator (now the Langley Autonomy and Robotics Center). We were using RTI Connext for that project as well, and some of the software in our system was built on top of ROS1. ROS1 did a pretty good job of handling singular entities/robots, simulations, etc, but of course did not really scale well or have all of the great reliability/discovery features that DDS brought to the table. We created a manual DDS/ROS bridge library to overcome that issue in our system, which generally worked well but was not automated, thus a pain to use. With ROS2 being built upon DDS, my expectation was that (eventually) we would be able to easily connect future ROS2 applications to our existing system and interfaces.

Fast forwarding to now, I am currently working at a robotics company called OpenROV that is working on an underwater drone called Trident. We are building our software on top of RTI DDS again, as it proved to be the most mature, stable, and high performance vendor implementation. We are doing some pretty cool things with it, especially around low latency video streaming between our vehicles and mobile devices. With ROS2 in the back of my mind as we started developing our software, I kept checking in now and then to see what the state of things was, though I must admit that I haven't tried out the alpha or beta builds enough to have much of an opinion on whether they are production ready (we simply took some of the public verbiage of it not being ready yet at face value). Since we are targeting consumer mobile platforms (Android, iOS, etc), having framework support for those platforms was crucial. RTI provided everything we needed here, including Java bindings for Android, so that was good and we were able to immediately get MVP implementations up and running within a month. FastRTPS seemed to be poised to provide Android and iOS support as well, so we did some prototyping with it, but ran into several issues around video streaming and large messages (I still have an unanswered issue on their repo: https://github.com/eProsima/Fast-RTPS/issues/83), which, when coupled with the pain of getting all of that running using native bindings with the NDK, strengthened our decision to run with RTI.

I have to give a shoutout to @esteve here, who has been doing some amazing work around getting ROS2 bindings running for the Android platform. We considered going this route and building our apps with ROS2 Android, but a few things here left some doubts in our minds: 1.) FastRTPS did not seem to be functioning correctly for our needs, 2.) @esteve had put together support for RTI bindings, but there are still some licensing concerns here, as he had to make modifications to the RTI source to make it work, and 3.) we have toyed with utilizing some RTI-only features for achieving high performance video and keeping our system upgradeable, so we were wary about potentially not having full control over the DDS layer at the outset.

Now we find ourselves with a working system for a product that we will be shipping in the near future. We have been operating on the following assumption, which we reached by reading the ROS2 design goals, though it now seems to lack some substance because we had not closely followed the ROS2 implementation and discussions:

Assumption: ROS2 would be able to integrate with existing DDS applications. After all, interoperability of data flows is what DDS is all about.

Concern: With your above explanations of the state of the DDS mapping architecture in ROS2, it seems like it will be fine for us to add ROS2 types and QoS to our applications in order to allow them to connect to ROS2 applications, but it sounds like the reverse will not be true, since your idlgen rules and QoS subsets are going to be rather strict and limited in how they might be able to allow ROS2 apps to interact with native DDS apps.

In my opinion, making ROS2 mix with existing DDS systems, rather than creating the "ROS-DDS island", is the right way to go for a few reasons:

By making ROS2 interoperable with existing DDS systems, you make the ROS2 ecosystem valuable for existing DDS users and potential future DDS users. My hope has been that RTPS/DDS will catch on due to its great technical merits, but as we all know, the existing DDS community is a completely different beast than open source projects like ROS. I think pressure and demand is going to have to come from sources other than existing government/military/corporate to make that happen. I would assume that since ROS2 has settled on DDS for mostly technical reasons, that you would also like to see this change in the ecosystem, such that it increases ROS2's own value and makes the technology more accessible/usable by people who might develop systems that can interact with ROS2.
You will immediately have a base of users (although it may be small) who already used ROS in the past and moved to DDS, but are still open/amenable to using ROS2 in their ecosystem. I know that this is where OpenROV stands, as we want the ROS2 community to use our underwater robots for all kinds of neat projects (hopefully swarming AUVs!), but for business/risk reasons, we currently can't fully commit to ROS2 in production. I can also say with great certainty that there are several NASA centers that use both DDS and ROS1 right now that would feel the same way. These NASA projects always have interns churning through them who use ROS extensively and will become more and more exposed to the existing DDS systems being used internally. Giving these sorts of users the power to put ROS2 to work in those systems will certainly earn brownie points for ROS2 and, I think, for DDS as a technology.

One more concern I have is how ROS2 plans to handle "evolveability" of systems. RTI has implemented most, if not all of the XTYPES specification, which provides extensible types and dynamic data, and I'm not sure where other vendors are at on this these days or what parts of this ROS2 plans to support. We were planning to use extensibility in our system as a way of not breaking backwards compatibility as we rev applications distributed across several platforms, vehicles, and consumer devices. I haven't been able to find much on how ROS2 is planning on dealing with evolving/extending types to achieve the same thing.

All of this said, I do understand how and why you have reached the architecture and structure that you have. DDS is very complex and you will never gain widespread adoption if you scare off users by requiring them to learn and understand all of it. I do hope though that you will leave enough flexibility inside of the ROS2 island to let your programs play with ours. I don't think the QoS/partitioning strategy is going to be a huge deal, but I do think that the message generation scheme is going to cause some difficulties. Maybe you can have an idlgen process which is separate from the rosmsg -> idl pipeline, and expose some kind of non-ROS Publisher/Subscriber abstraction that can utilize those standalone types (and QoS). Perhaps a wrapper API with an interface similar to FastRTPS's which lets you use the simple pub/sub abstraction or the underlying base entities? I would be happy to discuss this further and hear your thoughts about the use cases I've described.

wjwwood commented 7 years ago

Hearing a lot of this makes me wish I had involved myself in the DDS middleware work early on, rather than being passive and waiting for it to get implemented.

There wasn't a whole lot of design discussion around the type system as it sort of evolved from whatever we thought was possible at the time. It's not like this stuff is set in stone. We're willing to make changes to improve different use cases still, but it will be work in some cases.

I believe that integrating with native/plain DDS is going to be important in the continued success/adoption of ROS2 and I'll explain my personal context in this matter.

I agree that this is an important use case, and that's why I spent so much time talking about it in the referenced design document, but it will really take someone continuously trying to integrate ROS 2 into a ROS 2 / pure DDS hybrid system to ensure that continues to work. That of course starts with figuring out the existing roadblocks the first time it is tried.

Many users will not care about this feature, but I don't imagine anyone will actively not want this feature unless it prevents some other functionality or it causes the tools and interfaces to be annoying. The latter is sort of the case we're in now, where we have reasons for these extra conventions and if we drop those conventions then we have to find another way to address the original reasons for having the conventions.

it seems like it will be fine for us to add ROS2 types and QoS to our applications in order to allow them to connect to ROS2 applications, but it sounds like the reverse will not be true, since your idlgen rules and QoS subsets are going to be rather strict and limited in how they might be able to allow ROS2 apps to interact with native DDS apps.

That's a fair summary. However, I think the current state will actually cover many use cases, for example:

a vendor wants to use "just DDS" in their product, but makes the topic type and QoS settings ROS 2 friendly
an existing DDS based system is modified (because the integrators control both sets of software) using QoS and/or x-types so it can interoperate with ROS 2 without depending on any ROS 2 code

In both of those cases, other systems are conforming to our conventions, which might not always be possible or desired, but at least it is technically possible.

This might be annoying to the system integrators or the vendors, but at least there is a path forward. It seems to me that it will always be the case that the system with more conventions for message types and topic name patterns (in this case ROS 2) will require the other system to yield to those conventions, or else those conventions need to be circumventable. If the "other" system has its own rules and conventions on top of DDS, then you might run into issues because it's not clear which is easier to adapt to the other.

I haven't been able to find much on how ROS2 is planning on dealing with evolving/extending types to achieve the same thing.

We haven't decided yet. I don't think "use x-types" is a complete answer (to me x-types is a tool, but it doesn't help you figure out which types are compatible and/or how to handle changes to data structures automatically), and I don't think "make all fields optional" can be done in a performant way for users that care about that. So we have some work to do there yet. I have a plan in mind, but I haven't had the time to sit down and write it all out as a proposal.

@dirk-thomas also might have more to say on this topic.

I do hope though that you will leave enough flexibility inside of the ROS2 island to let your programs play with ours.

That's the intention, but we have to balance "ROS is easy to use" with "exposing all DDS QoS options and settings".

This came up at the last ROSCon as well. I think it's possible, and potentially acceptable to me, that we just end up exposing all the DDS QoS settings more or less unchanged in the ROS 2 API. However, we've already started to find a few places where the DDS pattern isn't exactly to our liking.

In those cases, one option is to just use it anyways to keep our code slim and take on the attitude of "well it works well enough for DDS users". Another is to try and get the DDS spec changed, which is possible, but not a short term solution. And yet another option is to reimplement the concept in question on top of pub/sub or other existing ROS concepts. This has come up for:

the durability QoS setting is not exactly like ROS 1 style latching
using the "participant name" versus communicating the node names in a separate topic
key fields in messages versus just using the ROS 1 topic name hierarchy (namespaces)
services "using content-oriented addressing and being many-to-many" versus "using host-oriented addressing and ensuring a single service server (ROS 1 behavior)"
using x-types versus some other mechanism to handle unknown or evolving message types
... and probably others I'm forgetting now

My guess is that the right strategy is to expose QoS settings as there is demand for them and people who think we're not moving fast enough in that direction can use the DDS API directly in the meantime. We've always planned to have a way to "reach" under our ROS objects to get the underlying, vendor-specific DDS objects so you can do whatever you want to them.

"Reaching under" to the DDS objects is not ideal, since that breaks our vendor abstraction and makes code less generic, but I feel that this will only occur in rare cases, on the edges of systems. If we find in practice that many people are reaching under to DDS to perform a particular task then we should look at why they need to do that and address that issue in the ROS 2 API. Again, an approach of "on demand" seems right to me here, but I could be wrong.

I know all of this is just me trying to justify that a ROS 2 island in a DDS ocean is not a bad thing, but I kind of feel that way. That being said, I want to make it as flexible as possible and as seamless as possible.

I don't think the QoS/partitioning strategy is going to be a huge deal, but I do think that the message generation scheme is going to cause some difficulties.

I agree that our translation from .msg to .idl is an issue, since we rely so much on having .msg definitions throughout our system. This part will definitely require us to study the use case and extend our system to support it. It's not currently a matter of changing some settings to make it work, as will be the case for the topic name (in most cases).

Perhaps a wrapper API with an interface similar to FastRTPS's which lets you use the simple pub/sub abstraction or the underlying base entities?

That's basically what we have in rmw, but we're trying to keep that as simple as possible. We could have yet another API under rmw that serves as the DDS abstraction on top of different DDS vendors. That would let us keep rmw simple but still give a portable interface with most if not all of the DDS features exposed. I'm not keen on this approach, but it is possible. I think I would still like to see people use Fast-RTPS or Connext directly and let those cases where they need to do that bubble up to the ROS API over time.

@dirk-thomas said:

and isn't planned to be implemented in the near future to my knowledge

Actually I plan to add that in the pr I'm working on now. I was waiting for @Karsten1987's pr (https://github.com/ros2/ros2/issues/327) to get it working before I looked at how to expose the option and disable it in special cases.

gbiggs commented 7 years ago

@spiderkeys said:

After all, interoperability of data flows is what DDS is all about.

@wjwwood said:

In both of those cases, other systems are conforming to our conventions, which might not always be possible or desired, but at least it is technically possible.

This might be annoying to the system integrators or the vendors, but at least there is a path forward. It seems to me that it will always be the case that the system with more conventions for message types and topic name patterns (in this case ROS 2) will require the other system to yield to those conventions, or else those conventions need to be circumventable. If the "other" system has its own rules and conventions on top of DDS, then you might run into issues because it's not clear which is easier to adapt to the other.

I think that I agree with @wjwwood's sentiments here. DDS is about data flows, but it provides no more support for interoperability than the concept of classes and APIs does in C++. It's up to the users of DDS to make their data types and topic names compatible if they want to be interoperable. ROS2 applies conventions on top of DDS to make it significantly more likely that DDS data flows using ROS2 are going to be interoperable. As I would need to do if I turned up on @spiderkeys's doorstep and asked to interoperate my DDS system with theirs, someone asking to interoperate another DDS system with ROS2 will need to, in some way, comply with the conventions of ROS2.

How compliance is achieved is the question here, not whether or not ROS2 should be an island in the DDS ocean: Every system is an island in the DDS ocean based on its own topic and data type conventions. You could argue that ROS2 is a more isolated island due to abstracting many parts of DDS (hiding IDL, an abstracted API for QoS settings, etc.), but the result is (or should be) the same: a set of conventions that must be complied with to interoperate with DDS systems that happen to be ROS2-based.

Whether compliance should be achieved by meeting in the middle, or whether the pure DDS user should need to go all the way to ROS2's conventions, is something that should be answered based on the impact on the complexity of using ROS2.

spiderkeys commented 7 years ago

but it will really take someone continuously trying to integrate ROS 2 into a ROS 2 / pure DDS hybrid system to ensure that continues to work. That of course starts with figuring out the existing roadblocks the first time it is tried.

Agreed. I will begin putting some effort into the integration of the two to help suss these roadblocks out.

We haven't decided yet. I don't think "use x-types" is a complete answer (to me x-types is a tool, but it doesn't help you figure out which types are compatible and/or how to handle changes to data structures automatically), and I don't think "make all fields optional" can be done in a performant way for users that care about that. So we have some work to do there yet. I have a plan in mind, but I haven't had the time to sit down and write it all out as a proposal.

Yes, I agree that it isn't a complete solution. Right now, our current plan is to try and adopt a policy of strictly "add or extend, but continue supporting all" when it comes to evolving message types, since we need to be able to support a number of client applications on different platforms, potentially at different versions than what is running on our vehicles. It isn't perfect, though, and I can easily see where cruft will build up, and we may sometimes be forced into situations where we end up with multiple readers/writers for divergent types. That, of course, ends up being sub-optimal from a performance and complexity viewpoint, but the only ways I can see to avoid it right now are to leverage extensible types or dynamic data. Dynamic data has the worst performance, of course, and with extensible types you run into the issue you mentioned where you have to start dealing with optional fields. So far, optional fields (and leveraging the fact that floating point types have NaN values) have seemed like the lesser of the evils. I look forward to learning about your thoughts on navigating these challenges.
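As a rough illustration of the "add or extend, but continue supporting all" policy, the sketch below models it with plain Python dataclasses rather than real DDS or XTYPES types. The message names and fields (`DepthSampleV1`, `temperature_c`, and so on) are hypothetical and not taken from any actual system. New fields are only appended, and each has an "unset" default: NaN for a float, `None` for an explicit optional.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class DepthSampleV1:
    """Original message: every field is always present."""
    timestamp_ns: int
    depth_m: float


@dataclass
class DepthSampleV2(DepthSampleV1):
    """Extended message: existing fields untouched, new ones appended."""
    temperature_c: float = math.nan   # NaN marks "not provided"
    sensor_id: Optional[str] = None   # explicit optional field


def upgrade(sample: DepthSampleV1) -> DepthSampleV2:
    """Lift an old sample into the new type; new fields stay unset."""
    return DepthSampleV2(sample.timestamp_ns, sample.depth_m)


def has_temperature(sample: DepthSampleV2) -> bool:
    """NaN never compares equal to itself, so test with math.isnan."""
    return not math.isnan(sample.temperature_c)


if __name__ == "__main__":
    old = DepthSampleV1(timestamp_ns=1, depth_m=2.5)
    print(has_temperature(upgrade(old)))
```

The cost mentioned above shows up in the readers: every consumer of the extended type has to check for the unset value before using a newer field.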

This came up at the last ROSCon as well. I think it's possible, and potentially acceptable to me, that we just end up exposing all the DDS QoS settings more or less unchanged in the ROS 2 API

I would advocate for this approach. I'm curious to hear which of the DDS policies wouldn't be desirable within ROS2, or at least don't seem to be worth the effort of adding them to the RMW layer. There are very few that I have not found use cases for within the context of robotics.

However, we've already started to find a few places where the DDS pattern isn't exactly to our liking. And yet another option is to reimplement the concept in question on top of pub/sub or other existing ROS concepts.

In my experience with DDS, the practice of implementing any desired concept/feature on top of pub/sub and the DDS design patterns is generally feasible and has been the path I've taken on a number of occasions. In a number of instances, RTI has been fairly helpful in coming up with clever ways to implement certain concepts using DDS constructs. I would be surprised if most of these problems couldn't be solved in some way on top of the features DDS provides (some of these solutions are coupled to using QoS effectively, another reason for why I would make sure all standard QoS policies are supported).

My guess is that the right strategy is to expose QoS settings as there is demand for them and people who think we're not moving fast enough in that direction can use the DDS API directly in the meantime. We've always planned to have a way to "reach" under our ROS objects to get the underlying, vendor-specific DDS objects so you can do whatever you want to them.

That's basically what we have in rmw, but we're trying to keep that as simple as possible. We could have yet another API under rmw that serves as the DDS abstraction on top of different DDS vendors. That would let us keep rmw simple but still give a portable interface with most if not all of the DDS features exposed. I'm not keen on this approach, but it is possible. I think I would still like to see people use Fast-RTPS or Connext directly and let those cases where they need to do that bubble up to the ROS API over time.

Yes, this would probably be my approach at the moment, if I was developing ROS2 applications that interfaced with existing systems. As long as the implementation provides a PSM-compliant DDS API, then you can avoid vendor-specific portability issues. One thing I worry about here, though, is that while FastRTPS provides wire interoperability with DDS, it does not implement the DDS C++ PSM. Because of this, there isn't a portable way to programmatically work at the DDS layer. If RMW exposed a more complete subset of DDS and the standard APIs, then this wouldn't be an issue and there would be more flexibility on the ROS2 application side to interface with the outside world. (Edit: Although I have to be honest here, only RTI and Prismtech have gone through the effort of being compliant with DDS-PSM-CXX, as far as I can tell)

In https://github.com/ros2/rmw/issues/51, @jacquelinekay writes:

After some discussion, we decided not to expose the entire QoS space in ROS 2 for now, to keep the API simple and to keep the possibility of supporting a middleware besides DDS

This was back in 2015, so things may have changed, but is that last bit about keeping ROS2 open to other middlewares still a valid driver for keeping the QoS/DDS implementation small inside of RMW? If not, and ROS2 is fully committed to DDS now, perhaps it makes sense to continue fleshing out full support/abstraction.

@gbiggs, I agree with your statements, but my main point in highlighting DDS's goal of data flow interoperability is that it is built upon the premise that if both sides know the types and know the contract (QoS), then they can connect without knowing anything else about each other. For a DDS-only developer with the flexibility of adding support for new types/qos to their system, all is well, because ROS2 has made the information available and implements a relatively simple subset of QoS. As any DDS implementation will provide that developer access to all of the APIs and tools they require to make the connections to ROS2 applications, they are able to do so. On the other side of the fence, ROS2 only provides a portable way to talk to other ROS2 applications, and there currently is not a good way of talking to the existing applications as the core functionality of talking about non-rosmsg-generated types, topics, or partitions (though there are designs to cover the last two) is not implemented/exposed, and some QoS policies are not implemented, based on my current understanding.
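To spell out what "knowing the types and the contract" amounts to, here is a simplified sketch of how a DDS writer and reader end up matched. It is not any vendor's implementation: it only covers reliability and durability (where the offered level must be at least the requested level), compares type names as plain strings instead of applying XTYPES assignability rules, and ignores partition wildcards and all other QoS policies. The `Endpoint` class and the example names are hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import FrozenSet


class Reliability(IntEnum):
    BEST_EFFORT = 0
    RELIABLE = 1


class Durability(IntEnum):
    VOLATILE = 0
    TRANSIENT_LOCAL = 1


@dataclass(frozen=True)
class Endpoint:
    topic: str
    type_name: str
    reliability: Reliability
    durability: Durability
    # The default DDS partition is the empty string.
    partitions: FrozenSet[str] = frozenset({""})


def endpoints_match(writer: Endpoint, reader: Endpoint) -> bool:
    """Return True if the writer's offer satisfies the reader's request."""
    same_contract = (writer.topic == reader.topic
                     and writer.type_name == reader.type_name)
    # Requested-versus-offered: the writer must offer at least what is asked.
    qos_ok = (writer.reliability >= reader.reliability
              and writer.durability >= reader.durability)
    # Both sides must share at least one partition name.
    shares_partition = bool(writer.partitions & reader.partitions)
    return same_contract and qos_ok and shares_partition


if __name__ == "__main__":
    w = Endpoint("image", "sensor_msgs::msg::dds_::Image",
                 Reliability.RELIABLE, Durability.TRANSIENT_LOCAL,
                 frozenset({"rt/camera"}))
    r = Endpoint("image", "sensor_msgs::msg::dds_::Image",
                 Reliability.BEST_EFFORT, Durability.VOLATILE,
                 frozenset({"rt/camera"}))
    print(endpoints_match(w, r))
```

Nothing in this check refers to ROS 2 itself, which is the point being made: a DDS-only developer can satisfy it from the outside, while a ROS 2 application can only satisfy it for the names and types its tooling generates.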

Whether compliance should be achieved by meeting in the middle, or whether the pure DDS user should need to go all the way to ROS2's conventions, is something that should be answered based on the impact on the complexity of using ROS2.

Of course, this is the crux of the issue, as all of the conversations above demonstrate. There is nothing inherently stopping ROS2 from being able to use non-ROS types, topics, and QoS; it's simply a matter of managing ROS2 complexity and putting in the time to do work in RMW to support more common interfaces. It is completely understandable that, until there is strong enough demand or enough support provided in doing so, the burden falls on the DDS user to play by ROS2 rules.

That said, I would like to do what I can to start helping provide support for bridging that gap, if there is a clear path forward for what kinds of decisions need to be made and if there is an idea for what form this "pure DDS API" should take, if that is even the best route to go. Clearly, one step here is to start cooking up some ROS2 programs that attempt to integrate with our existing applications, to start identifying pain points in the process, which I will begin doing over the coming months.

gbiggs commented 7 years ago

@gbiggs, I agree with your statements, but my main point in highlighting DDS's goal of data flow interoperability is that it is built upon the premise that if both sides know the types and know the contract (QoS), then they can connect without knowing anything else about each other. For a DDS-only developer with the flexibility of adding support for new types/qos to their system, all is well, because ROS2 has made the information available and implements a relatively simple subset of QoS. As any DDS implementation will provide that developer access to all of the APIs and tools they require to make the connections to ROS2 applications, they are able to do so. On the other side of the fence, ROS2 only provides a portable way to talk to other ROS2 applications, and there currently is not a good way of talking to the existing applications as the core functionality of talking about non-rosmsg-generated types, topics, or partitions (though there are designs to cover the last two) is not implemented/exposed, and some QoS policies are not implemented, based on my current understanding.

OK, I can see what you're saying now, and I agree with it.

That said, I would like to do what I can to start helping provide support for bridging that gap, if there is a clear path forward for what kinds of decisions need to be made and if there is an idea for what form this "pure DDS API" should take, if that is even the best route to go. Clearly, one step here is to start cooking up some ROS2 programs that attempt to integrate with our existing applications, to start identifying pain points in the process, which I will begin doing over the coming months.

This is something I would like to help out with, but I don't have a decent complex DDS system available here to drive the work for me.

wjwwood commented 7 years ago

Agreed. I will begin putting some effort into the integration of the two to help suss these roadblocks out.

We discussed this again in our weekly meeting. If I can speak for the group (@ros2/team), we all continue to believe that interfacing a pure DDS system with ROS 2 is a useful use case, and that we should therefore keep this kind of integration as easy and as performant as possible. That means testing it and adding features or adjusting the implementation to make it easier over time.

So what we're going to propose during our next planning meeting is to add a demo to beta 3 (roughly July-September) that demonstrates this kind of integration. It doesn't have to be a perfect solution, but by having a demo that gets compiled and tested with our other code we can know when a change is going to break that and it can serve as a place to try and test out improvements to make the use case work better.

We haven't laid out the specific goals for this demo yet, so if anyone has any ideas for what would be compelling please offer them here.

Personally, what I imagined for at least one part of the demo would be to write a camera driver (maybe like the OpenCV based ones we already have) using only Fast-RTPS or Connext (or both) and create a custom .idl file which is similar to sensor_msgs/Image but slightly different. This can be the "prompt" and the rest of this part of the demo would be how to integrate this with ROS 2 and combine it with image processing algorithms already existing in ROS 2 and how to use it with the ROS 2 tools. That's only half of the problem, so we'll have to come up with a compelling demo for sending data out of a ROS 2 system into a pure DDS system.

I would advocate for this approach. I'm curious to hear which of the DDS policies wouldn't be desirable within ROS2, or at least don't seem to be worth the effort of adding them to the RMW layer. There are very few that I have not found use cases for within the context of robotics.

It's not so much that they are undesirable, but that they might not be needed very often, or, where they would be useful, there's a reasonable workaround that doesn't require the special feature. I think ROS 1 got a long way with just reliable, queue size, and single message latching. Obviously I think we should have more features than that, but some of the QoS settings in DDS appear to be very niche.

These are the QoS we currently expose more or less directly:

Reliability
History
Durability
- only TRANSIENT_LOCAL, VOLATILE

We've noticed some inconsistencies with how Durability is matched, which make it behave differently from ROS 1 and, in our opinion, make it less useful. So this is where we need to decide whether to just expose the DDS feature and let others get used to it, or to build up our own concept, using DDS's feature to make it work well.
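For readers less familiar with DDS, Durability is matched on a requested-versus-offered basis: a reader only matches a writer whose offered kind is at least as strong as the kind the reader requests. The sketch below illustrates that rule for the two kinds listed above. The names `Durability` and `durability_compatible` are hypothetical and are not part of rclpy, rmw, or any vendor API.

```python
from enum import IntEnum


class Durability(IntEnum):
    """Hypothetical enum; ordered so a larger value is a stronger guarantee."""
    VOLATILE = 0
    TRANSIENT_LOCAL = 1


def durability_compatible(offered: Durability, requested: Durability) -> bool:
    """Return True when a writer's offered kind satisfies a reader's request.

    This mirrors the DDS requested-versus-offered rule: offered >= requested.
    """
    return offered >= requested


if __name__ == "__main__":
    # Print the full compatibility table for the two supported kinds.
    for offered in Durability:
        for requested in Durability:
            ok = durability_compatible(offered, requested)
            print(f"writer={offered.name:<15} reader={requested.name:<15} match={ok}")
```

Under this rule a TRANSIENT_LOCAL reader never matches a VOLATILE writer, and a VOLATILE reader that matches a TRANSIENT_LOCAL writer does not receive the stored samples. In ROS 1, by contrast, latching is chosen by the publisher alone.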

On my list of "probably should be exposed in ROS 2 API, but it hasn't been requested or we haven't had time":

Deadline
Latency Budget
Ownership
TimeBasedFilter
Transport Priority
LifeSpan

On my list of "could be useful but not clear there is need/demand for it (yet)" QoS settings:

Durability:PERSISTENT and DurabilityService
Presentation
DestinationOrder
- Rather see implemented in "user-space" (maybe "ROS-space"?) for more control on which timestamp is used (sent vs received vs timestamp in message)
Entity Factory
Writer Data lifecycle
Reader Data lifecycle
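As a rough illustration of the "user-space" ordering idea mentioned under DestinationOrder, the sketch below sorts received samples by a timestamp the application chooses. The `Sample` type and its field names are assumptions made for this example and do not correspond to an existing ROS 2 or DDS structure.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Sample:
    """Hypothetical received sample carrying three candidate timestamps."""
    payload: str
    sent_ns: int      # source timestamp assigned by the writer
    received_ns: int  # reception timestamp assigned by the reader
    stamp_ns: int     # timestamp carried inside the message itself


# Map a user-facing choice to the field used as the sort key.
ORDER_KEYS: Dict[str, Callable[[Sample], int]] = {
    "sent": lambda s: s.sent_ns,
    "received": lambda s: s.received_ns,
    "stamp": lambda s: s.stamp_ns,
}


def order_samples(samples: Iterable[Sample], by: str = "stamp") -> List[Sample]:
    """Return samples sorted by the chosen timestamp (stable for ties)."""
    if by not in ORDER_KEYS:
        raise ValueError(f"unknown ordering {by!r}; expected one of {sorted(ORDER_KEYS)}")
    return sorted(samples, key=ORDER_KEYS[by])
```

Keeping this choice in application code means a consumer can order by the in-message stamp while another orders by reception time, which a single middleware-level setting would not allow.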

There are yet more things we are controlling directly so that we can implement certain features in ROS 2 (may be exposed in part or through a different pattern but not directly):

UserData, TopicData, GroupData
- these are useful for us implementing certain features, but I don't see a general need for them
- I'd rather see a well defined ROS concept which uses these to be implemented, but I'd rather not see user code utilize these or something like them directly
Liveliness
Partition
Resource Limits

I think the "add/expose them as needed to the ROS 2 API" approach is best here, because each one we expose increases the coupling to DDS and increases the surface area of our interface, which we have to document, teach, and test.

This was back in 2015, so things may have changed, but is that last bit about keeping ROS2 open to other middlewares still a valid driver for keeping the QoS/DDS implementation small inside of RMW?

A goal since the beginning has been to keep DDS specific symbols (C/C++) out of the ROS 2 API. The idea was to prevent us from getting tied to symbols of a specific vendor (e.g. Connext vs OpenSplice) but also to prevent us from being forever coupled to DDS.

We are, however, still committed to "ROS 2" requiring that DDSI-RTPS is used on the wire and SPDP / SEDP being used for discovery over multicast-UDP. At least that's still the plan right now.

If others want to put something else under the hood then it would be something different, like "ROS 2 with ZMQ and Friends" or "ROS 2 with OPC-UA" and those would not be "compatible" with just plain "ROS 2", but may have value on their own. We already have some other groups interested in replacing the DDS subsystem with other systems, whether it be a domain-specific technology like OPC-UA (see this) / SOME/IP / AUTOSAR / etc. or a "local" rmw which doesn't use the network at all.

Part of the point of the rmw layer is to protect the "value-added" of the code that comes on top from future changes in the middleware. Perhaps that's a pipe dream, but I think we can insulate the "above code" from quite a lot if we just try to keep rmw as simple and well defined as possible.

I'm not saying I personally want to replace DDS with something else anytime soon, but I think adding that layer of insulation, conceptually as well as technically, is a good thing to do. Because in principle, something like /tf or rviz doesn't really care about the "how" so long as a set of behaviors can be ensured. It all depends on how comprehensive that set of behaviors needs to be (essentially how many QoS settings are supported).
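To make the insulation idea concrete, here is a minimal sketch of code written against a small middleware interface, with an in-process implementation standing in for a "local" backend that never touches the network. The `Middleware`, `LocalMiddleware`, and `relay` names are hypothetical; this is not the actual rmw C API, only an illustration of the layering.

```python
from abc import ABC, abstractmethod
from collections import defaultdict
from typing import Callable, DefaultDict, List

Callback = Callable[[bytes], None]


class Middleware(ABC):
    """Hypothetical minimal interface the code above it depends on."""

    @abstractmethod
    def publish(self, topic: str, message: bytes) -> None: ...

    @abstractmethod
    def subscribe(self, topic: str, callback: Callback) -> None: ...


class LocalMiddleware(Middleware):
    """In-process backend: delivers messages by calling callbacks directly."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def publish(self, topic: str, message: bytes) -> None:
        # Copy the list so callbacks may subscribe during delivery.
        for callback in list(self._subscribers.get(topic, [])):
            callback(message)

    def subscribe(self, topic: str, callback: Callback) -> None:
        self._subscribers[topic].append(callback)


def relay(middleware: Middleware, source: str, destination: str) -> None:
    """Application code: forwards messages without knowing the backend."""
    middleware.subscribe(source, lambda msg: middleware.publish(destination, msg))
```

Because `relay` only sees the abstract interface, swapping `LocalMiddleware` for a DDS-backed implementation would not require changing it, as long as both provide the same behaviors.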

If not, and ROS2 is fully committed to DDS now, perhaps it makes sense to continue fleshing out full support/abstraction.

I don't see it as our mission to provide this kind of "actually portable" DDS api. But if that's a necessary byproduct of this work, then that's fine.

spiderkeys commented 7 years ago

On my list of "could be useful but not clear there is need/demand for it (yet)" QoS settings:

I agree with this assessment, and would posit that these QoS are much less likely to be used in either ROS2 or general DDS applications. Of the six, I believe I have only used Presentation (for an experiment involving efficient H264 transmission, since abandoned) and Entity Factory (long enough ago that I don't remember why). As far as durability/durability service goes, I think that volatile and transient-local should satisfy most use cases. Otherwise, ROS2 would have to implement its own "Persistence Service", which sounds like much more effort than it's worth at this stage, lacking any clear demand.

I'm not saying I personally want to replace DDS with something else anytime soon, but I think adding that layer of insulation, conceptually as well as technically, is a good thing to do.

Agreed. I'm getting a better idea of the design goals now, and would say that your approach of ensuring the availability/implementation of a set of communication concepts, insulated from the underlying middleware, is a mindset that is in the best interest of ROS2 in the long run, even if DDS is technically adequate now.

I don't see it as our mission to provide this kind of "actually portable" DDS api. But if that's a necessary byproduct of this work, then that's fine.

True, I wouldn't expect it to be your goal either, though I have a feeling that it will naturally happen as a result of trying to achieve the integration goals touched on here.

We haven't laid out the specific goals for this demo yet, so if anyone has any ideas for what would be compelling please offer them here. Personally, what I imagined for at least one part of the demo would be to write a camera driver (maybe like the OpenCV based ones we already have) using only Fast-RTPS or Connext (or both) and create a custom .idl file which is similar to sensor_msgs/Image but slightly different.

How about a camera driver that encapsulates any generic, UVC-compatible camera? I've developed a "camera server" that provides some services around the cameras we use in our system, including:

Automatic detection of cameras and camera details/capabilities via libudev and libv4l2 (perhaps this could be done more portably via libuvc)
Creation of live video streaming endpoints for each camera (could support multiple video formats: MJPEG, H264, raw formats, etc)
Leverages the reliable writer protocol to help ensure successful transmission of frames while minimizing latency. Using a history/send buffer of a few frames especially helps to minimize the impact of missed frames in an H264 stream delivered over a relatively unreliable network, such as Wi-Fi.
For H264 streams, can store latest IDR, SPS, and PPS NALUs so that late joiners can immediately begin playback without waiting for the GOP period to elapse, leveraging history and transient local durability.
Camera control/settings interface (allow clients to request changes to settings like framerate, resolution, brightness, gain, etc.)
Stats publishing (average framerate, dropped frames, jitter, color statistics, etc)
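The late-joiner item above relies on DDS history and transient local durability to hold the most recent SPS, PPS, and IDR NAL units. The sketch below only illustrates which units need to be retained, using an application-level cache rather than DDS itself. It assumes each NAL unit is passed without its start code, so the first byte is the NAL header, and it treats the IDR picture as a single NAL unit for simplicity. `ParameterSetCache` is a hypothetical name.

```python
from typing import Dict, List

# H.264 nal_unit_type values (low five bits of the NAL header byte).
NAL_IDR = 5
NAL_SPS = 7
NAL_PPS = 8


def nal_unit_type(nalu: bytes) -> int:
    """Extract nal_unit_type from a NAL unit that has no start code prefix."""
    if not nalu:
        raise ValueError("empty NAL unit")
    return nalu[0] & 0x1F


class ParameterSetCache:
    """Keeps the latest SPS, PPS, and IDR units seen on a stream."""

    def __init__(self) -> None:
        self._latest: Dict[int, bytes] = {}

    def observe(self, nalu: bytes) -> None:
        """Record the unit if it is one a late joiner needs."""
        kind = nal_unit_type(nalu)
        if kind in (NAL_SPS, NAL_PPS, NAL_IDR):
            self._latest[kind] = nalu

    def late_joiner_preamble(self) -> List[bytes]:
        """Return cached units in decode order: SPS, PPS, then IDR."""
        order = (NAL_SPS, NAL_PPS, NAL_IDR)
        return [self._latest[k] for k in order if k in self._latest]
```

A new subscriber that is handed this preamble first can start decoding immediately instead of waiting for the next IDR in the live stream.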

I could help put this together using either FastRTPS or Connext, though the last time I tried using FastRTPS, I was unable to successfully create an asynchronous writer putting out data that was regularly larger than the 64 KB maximum UDP datagram size. Maybe this is fixed now, though, or if it isn't, maybe this demo could help to push on any remaining roadblocks in that area.
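For context on the size limit, a sample larger than one UDP datagram has to be split into fragments and reassembled on the receiving side. The sketch below shows that idea in isolation. It is conceptually similar to what a DDSI-RTPS implementation does with fragmented data, but it is not Fast-RTPS or Connext code, and the `fragment` and `reassemble` helpers are hypothetical.

```python
from typing import List, Sequence, Tuple

# Largest UDP payload over IPv4: 65535 - 20 (IPv4 header) - 8 (UDP header).
MAX_UDP_PAYLOAD = 65507

# (fragment index, total fragment count, fragment bytes)
Fragment = Tuple[int, int, bytes]


def fragment(data: bytes, fragment_size: int) -> List[Fragment]:
    """Split data into indexed fragments of at most fragment_size bytes."""
    if not 0 < fragment_size <= MAX_UDP_PAYLOAD:
        raise ValueError("fragment_size must be in 1..MAX_UDP_PAYLOAD")
    total = max(1, -(-len(data) // fragment_size))  # ceiling division
    return [
        (i, total, data[i * fragment_size:(i + 1) * fragment_size])
        for i in range(total)
    ]


def reassemble(fragments: Sequence[Fragment]) -> bytes:
    """Rebuild the original data from fragments received in any order."""
    if not fragments:
        raise ValueError("no fragments")
    total = fragments[0][1]
    by_index = {index: chunk for index, _, chunk in fragments}
    if set(by_index) != set(range(total)):
        raise ValueError("missing fragments")
    return b"".join(by_index[i] for i in range(total))
```

In a real writer the fragment size would also have to leave room for protocol headers, and a reliable protocol would retransmit only the missing fragments rather than failing as `reassemble` does here.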