Embedded ROS 2 Design Page

iluetkeb commented 5 years ago

Start of the design page for the ROS 2 Embedded effort.

Includes contributions by all of the OFERA consortium partners.

iluetkeb commented 5 years ago

@clalancette I see that the action's PR is on a branch in this repo. Maybe more people can add commits that way? If yes, maybe we could re-target this to a branch, merge it there without discussion, and then do a PR from that branch to gh-pages and re-commence the discussion on that? Just let me know.

iluetkeb commented 5 years ago

@clalancette I saw that it is easily possible to re-target the PR to a different branch. If you guys would be willing to give me access to the design repo, I'd very much prefer creating an "embedded_ROS2" branch there and continue the discussion in that.

clalancette commented 5 years ago

> @clalancette I saw that it is easily possible to re-target the PR to a different branch. If you guys would be willing to give me access to the design repo, I'd very much prefer creating an "embedded_ROS2" branch there and continue the discussion in that.

I'll check what our policy with that is. For now, I'll just pull this PR and push directly to a branch on this repo.

clalancette commented 5 years ago

All right, closing this in favor of https://github.com/ros2/design/pull/198

gbiggs commented 5 years ago

It looks like the discussion is going to move back here, so I'll leave my comments on this pull request.

First, some general comments. I read both this document and the linked OFERA report.

The biggest problem I have is that there is an inherent assumption right from the start that something else needs to be created to support small-scale embedded devices, i.e. rmw/rcl cannot be used directly even with modifications. I haven't seen any evidence to support this assumption and without seeing some I can't agree with the direction this document proposes for how ROS2 will support small-scale devices. I would prefer to see rmw/rcl used directly and modified where necessary, unless it can be shown that making them work at a small scale will compromise them for medium and large scales.

Not using rcl directly does two things that I don't like:

It creates a separate implementation of core ROS2 functionality, the urcl library, which undermines the concept that rcl provides the core functionality and all other client libraries just wrap rcl, meaning every client library gets exactly the same behaviour and receives any changes and fixes together.
It requires a bridge to talk between "normal" ROS2 and embedded ROS2 nodes.

For the linked report's statements and requirements, I have some additional comments relevant to the above:

The statement that micro-ROS will split functionality into separate libraries to enable developers to save resources "by picking only those features they really need" is something that would be useful at any scale, and I see no reason rmw/rcl cannot enable this in some way (changes in library structure, or using compile-time flags to enable/disable features, for example). rcl already has separate libraries for lifecycle nodes and actions, and while I don't like the current layout of the APIs in regards to this separation, structurally this is clearly possible.
The micro-ROS APIs for things like life cycles, predictable scheduling and system modes are apparently going to be much richer than those in rcl due to "advanced concepts specific to micro-ROS". What are these advanced concepts and why are they specific to tiny embedded devices? For example, I think that rich control over predictable scheduling is something we will be very interested in for a real time system at any scale. I'm sure Dejan and his people at Apex.AI would agree with me and they are working at the level of autonomous cars.
There are many statements saying things like "create a generic framework in the spirit of ROS" and "be ROS2 compatible". These imply that the OFERA goal is to create something compatible with ROS2, which makes me think "then why is this design document going on ROS2's design site?" It's probably just wording for a project report, but I still think that it sounds weird.
All the things listed in the table in 4.2, such as allowing static node and topic layouts and support for sleep states are things that are desirable in rcl as well.
All of the performance requirements listed in section 5.2 are desirable at any scale. Even in a large system, I still want things like rapid start-up times, no-copy communication within the same MCU, and minimal power usage. It therefore makes more sense to me that rmw/rcl be improved to satisfy these requirements rather than putting that effort into a separate implementation. Some of the requirements are already satisfied by rcl.
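As a rough illustration of the "compile-time flags to enable/disable features" idea mentioned in the list above, here is a minimal sketch in C. It is not taken from rcl or micro-ROS: every `DEMO_*` and `demo_*` name is hypothetical, and a real build would set the switches from the build system rather than rely on the defaults.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical feature switches (not real rcl options). A build system
 * could override them, e.g. with -DDEMO_ENABLE_LIFECYCLE=0. */
#ifndef DEMO_ENABLE_LIFECYCLE
#define DEMO_ENABLE_LIFECYCLE 1
#endif
#ifndef DEMO_ENABLE_ACTIONS
#define DEMO_ENABLE_ACTIONS 0
#endif

/* The node only carries the fields of features that are compiled in,
 * so a disabled feature costs neither RAM nor code size. */
typedef struct demo_node {
  const char *name;
#if DEMO_ENABLE_LIFECYCLE
  int lifecycle_state;
#endif
#if DEMO_ENABLE_ACTIONS
  int pending_goals;
#endif
} demo_node_t;

/* Number of optional features compiled into this build. */
int demo_feature_count(void) {
  return DEMO_ENABLE_LIFECYCLE + DEMO_ENABLE_ACTIONS;
}

/* Initialise only the parts of the node that exist in this build. */
void demo_node_init(demo_node_t *node, const char *name) {
  assert(node != NULL);
  node->name = name;
#if DEMO_ENABLE_LIFECYCLE
  node->lifecycle_state = 0;
#endif
#if DEMO_ENABLE_ACTIONS
  node->pending_goals = 0;
#endif
}
```

With the defaults above, only the lifecycle field exists in `demo_node_t`. The same source could serve a larger target by switching both features on, which is the point being made about one code base covering several scales.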

So to sum up, if someone can show a compelling argument why rmw/rcl cannot satisfy the very small scale devices use case that this document targets, then I can agree with the need for a separate implementation. But so far I have only seen this as an unsupported assumption, and the listed requirements seem reasonable for rmw/rcl. Similarly, the wishlist items are all things that I would want at larger scales as well.

iluetkeb commented 5 years ago

@gbiggs I think we got off a bit on the wrong foot here, most likely because you recall an earlier discussion on Discourse about whether rcl could be re-used or whether there's going to be a ucrlc. Because, really, it has not been my intention to convey that there is a decision to do a ucrlc. If there's text in this document that says otherwise please point it out (and I've taken note of the sentence with "specialized" above).

That said, there is observable evidence from existing embedded implementations for small and tiny devices that should give us pause on whether barging ahead and porting rmw/rcl at all costs is the right approach. That's why this document is using fuzzy language that postpones such a decision and instead commits to measuring it.

FWIW, right now, as part of the OFERA project, eProsima has provided an rmw implementation for Micro XRCE-DDS. They also ported rcl, and there is ongoing work to port rclcpp. We have identified several issues with that (e.g., related to 64-bit atomics) and are talking to the OSRF about it. @BorjaOuterelo can probably provide more info.

In addition to that, several people, including Robotis, Amazon, and myself, have also pursued alternatives to that, for various reasons. You may not like it, but I would argue that you actually should, because these alternatives provide concrete examples of how to do things differently, which IMHO is a much better foundation for discussion than "what if's". Update: and this also allows us to show something desirable without having to figure out how to reconcile it with sometimes conflicting current design decisions in rmw/rcl. Because, for example, things like static allocation are not only about allocating all the memory at the beginning, but also about being able to make the assumption that structures are going to be valid without having to check it again over and over. And that's not going to fit in very well with the way rcl does things currently.

This is ongoing work, and benchmarking results will trickle in, so that we get more evidence for decision-making.

btw, let me also note that this is not only about resource use, but also about how to get rid of the plethora of checks in rcl and rmw on whether a pointer is still valid ,-) I very much hope that at least some of that will get into rcl proper :-)
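To make the static-allocation point above concrete, here is a minimal sketch in C, assuming a fixed pool sized at build time. It is not how rcl or any micro-ROS library is implemented: `demo_publisher_t`, `demo_publisher_create` and `demo_publish` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_PUBLISHERS 4  /* hypothetical limit, fixed at build time */

typedef struct demo_publisher {
  uint32_t topic_id;
  uint32_t sent;  /* number of messages "published" so far */
} demo_publisher_t;

/* All storage is reserved statically; nothing is ever freed. */
static demo_publisher_t demo_pool[DEMO_MAX_PUBLISHERS];
static size_t demo_pool_used = 0;

/* Setup phase: the only place where creation can fail and is checked.
 * Returns NULL once the pool is exhausted. */
demo_publisher_t *demo_publisher_create(uint32_t topic_id) {
  if (demo_pool_used >= DEMO_MAX_PUBLISHERS) {
    return NULL;
  }
  demo_publisher_t *pub = &demo_pool[demo_pool_used++];
  pub->topic_id = topic_id;
  pub->sent = 0;
  return pub;
}

/* Hot path: a handle obtained during setup stays valid for the whole
 * program lifetime, so there is no run-time validity check here. The
 * assert documents the assumption and disappears with -DNDEBUG. */
static inline void demo_publish(demo_publisher_t *pub) {
  assert(pub != NULL);
  pub->sent++;
}
```

The design choice illustrated here is that all failure handling moves into the setup phase. After that, callers can rely on handles staying valid, which is the assumption described above as hard to reconcile with repeated validity checks.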

gbiggs commented 5 years ago

> it has not been my intention to convey that there is a decision to do a ucrlc. If there's text in this document that says otherwise please point it out (and I've taken note on the sentence with "specialized" above).

This document is more ambiguous about the intended approach. The linked OFERA report is not. It seems quite clear in that report that the intention is to create a separate software stack that is ROS2-compatible to meet the requirements. There are statements like:

"The next layer is the micro-ROS client library (urcl) analgously to the ROS client library (rcl) in ROS 2."
"Since the micro-ROS aims to create a generic framework for robotics in the spirit of ROS"

and even a requirement:

"Micro-ROS bridge: functionalities that are specific to the bridge between ROS2 and micro-ROS."

I don't know how I can read these any other way than that OFERA has decided from the outset that a separate software stack will be created. It's making this decision without any evidence for why it is necessary that bugs me. Of course I have no input into OFERA so if that project wants to do that, then it can go right ahead. However, when it comes to how ROS2 is going to define its tiny-scale embedded support, then I think more evidence needs to be provided to support such a decision. The problem comes from using that report as support for this design document, and this document also being written in a way that implies creating a separate software stack is the goal due to statements like the one near the very start that talks about creating a ROS2-interoperable stack. Perhaps such statements could be rephrased to include reusing the existing client libraries as much as possible, making modifications where possible, and using as little custom code as possible?

> several people, including Robotis, Amazon, and myself, have also pursued alternatives to that, for various reasons. You may not like it, but I would argue that you actually should, because these alternatives provide concrete examples of how to do things differently, which IMHO is a much better foundation for discussion than "what if's".

I don't think I said I don't like that work being done. If it seems like I did, then that was not my intention and I apologise for the confusion. I want to see these approaches tried and I want to see using rmw/rcl as much as possible tried, so we have numbers that can be used to make a decision. I even said as much in one of my previous comments.

iluetkeb commented 5 years ago

Update: I had some more text here earlier, but it's really beside the point.

In the OFERA project, we're hedging our bets with regard to how much we can re-use, because there's some technical and organizational uncertainty. Please do not overinterpret this, and instead let's move on to the technical discussion. If we didn't want to play by the community's terms, we wouldn't be here.

> "Micro-ROS bridge: functionalities that are specific to the bridge between ROS2 and micro-ROS."

That's mainly the XRCE-DDS agent. Because we're not using the same middleware, we have to bridge. People have tried to bring DDS to small devices, and it didn't work. That's not just a matter of memory, it's also about enabling power-saving etc. (which doesn't go well with a protocol that assumes you're always listening).

Things which might distinguish this bridge from the "plain" XRCE DDS agent include ROS-specific things, such as TF filtering, etc.

gbiggs commented 5 years ago

I think that the changes made in https://github.com/ros2/design/pull/197/commits/2b7cb6c08afad7a024e2710f97c083faf89d2e0d address my concerns about the planned direction for the work.

smorita-esol commented 5 years ago

Why don't you discuss the rmw (or rcl) plugging multiple middlewares?

I think embedded ROS aims to support non-DDS or non-XRCE-DDS middlewares. There are also cases where multiple middlewares exist in one system (e.g. standard DDS and MQTT). In those cases, at least one node straddling multiple middlewares is needed to subscribe to a topic on one side (standard DDS), process it, and publish the processed topic to the other side (MQTT), or vice versa.

This function is apparently needed in embedded systems, where suitable middlewares and/or wire protocols tend to vary by purpose. But it is also beneficial for non-embedded systems. For example, SONY stated that they are considering whether their middleware is suitable for ROS 2 (https://roscon.ros.org/2018/presentations/ROSCon2018_Aibo.pdf, p29). ROS 2 developers like SONY will probably implement the function above to harmonize their middleware with standard DDS.
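The node that straddles two middlewares, as described above, could look roughly like the following C sketch. It assumes a hypothetical `demo_transport_t` interface with one receive and one send callback. No real DDS or MQTT API is used: a one-slot in-memory mailbox stands in for each middleware, and the "processing" step is a placeholder.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical minimal transport interface; a real DDS or MQTT client
 * would sit behind these two callbacks. */
typedef struct demo_transport {
  size_t (*receive)(void *ctx, char *buf, size_t cap); /* 0 if nothing pending */
  void (*send)(void *ctx, const char *buf, size_t len);
  void *ctx;
} demo_transport_t;

/* One-slot in-memory mailbox standing in for a middleware. */
typedef struct demo_mailbox {
  char data[64];
  size_t len;
} demo_mailbox_t;

static size_t demo_mailbox_receive(void *ctx, char *buf, size_t cap) {
  demo_mailbox_t *box = ctx;
  size_t n = box->len < cap ? box->len : cap;
  memcpy(buf, box->data, n);
  box->len = 0;  /* the message is consumed */
  return n;
}

static void demo_mailbox_send(void *ctx, const char *buf, size_t len) {
  demo_mailbox_t *box = ctx;
  box->len = len < sizeof box->data ? len : sizeof box->data;
  memcpy(box->data, buf, box->len);
}

/* Wrap a mailbox in the transport interface. */
demo_transport_t demo_mailbox_transport(demo_mailbox_t *box) {
  assert(box != NULL);
  demo_transport_t t = { demo_mailbox_receive, demo_mailbox_send, box };
  return t;
}

/* One bridge step: take a message from `from`, process it, forward it
 * to `to`. Returns 1 if a message was forwarded, 0 otherwise. */
int demo_bridge_step(demo_transport_t *from, demo_transport_t *to) {
  char buf[64];
  size_t n = from->receive(from->ctx, buf, sizeof buf);
  if (n == 0) {
    return 0;
  }
  for (size_t i = 0; i < n; i++) {  /* placeholder processing: upper-case ASCII */
    if (buf[i] >= 'a' && buf[i] <= 'z') {
      buf[i] = (char)(buf[i] - 'a' + 'A');
    }
  }
  to->send(to->ctx, buf, n);
  return 1;
}
```

Calling `demo_bridge_step` with the arguments swapped gives the "vice versa" direction. Whether such a step belongs in rmw, rcl, or a separate agent is exactly the open question in this thread, and the sketch does not take a position on it.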

iluetkeb commented 5 years ago

> Why don't you discuss the rmw (or rcl) plugging multiple middlewares?

This would be an RCL topic in general, wouldn't it?

In my experience, for MCUs, this is decided at deployment time. Do you have use cases where it's decided at runtime?

> In those cases, at least one node straddling over plural middlewares is needed to subscribe the one side's topic (standard DDS) and process it, publish the processed topic to the other side (MQTT), or vice versa.

In our systems, we have an agent that does such things. It is running on the Linux side. This is not something we burden the MCU with.

Maybe I should add a system sketch early on in the document, to show the overall system architecture and eco-system.

smorita-esol commented 5 years ago

Thanks for your comments.

> This would be an RCL topic in general, wouldn't it?

Currently, I don't know which layer (rmw or rcl) would be the better place to add the function.

> In my experience, for MCUs, this is decided at deployment time. Do you have use cases where it's decided at runtime?

To avoid misunderstanding, I'd like to mention that those multiple middlewares have to be used concurrently (not exclusively) at runtime. Of course, the set of middlewares to be used at runtime has to be decided by deployment time.

> In our systems, we have an agent that does such things. It is running on the Linux side. This is not something we burden the MCU with.

That may be true if we choose the combination of standard DDS and XRCE-DDS. But there are some middlewares that have no agent for publishing/subscribing to standard DDS topics. If we accept those middlewares, we should provide a function to bridge between different middlewares without an agent. Once we provide that function, we can also choose a system configuration with no DDS middlewares at all (e.g. a combination of ZeroMQ and EtherCAT).

iluetkeb commented 5 years ago

@clalancette What's your view on this? Can we merge?

AFAICT, we resolved all major issues with this version of the document. We also had a SIG telco a while ago where we agreed on the further direction (i.e., sticking with stock rcl/rmw).

Of course, as design documents go, it definitely needs further work and we're going to further develop it as we go along. However, since there have been no more issues identified, I think this version could go in as a baseline for now.

iluetkeb commented 5 years ago

@gbiggs would you agree that we have addressed your points? You made one comment in this direction, but I'm not sure whether the PR as a whole has been marked valid.

gbiggs commented 5 years ago

> @gbiggs would you agree that we have addressed your points? You made one comment in this direction, but I'm not sure whether the PR as a whole has been marked valid.

Yeah, I guess so. From your recent updates it sounds like you are going in the direction I prefer.

ros2 / design

Embedded ROS 2 Design Page #197