ros2 / design

Design documentation for ROS 2.0 effort
http://design.ros2.org/
Apache License 2.0

Intra-Process Communications for all language clients #251

Open emersonknapp opened 4 years ago

emersonknapp commented 4 years ago

Description

This issue is a call for a design of zero-copy intra-process communications available to all ROS2 language clients.

The current implementation of this feature exists only in rclcpp, and is therefore not usable from Python or from less-supported languages (C, Java, Rust, etc.).

Acceptance Criteria

To close this issue, we want a design document that proposes an architecture for zero-copy intra-process communication available to all language clients.

As a follow-up, I will attempt to collect existing thoughts from https://github.com/ros2/design/pull/239 and add them as comments below.

Note

I do not consider myself an expert on this, but I'm very interested in collaborating toward a top-down view of what this part of the ROS2 core should look like, and in figuring out how the community can pull together toward a solution.

ivanpauno commented 4 years ago

I think that we can take some ideas from Connext "zero copy transfer over shared memory":

That's actually interprocess communication over shared memory, but something similar can be replicated using a buffer instead of a piece of shared memory.

The basic idea is that you ask the publisher for a new message, instead of allocating a unique_ptr yourself:

msg = publisher->get_new_message();
if (msg != nullptr) {
  msg->data = "asd";
  publisher->publish(msg);
}

Currently, message lifetime can be extended beyond the scope of the callback (in cpp). That would not be possible if we go ahead with something like this (or at least, it would be really hard to implement that feature).
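To make the semantics of the snippet above concrete, here is a minimal sketch of a publisher backed by a preallocated message pool, where get_new_message() returns nullptr once the pool is exhausted. All names (PooledPublisher, StringMsg) are invented for illustration; this is not any existing rclcpp or rmw API, and a real implementation would hand the slot to subscribers rather than just recycling it.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical message type for the sketch.
struct StringMsg { std::string data; };

// A publisher that hands out slots from a fixed preallocated pool
// instead of letting callers allocate messages themselves.
template <std::size_t N>
class PooledPublisher {
public:
  // Returns a preallocated message slot, or nullptr if the pool is
  // exhausted (the case the snippet above guards against).
  StringMsg * get_new_message() {
    for (std::size_t i = 0; i < N; ++i) {
      if (!in_use_[i]) {
        in_use_[i] = true;
        return &pool_[i];
      }
    }
    return nullptr;
  }

  // "Publishing" here just releases the slot back to the pool; a real
  // implementation would first deliver the pointer to subscribers.
  void publish(StringMsg * msg) {
    in_use_[static_cast<std::size_t>(msg - pool_.data())] = false;
  }

private:
  std::array<StringMsg, N> pool_{};
  std::array<bool, N> in_use_{};
};
```

A fixed pool like this is also what makes the nullptr case unavoidable: with no heap fallback, the publisher can only refuse (or block) when every slot is loaned out.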


The implementation could live in rcl or rmw, I'm not sure what would be better.

allenh1 commented 4 years ago

@ivanpauno I don't think publisher->get_new_message() should ever return nullptr. I'd prefer a more asynchronous way to fetch a message, or potentially blocking on that call instead. I'm not very fond of the blocking-call idea, but maybe an asynchronous trigger could be set up?

Maybe it could be set up so that we can std::invoke a callback in the publish() function? This isn't great though, since it would need to be done in rcl, which means wasting cycles checking whether any std::bind-created callbacks are registered on non-shared-memory platforms.
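The callback-on-publish idea could be sketched roughly as follows. This is a hypothetical illustration (CallbackPublisher and set_on_publish are invented names, not rcl API); the branch on the empty std::function is exactly the per-publish check being called wasteful on platforms where no callback is ever registered.

```cpp
#include <functional>
#include <string>
#include <utility>

// Hypothetical message type for the sketch.
struct Msg { std::string data; };

// A publisher that optionally holds a callback and std::invoke's it
// from publish(), as suggested above.
class CallbackPublisher {
public:
  void set_on_publish(std::function<void(const Msg &)> cb) {
    on_publish_ = std::move(cb);
  }

  void publish(const Msg & msg) {
    if (on_publish_) {  // this check runs on every publish, even when unused
      std::invoke(on_publish_, msg);
    }
    // ...hand the message to the regular transport here...
  }

private:
  std::function<void(const Msg &)> on_publish_;
};
```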

I'm not seeing a way to make this happen in anything above rmw, except of course when there are multiple nodes inside the same process.

Sorry for the rambles, very interested in this idea.

fujitatomoya commented 4 years ago

just sharing my thought,

The implementation could live in rcl or rmw, I'm not sure what would be better.

i believe that it is better to be implemented in rmw, not rcl.

emersonknapp commented 4 years ago

Collecting some relevant parts of the previous discussion here for easier review, and to feed the design:

Re: location of implementation @gbiggs wrote

This is a tangential comment, but I wonder if we could achieve the same zero-copies-when-same-process result by reducing the number of copies required for going into and out of the rmw layer to zero and using a DDS implementation that also supports zero copies (ignoring that there may not be any and that the standard API may not support this, both of which are solvable issues). One of the reasons for using DDS is to push all the communication issues down into an expert-vendor-supplied library, after all.

Re: location of implementation @raghaprasad wrote

How about moving the intra_process_management into an rmw? This rmw could handle only intra-process communication and delegate inter-process communication to any of the chosen DDS rmw implementations.

Support for zero copies is an important objective, but it's not the only one. It has been observed that creating DDS participants is pretty resource-heavy in terms of net memory required (at least for FastRTPS & OpenSplice), and that the discovery process is CPU-intensive (due to multicast). This new rmw could drastically simplify the discovery process and most certainly reduce the memory footprint by needing only one participant per process to support inter-process communication.

Re: smart-ptr messages @gbiggs wrote

But it is possible to do the rmw and rcl APIs and implementations such that they manage their raw pointers properly and provide a smart_ptr interface-compatible object in rclcpp. I'm not saying it would be easy, but this is how the STL is designed to be used and it would be the most powerful solution.

Re: implementation @ivanpauno wrote

I would like to see something mimicking Connext's Zero Copy Transfer Over Shared Memory semantics (by default Connext uses shared memory, but it doesn't use zero-copy transfer, which has its own specific semantics). Basically, instead of creating a unique pointer and then publishing it:

auto msg = std::make_unique<MSG_TYPE>();
/* Fill the message here */
publisher->publish(std::move(msg));

you ask the publisher for a piece of memory, fill it, and then publish:

auto msg = publisher->new_message();
/* Fill the message here */
publisher->publish(std::move(msg)); // I'm using move semantics because the message will be undefined after calling publish. But how we wrap the msg for this is an implementation detail.

For DDS vendors that have implemented zero-copy transport, this could just wrap it. For the others, we could provide a default implementation. That implementation would not use shared memory (which would allow INTERprocess zero-copy transport), but just a preallocated buffer in each publisher that allows INTRAprocess zero-copy transport. This implementation is a good starting point for later doing something like this (if we want to do it).

I also think this idea will look idiomatic in other languages (for example, in python), and performance should be quite similar.
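The move semantics hinted at in the snippet above could be modeled with a loan type built on unique_ptr with a custom deleter, so the caller's handle is guaranteed empty after publish(). This is a hypothetical sketch (LoaningPublisher, new_message, and the single-slot pool are invented for illustration, not an existing API), assuming a single preallocated slot per publisher:

```cpp
#include <functional>
#include <memory>
#include <string>

// Hypothetical message type for the sketch.
struct Msg { std::string data; };

class LoaningPublisher {
public:
  // A loan is a unique_ptr whose deleter marks the slot free again,
  // so ownership (not memory) is what gets moved around.
  using Loan = std::unique_ptr<Msg, std::function<void(Msg *)>>;

  Loan new_message() {
    loaned_ = true;
    return Loan(&slot_, [this](Msg *) { loaned_ = false; });
  }

  // Taking the loan by value forces std::move at the call site; the
  // slot is released when `msg` goes out of scope at the end of publish().
  void publish(Loan msg) {
    last_published = msg->data;  // stand-in for handing off to the transport
  }

  std::string last_published;

private:
  Msg slot_{};        // single preallocated slot, for illustration only
  bool loaned_ = false;
};
```

After publish(std::move(msg)) the caller's msg is null, which matches the comment in the snippet above that the message is undefined after publishing.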

emersonknapp commented 4 years ago

A question: do we want to have intra-process communication always optimized in ROS2, regardless of choice of RMW?

If we want it always available, what about this idea?

Or, and this is a possible outcome, should we just expect intra-process communication to be the job of the chosen RMW implementation, and simply push development to add this to our RMW implementation of choice, e.g. FastRTPS or CycloneDDS or wherever?

dirk-thomas commented 4 years ago

How about moving the intra_process_management into an rmw? This rmw could handle only intra-process communication and delegate inter-process communication to any of the chosen DDS rmw implementations.

Support for zero copies is an important objective, but it's not the only one. It has been observed that creating DDS participants is pretty resource-heavy in terms of net memory required (at least for FastRTPS & OpenSplice), and that the discovery process is CPU-intensive (due to multicast). This new rmw could drastically simplify the discovery process and most certainly reduce the memory footprint by needing only one participant per process to support inter-process communication.

The overhead described here is addressed by the proposal in #250 and isn't related to intra process communication. Even with intra process communication every node / participant has to perform discovery and comes with that overhead.

ivanpauno commented 4 years ago

@ivanpauno I don't think publisher->get_new_message() ever return nullptr. I'd prefer a more asynchronous way to fetch a message, or potentially blocking on that call instead. I'm not very fond of the blocking call idea, but maybe an asynchronous trigger could be set up?

I guess it's possible to never return nullptr (probably with blocking behavior); I only added it because I'm not super sure what the implementation would look like.

i believe that it is better to be implemented in rmw, not rcl.

  • it sounds like the rmw's responsibility to take care of transport. (rmw)
  • provide a consistent/compatible API to the frontend, concealed by rmw.
  • taking advantage of / comparing each rmw implementation.

I agree, especially with the first and last points. Each time I think about the intra-process communication problem, I'm more convinced that it should be addressed by the underlying middleware (FastRTPS, Connext, OpenSplice, etc.), and that we should only wrap their zero-copy transfer APIs. Of course, that's probably out of our scope and we have to provide a solution on top of the middleware. But that has the cost of re-implementing a lot of things (supporting a lot of different QoS features, etc.).

Or, this is a possible outcome, should we just expect that intraprocess communications should be the job of the choice of RMW implementation, and just push development to add this to our RMW impl of choice, e.g. FastRTPS or CycloneDDS or wherever?

:+1:

qootec commented 4 years ago

I initially posted this as a topic on answers.ros.org (see https://answers.ros.org/question/333180/ros2-micro-ros-intra-process/) but was advised by the moderator to move it to Discourse... I think the core of my concern touches on your discussion.

(My context: ROS2 inside a machine controller)

Looking at your proposals for intra-process communication, I fail to see whether you also take into account the multi-priority requirements that such (often embedded) environments typically have.

I currently see fragmented solution elements or approaches:

Is there any documented vision on how your intra-process-communication would co-exist with multi-priority queuing/handling?

Johan

gavanderhoorn commented 4 years ago

I initially posted this as an topic on answers.ros.org (see https://answers.ros.org/question/333180/ros2-micro-ros-intra-process/) but was advised by the moderator to move it to discourse...

I did, but this is not the embedded category on ROS Discourse.

atyshka commented 3 years ago

Any updates on this roughly a year later?

ivanpauno commented 3 years ago

Any updates on this roughly a year later?

Not that I know of. The problem isn't trivial, and AFAIK nobody is assigned to work on it.

twaddellberkeley commented 2 years ago

Hi @ivanpauno, is there any work on this problem, if not do you need help? Would love to dive into it.

Cheers

ivanpauno commented 2 years ago

AFAIK, nobody is working on this right now. I'm not sure if there's a plan to work on the topic soon.

emersonknapp commented 2 years ago

I'm not sure, but does the Cyclone+iceoryx combo do this automatically for C++ nodes in the same process?

ivanpauno commented 2 years ago

I'm not sure, but does the Cyclone+iceoryx combo do this automatically for C++ nodes in the same process?

Not zero-copy; zero-copy requires a different API.