ros2 / demos


Intra-process communication latency #289

Open alsora opened 5 years ago

alsora commented 5 years ago

Bug report

Required Info:

Steps to reproduce issue

I slightly modified the demos/intra_process_demo/src/two_node_pipeline/two_node_pipeline.cpp example, changing the message type to something "bigger", such as:

std_msgs/Header header
byte[4096] array

I also recorded the communication latency as the difference between the current time when the subscriber callback is triggered and the timestamp in the message header.
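For readers who want to reproduce the measurement, here is a minimal sketch of the latency computation, written without any ROS dependency. The `Stamp` struct is a hypothetical stand-in for the header stamp, and `to_stamp`, `latency`, `now_ns` and `LatencyStats` are illustrative names, not part of the demo. It assumes the publisher and the subscriber read the same clock.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the stamp field of std_msgs/Header.
struct Stamp {
  std::int32_t sec;
  std::uint32_t nanosec;
};

// Current time as nanoseconds since the clock's epoch (same clock on both sides).
inline std::chrono::nanoseconds now_ns() {
  return std::chrono::duration_cast<std::chrono::nanoseconds>(
    std::chrono::steady_clock::now().time_since_epoch());
}

// Publisher side: split a non-negative time since epoch into sec/nanosec.
inline Stamp to_stamp(std::chrono::nanoseconds since_epoch) {
  assert(since_epoch.count() >= 0);
  const std::int64_t ns = since_epoch.count();
  return Stamp{static_cast<std::int32_t>(ns / 1000000000LL),
               static_cast<std::uint32_t>(ns % 1000000000LL)};
}

// Subscriber side: latency = receive time - header stamp.
inline std::chrono::nanoseconds latency(
  const Stamp & stamp, std::chrono::nanoseconds received_at) {
  const std::chrono::nanoseconds sent =
    std::chrono::seconds(stamp.sec) + std::chrono::nanoseconds(stamp.nanosec);
  return received_at - sent;
}

// Incremental mean of the recorded latencies, in nanoseconds.
struct LatencyStats {
  std::size_t samples = 0;
  double mean_ns = 0.0;

  void add(std::chrono::nanoseconds sample) {
    ++samples;
    mean_ns += (static_cast<double>(sample.count()) - mean_ns) /
      static_cast<double>(samples);
  }
};
```

In a subscriber callback this would be used roughly as `stats.add(latency(stamp, now_ns()))`, with the publisher filling the stamp from `to_stamp(now_ns())` right before publishing.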

Expected behavior

Intra-process communication avoids copying objects around and just shares a pointer, thus reducing latency.

Actual behavior

When use_intra_process_comms is set to true in the node constructors, the average latency is almost identical to, or even slightly higher than, the latency with it set to false.

wjwwood commented 5 years ago

I believe this might be a duplicate of, or at least related to, this PR (and discussion):

https://github.com/ros2/rclcpp/pull/504#issuecomment-406115061

Basically the issue comes down to this:

In order to avoid a copy between the publisher and subscription callback, a unique_ptr is published which is then passed into the middleware to be delivered (uncopied) to the single subscription callback in the same process.

However, because the ownership of this pointer is not in the control of the publishing function, a copy has to be made first so that this copy may be sent to the middleware (so it can be sent over the network). This copy is likely where the latency you're seeing is coming from.

Unfortunately this must be done right now, for two reasons:

Since that PR's discussion, we now have the ability to see how many subscriptions are matched to our publisher, see:

https://github.com/ros2/rcl/pull/326

But that still doesn't let us see the GUID of the subscriptions, and therefore we cannot tell the difference between a subscription in our process (which will be serviced by intra process) and a subscription in a separate process which will require sending the data over the middleware.

If we could address that, then we could avoid this copy and the latency you're seeing iff you're not using transient local durability (if you're not latching).
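The copy described above can be modelled with a small standalone sketch. This is not rclcpp's actual implementation: `Message`, `SketchPublisher`, `intra_callback` and `inter_send` are hypothetical names, and the "middleware" is just a callback. It only illustrates why a copy is needed once ownership of the unique_ptr is handed to the intra-process path.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical message, sized like the byte[4096] payload in the report.
struct Message {
  std::array<std::uint8_t, 4096> array{};
};

// Simplified model of a publisher that serves both delivery paths.
struct SketchPublisher {
  // Intra-process path: takes ownership of the original message.
  std::function<void(std::unique_ptr<Message>)> intra_callback;
  // Inter-process path: stands in for handing data to the middleware.
  std::function<void(const Message &)> inter_send;
  std::size_t copies_made = 0;

  void publish(std::unique_ptr<Message> msg) {
    // Copy first: after the move below, this function no longer owns
    // the message and cannot safely read it.
    Message copy = *msg;
    ++copies_made;

    // The original is delivered uncopied to the same-process subscription.
    intra_callback(std::move(msg));

    // The copy is what goes to the middleware.
    inter_send(copy);
  }
};
```

In this model the subscription callback receives the same address the publisher allocated, yet every publish still pays for one 4096-byte copy, which matches the latency behaviour reported above.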

wjwwood commented 5 years ago

> But that still doesn't let us see the GUID of the subscriptions, and therefore we cannot tell the difference between a subscription in our process (which will be serviced by intra process) and a subscription in a separate process which will require sending the data over the middleware.

I was thinking about this, and if we compare the matched counts for the intra-process topic and the inter-process topic and they are the same, then I think it would be safe to say all subscriptions are in the same process, so it is not necessary to publish to the inter-process one as well.
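A rough sketch of that check, assuming the two matched counts are available to the publisher. `MatchedCounts` and `needs_inter_process_publish` are hypothetical names for illustration, not existing rclcpp API. The transient local case follows the earlier comment: with that durability setting the inter-process publish cannot be skipped.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical snapshot of how many subscriptions matched on each topic.
struct MatchedCounts {
  std::size_t intra_process;  // matched on the intra-process topic
  std::size_t inter_process;  // matched on the inter-process topic
};

// Returns true when the message must also go through the middleware,
// which is the case that requires the extra copy.
inline bool needs_inter_process_publish(
  const MatchedCounts & counts, bool transient_local) {
  // With transient local durability, samples are kept for late-joining
  // subscriptions, so the inter-process publish is always needed.
  if (transient_local) {
    return true;
  }
  // Equal counts: every matched subscription is in this process.
  return counts.inter_process != counts.intra_process;
}
```

With this check, a publisher whose counts are equal and whose durability is not transient local could hand the unique_ptr straight to the intra-process path and skip the copy.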