ros2 / rclcpp

rclcpp (ROS Client Library for C++)
Apache License 2.0
536 stars 417 forks source link

Inter-process communication causes increased latency of intra-process communication #1722

Open danielm1405 opened 3 years ago

danielm1405 commented 3 years ago

Bug report

Required Info:

Steps to reproduce issue

  1. Follow the instructions from https://github.com/mitsudome-r/pointcloud_delay_check. It launches a container with two nodes, publisher and subscriber of a pointcloud, that use intra-process communication.
  2. Create inter-process communication, e.g. ros2 topic hz -w 10 /pointcloud

Expected behavior

Intra-process communication latency is not affected by inter-process communication.

Actual behavior

Intra-process communication latency is heavily increased by the existence inter-process communication.

Additional information

I profiled the container:

image

The profiling indicates that the delay is caused by rclcpp::Executor::execute_subscription that is called only when inter-process communication is established. It seems like a bug.

I tried out these modifications which calls add_subscription only for inter-process. Therefore, execute_subscription is not called for this process.

diff --git a/rclcpp/src/rclcpp/node_interfaces/node_topics.cpp b/rclcpp/src/rclcpp/node_interfaces/node_topics.cpp
index c57fbcee..a72d5b5d 100644
--- a/rclcpp/src/rclcpp/node_interfaces/node_topics.cpp
+++ b/rclcpp/src/rclcpp/node_interfaces/node_topics.cpp
@@ -95,17 +95,17 @@ NodeTopics::add_subscription(
     callback_group = node_base_->get_default_callback_group();
   }

-  callback_group->add_subscription(subscription);
-
-  for (auto & subscription_event : subscription->get_event_handlers()) {
-    callback_group->add_waitable(subscription_event);
-  }
-
   auto intra_process_waitable = subscription->get_intra_process_waitable();
   if (nullptr != intra_process_waitable) {
     // Add to the callback group to be notified about intra-process msgs.
     callback_group->add_waitable(intra_process_waitable);
-  }
+  } else {
+    callback_group->add_subscription(subscription);
+
+    for (auto & subscription_event : subscription->get_event_handlers()) {
+      callback_group->add_waitable(subscription_event);
+    }
+  };

   // Notify the executor that a new subscription was created using the parent Node.
   {

Communication seems to work properly as both subscribers receive the data. Below there is a plot comparing original behavior (orange) and the changed (blue) one. The measured value is the delay for intra-process communication. The inter-process communication is switched on and off periodically. The modification reduces the delay significantly.

results

wjwwood commented 3 years ago

That diff doesn't make much sense to me, you're making it so that the subscription and subscription events (things like deadline missed, or publisher matched) are not added if intra-process communication is enabled, is that a correct summary? That doesn't sound like correct behavior to me, because in that case inter-process publishers would never connect to the subscriptions which have intra-process enabled.

wjwwood commented 3 years ago

So, this expectation:

Intra-process communication latency is not affected by inter-process communication.

Isn't going to hold. In the case you've described, when you add the inter-process subscription, the publisher must now serialize and send the message to the wire, which will incur more overhead and probably contribute to latency. Furthermore, when publishing rclcpp must make a copy of the message, one to keep in the intra-process manager to be shared (potentially zero-copy) with subscriptions in the same process, and another to give to the middleware. This copy is the source of some latency, see: https://github.com/ros2/rclcpp/pull/504

fujitatomoya commented 3 years ago

@wjwwood

That doesn't sound like correct behavior to me, because in that case inter-process publishers would never connect to the subscriptions which have intra-process enabled.

I do not think this is the correct fix either.

i think the same timer is being taken all the time with MultiThreadedExecutor, so that even if intra-process event (rclcpp::Waitable) is ready to take, it cannot be taken until timer (in this case publish) is finished. if there is only intra-process, this timer(publish) will finish soon then executor takes the next waitable since timer is already taken.

but as we already know,

when you add the inter-process subscription, the publisher must now serialize and send the message to the wire, which will incur more overhead and probably contribute to latency.

this must happen, so that elastic time window for timer(publish) increases.

i may be wrong...

alsora commented 3 years ago

Besides what already mentioned, there are indeed side effects of having both inter and intra-process enabled at the same time.

adamdbrw commented 3 years ago

This issue is interesting to me. Do you think there is a way that intra-process could not be affected this much by the fact that we have another (non-intra) subscriber? I am speculating without diving deep, but perhaps it is about re-ordering things so that the copying occurs only after the intra-process part is processed already?

danielm1405 commented 3 years ago

I agree that the publisher needs to make an extra copy and that it could cause increased latency.

However, the profiling indicates that rclcpp::Executor::execute_subscription is the root of the problem as it consumes 71% of the total resources in that scenario. It would be more understandable for me if the publication time would increase, not the subscription time.

I am curious why subscription that communicates intra process, receives inter process publication as soon as some other inter process communication appears? It does not seem right to me.

I believe that the changes I introduced removed the duplicated communication what resulted in decreased the latency. I understand that it breaks other functionalities that you mentioned. I wanted at least to pinpoint the issue and to discuss it here (as I weren't able to propose production ready fix).

alsora commented 3 years ago

Disclaimer: I have yet to look into your changes in details

I am curious why subscription that communicates intra process, receives inter process publication as soon as some other inter process communication appears? It does not seem right to me.

The reason is that the RMW layer has no idea that intra-process optimization is happening in the rclcpp layer. So consider the following:

wjwwood commented 3 years ago

Yeah, the "double delivery" problem is likely the reason, but the changes you introduced also breaks interprocess communication for subscriptions outside the process, so it doesn't work.

We have a partial solution in place which is that we have a flag for subscriptions that should have subscriptions ignore data coming from local publishers called ignore_local_publications:

https://github.com/ros2/rmw/blob/dcd8f836e6c2545b39c1540963e175f06625e2f0/rmw/include/rmw/types.h#L157-L170

More work needs to be done there, as it was waiting on us changing the mapping between DDS participant and ROS node vs ROS context. It used to be that there was one DDS participant per ROS node, now there is one per ROS context, which works better for this case, because ROS's intra-process communication happens between nodes in the same context too.

However, we can never assume that this setting works as the underlying middleware may not have a way to support it so we still need to be able to handle the double delivery case, but for middleware that support it then we can avoid the duplicate effort. Ensuring this setting is being set and enforced at the rmw implementation level is the right way to improve this performance issue, I think.

wjwwood commented 3 years ago

DDS based rmw implementations should be able to use the ignore_participant QoS setting, which is part of the DDS standard but isn't uniformly implemented.