micro-ROS / micro_ros_arduino

micro-ROS library for Arduino
Apache License 2.0
435 stars 112 forks source link

Publishing with `rcl_publish()` takes 10ms #20

Closed anaelle-sw closed 3 years ago

anaelle-sw commented 3 years ago

Hello! Our use case requires us to control the execution frequency of an application running on a Teensy board. But it seems that publishing on topics takes way more time than I was expecting it to.

Setup

Steps to reproduce

In my own code, my node publishes a std_msgs__msg__Range message. I measured:

(This code is quite long and messy for now, so I won't copy it here. But if you would like to have a look on it, I will make a "lighter/clearer" version of it, in order to post it here. Just let me know.)

I was able to reproduce this issue with your publisher example, which is a std_msgs__msg__Int32 publisher type:

#include <micro_ros_arduino.h>

#include <stdio.h>
#include <rcl/rcl.h>
#include <rcl/error_handling.h>
#include <rclc/rclc.h>
#include <rclc/executor.h>

#include <std_msgs/msg/int32.h>
#include <std_msgs/msg/float32.h>

rcl_publisher_t publisher;
std_msgs__msg__Int32 msg;
rclc_executor_t executor;
rclc_support_t support;
rcl_allocator_t allocator;
rcl_node_t node;
rcl_timer_t timer;

// Publication duration measurement variables:
rcl_publisher_t pub_latency;
unsigned long time_before_pub=0;
unsigned long time_after_pub=0;
std_msgs__msg__Float32 pub_duration;

void timer_callback(rcl_timer_t * timer, int64_t last_call_time)
{  
  RCLC_UNUSED(last_call_time);
  if (timer != NULL) {

    time_before_pub = micros();

    rcl_publish(&publisher, &msg, NULL);

    // Publication duration measurement:
    time_after_pub = micros();
    pub_duration.data = (time_after_pub - time_before_pub);
    rcl_publish(&pub_latency, &pub_duration, NULL);

    msg.data++;
  }
}

void setup() {

  delay(2000);

  allocator = rcl_get_default_allocator();

  //create init_options
  rclc_support_init(&support, 0, NULL, &allocator);

  // create node
  node = rcl_get_zero_initialized_node();
  rclc_node_init_default(&node, "micro_ros_arduino_node", "", &support);

  // create publisher
  rclc_publisher_init_default(
    &publisher,
    &node,
    ROSIDL_GET_MSG_TYPE_SUPPORT(std_msgs, msg, Int32),
    "micro_ros_arduino_node_publisher");

  // Publication duration measurement publisher init:
  rclc_publisher_init_default(
    &pub_latency,
    &node,
    ROSIDL_GET_MSG_TYPE_SUPPORT(std_msgs, msg, Float32),
    "pub_duration");

  // create timer,
  timer = rcl_get_zero_initialized_timer();
  const unsigned int timer_timeout = 100;
  rclc_timer_init_default(
    &timer,
    &support,
    RCL_MS_TO_NS(timer_timeout),
    timer_callback);

  // create executor
  executor = rclc_executor_get_zero_initialized_executor();
  rclc_executor_init(&executor, &support.context, 1, &allocator);

  unsigned int rcl_wait_timeout = 1;   // in ms
  rclc_executor_set_timeout(&executor, RCL_MS_TO_NS(rcl_wait_timeout));
  rclc_executor_add_timer(&executor, &timer);

  msg.data = 0;
}

void loop() {
  rclc_executor_spin_some(&executor, RCL_MS_TO_NS(100));
}

This code is uploaded to my Teensy board. I run a micro-ROS agent: ros2 run micro_ros_agent micro_ros_agent serial --dev /dev/teensy And I check out the duration topic: ros2 topic echo /pub_duration With this .ino code, I measure a duration between 8ms and 13ms.

Questions

First question (which may be naïve): is this an expected behavior? If so, it might be a real problem for us, since we absolutely need to control our loop rate. With only two rcl_publish() per loop in our own code, the actual loop rate is twice inferior to our target rate (ideally 50Hz). If this is inherently due to the rcl_publish() function, is there another and "faster" way to advertise topics using micro-ros-arduino?

Maybe this is linked to the micro-ROS agent I am running. Do you think this could be the case? How can I check this?

I was able to measure time only with Arduino functions (as micros()). May be there is a more reliable way to do so. For instance I tried functions in time.h, as clock_gettime(), but it failed (clock_gettime() was not successfully declared when including time.h).

I also tried to reduce the executor timeout, but results are the same.

Thanks for helping

pablogs9 commented 3 years ago

Hello @anaelle-sw, I have reproduced your test and I also have results between 8 ms and 13 ms using the rclc_publisher_init_default.

If I change to rclc_publisher_init_best_effort in my machine it dramatically decreases to (a really constant) 42 us.

Let me explain this: when you create a publisher with the default QoS you are creating a reliable ROS2-DDS publisher and also you are telling the micro-ROS middleware (XRCE) that it should talk reliably with the micro-ROS Agent for this publisher.

If we check the agent output for an unmodified version of the int32 publisher we can see something like:

[1604669776.121856] debug    | SerialAgentLinux.cpp | recv_message             | [==>> SER <<==]        | client_key: 0x50A5B13C, len: 13, data:                                                                                                                                          
0000: 81 00 00 00 0B 01 05 00 14 00 14 00 80                                                                                                                                                                                                                                              
[1604669776.122024] debug    | DataWriter.cpp     | write                    | [** <<DDS>> **]        | client_key: 0x00000000, len: 4, data:                                                                                                                                             
0000: 11 00 00 00                                                                                                                                                                                                                                                                         
[1604669776.122244] debug    | SerialAgentLinux.cpp | send_message             | [** <<SER>> **]        | client_key: 0x50A5B13C, len: 13, data:                                                                                                                                          
0000: 81 00 00 00 0A 01 05 00 15 00 00 00 80                                                                                                                                                                                                                                              
[1604669776.122318] debug    | SerialAgentLinux.cpp | send_message             | [** <<SER>> **]        | client_key: 0x50A5B13C, len: 13, data:                                                                                                                                          
0000: 81 00 00 00 0A 01 05 00 15 00 00 00 80                                                                                                                                                                                                                                              
[1604669776.123069] debug    | SerialAgentLinux.cpp | recv_message             | [==>> SER <<==]        | client_key: 0x50A5B13C, len: 13, data:                                                                                                                                          
0000: 81 00 00 00 0B 01 05 00 14 00 14 00 80                                                                                                                                                                                                                                              
[1604669776.123397] debug    | SerialAgentLinux.cpp | send_message             | [** <<SER>> **]        | client_key: 0x50A5B13C, len: 13, data:                                                                                                                                          
0000: 81 00 00 00 0A 01 05 00 15 00 00 00 80                                                                                                                                                                                                                                              
[1604669776.127683] debug    | SerialAgentLinux.cpp | recv_message             | [==>> SER <<==]        | client_key: 0x50A5B13C, len: 13, data:                                                                                                                                          
0000: 81 00 00 00 0B 01 05 00 14 00 14 00 80                                                                                                                                                                                                                                              
[1604669776.128013] debug    | SerialAgentLinux.cpp | send_message             | [** <<SER>> **]        | client_key: 0x50A5B13C, len: 13, data:                                                                                                                                          
0000: 81 00 00 00 0A 01 05 00 15 00 00 00 80                                                                                                                                                                                                                                              
[1604669777.130678] debug    | SerialAgentLinux.cpp | recv_message             | [==>> SER <<==]        | client_key: 0x50A5B13C, len: 16, data:                                                                                                                                          
0000: 81 80 15 00 07 01 08 00 00 20 00 05 12 00 00 00                                                                                                                                                                                                                                     
[1604669777.130793] debug    | SerialAgentLinux.cpp | recv_message             | [==>> SER <<==]        | client_key: 0x50A5B13C, len: 13, data:                                                                                                                                          
0000: 81 00 00 00 0B 01 05 00 15 00 15 00 80                                                                                                                                                                                                                                              
[1604669777.130893] debug    | SerialAgentLinux.cpp | recv_message             | [==>> SER <<==]        | client_key: 0x50A5B13C, len: 13, data: 

Here we can see how the client sends a serial message to the agent, the agent sends the message to DDS and then they exchange some other packets in order to confirm that this message has been published to DDS. This way the client ensures that its message has been correctly published.

If we change to rclc_publisher_init_best_effort we can see how the client just sends its topic and the agent put it in the ROS2 DDS world. No extra confirmations (and time) are required.

[1604669873.835568] debug    | SerialAgentLinux.cpp | recv_message             | [==>> SER <<==]        | client_key: 0x50A5B13C, len: 16, data:
0000: 81 01 11 00 07 01 08 00 00 1F 00 05 11 00 00 00
[1604669873.835789] debug    | DataWriter.cpp     | write                    | [** <<DDS>> **]        | client_key: 0x00000000, len: 4, data:

Usually, when dealing with fast sensor data this best-effort mode is the recommended option. In your use case, you need a sub-microsecond reliable communication? Or it is enough with a best-effort approach?

We have not done the calculus but the rcl_publish function throughput when using a best-effort QoS should be near to the channel maximum rate. Here, we are sending a 16 B message through XRCE, 4 of them are the payload and 12 B the XRCE headers (if you increase the payload the ratio header/payload will be better).

We are also using a 115200 bauds serial port, so we are able to transmit 115 kb/s. In the UART probably an 8N1 mode will have an 8 bits payload for every 10 bits transmitted, so:

theoretical_throughput = 115.2 kb/s * (8/10)  = 92.16 kb/s = 11.52 kB/s
theoretical_tx_delay_best_effort = 16 B / 11.52 kB/s = 128 b / 92.16 kb/s = 1.39 ms

In my test, the function is taking much less, I have two hypotheses:


One last thing! When you create best effort publishers in micro-ROS you are telling the client to use a best-effort communication with the agent but also you are using a best-effort publisher in DDS-ROS2. This can be easily changed to support all combinations but we haven't done it yet because there is no use case. Let me know if you need this such fine-grain configuration.

pablogs9 commented 3 years ago

Another check. If I substitute the loop in the int32 example with the best effort publisher:

void loop() {
  rcl_publish(&publisher, &msg, NULL);
  msg.data++;
}

I have this output:

root@pgarrido:/# ros2 topic hz /micro_ros_arduino_node_publisher
average rate: 5071.624
        min: 0.000s max: 0.006s std dev: 0.00010s window: 5074

I guess that somehow this Teensy USB transport is not taking into account the 115200 baud rate because:

throughput_payload = 5071.624 messages/s * 32 b/s = 162291.97 b/s ≈ 20.3 kB/s
throughput_total= 5071.624 messages/s * 128 b/s = 649167.87 b/s ≈ 81.3 kB/s

Same test with the reliable publisher:

root@pgarrido:/# ros2 topic hz /micro_ros_arduino_node_publisher
average rate: 103.761
        min: 0.006s max: 0.019s std dev: 0.00272s window: 105
throughput_payload = 5071.624 messages/s * 32 b/s = 3320.35 b/s ≈ 0.4 kB/s
throughput_total= 5071.624 messages/s * 128 b/s = 13281.40 b/s ≈ 1.7 kB/s
anaelle-sw commented 3 years ago

Ok, I understand now why it takes "so much" time to publish messages with a reliable QoS policy. Thanks for these explanations!

The solution consisting in changing the QoS publisher to best effort works really nice for us: our application is executing at 50Hz, just as we need it to. This publication corresponds to fast sensor data, as you mentioned. We don't need a reliable policy for it, best effort will suit us perfectly.

Let me know if you need this such fine-grain configuration.

--> It won't be necessary for now. But we will surely need to resize the stream buffer at some point. Is this something we can set via QoS profiles? Are the micro-ROS tutorials about QoS and about middleware configuration suitable for micro-ros-arduino? (Actually I was just about opening another issue about these QoS questionings. May be we can close this issue and discuss more properly about QoS in a new one, the way you prefer)

Good to know Teensy USB doesn't follow the 115200 baud rate...

Edit: Actually using best-effort QoS policy has one major drawback. The topics advertised in best-effort cannot be checked out via rqt (since there is no way to set QoS for rqt listener, issue opened here). But this is not micro-ros-arduino related.

pablogs9 commented 3 years ago

Yes, both tutorials are suitable for the Arduino port of micro-ROS.

The point is that you will need to rebuild the Arduino library (same as here) with a modified version of the colcon.meta file.

In order to use labeled QoS from the agent-side XML file (as explained in the tutorial) you will need to set here the "-DRMW_UXRCE_CREATION_MODE=refs" option. By default it is "-DRMW_UXRCE_CREATION_MODE=xml" for generating the entities XML in the client side.

And of course, you can modify the other configurations, as explained here in the same colcon.meta file and rebuild you custom Arduino library.

One caveat is that currently, you can't have both XML and Refs modes in the same build. So if you set your build to Refs you will need to have a label in the agent-side QoS XML file for each entity. This can be tricky sometimes. For sure it is possible to upgrade the middleware library to support both of them simultaneously. If you need this feature we can talk and schedule a period for implementing it.

I'm closing. Please open a new issue if you need support for the QoS or middleware customization.