micro-ROS / micro-ROS-Agent

ROS 2 package using Micro XRCE-DDS Agent.
Apache License 2.0
101 stars 58 forks source link

Message loss still appear when use RELIABLE QoS #146

Closed scottitran closed 2 years ago

scottitran commented 2 years ago

Currently, I am testing reliability of Micro-ROS, I used basic pub and sub program. I used Micro-ROS in ESP32 and I have set reliability QoS to RELIABLE, durability to TRANSIENT_LOCAL and history to KEEP_ALL but during testing, the program still get lost messages. Can someone please let me know why this happend?

pablogs9 commented 2 years ago

Could you provide steps for replicating this issue?

scottitran commented 2 years ago

I have two ESP32 board and I placed publisher in one ESP32 and subscriber to the other one to communication using UDP. I created custom message which include count int32 and int32[] data. Then when a message is transmitted, count will +1. Besides, I also put another count at subscriber, so whenever subscriber receive message, I can compare these values toghether

pablogs9 commented 2 years ago

I need:

scottitran commented 2 years ago

Publisher code:

#include <stdio.h>
#include <unistd.h>

#include <rcl/rcl.h>
#include <rcl/error_handling.h>
#include <std_msgs/msg/int32.h>
#include <tutorial_interfaces/msg/num.h>

#include <rclc/rclc.h>
#include <rclc/executor.h>

#ifdef ESP_PLATFORM
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#endif

#define RCCHECK(fn) { rcl_ret_t temp_rc = fn; if((temp_rc != RCL_RET_OK)){printf("Failed status on line %d: %d. Aborting.\n",__LINE__,(int)temp_rc);vTaskDelete(NULL);}}
#define RCSOFTCHECK(fn) { rcl_ret_t temp_rc = fn; if((temp_rc != RCL_RET_OK)){printf("Failed status on line %d: %d. Continuing.\n",__LINE__,(int)temp_rc);}}

rcl_publisher_t publisher;
// std_msgs__msg__Int32 msg;
tutorial_interfaces__msg__Num msg;
int count = 0;

void timer_callback(rcl_timer_t * timer, int64_t last_call_time)
{
    RCLC_UNUSED(last_call_time);

    msg.count = count;
    if (timer != NULL) {
        RCSOFTCHECK(rcl_publish(&publisher, &msg, NULL));
        printf("pub send seq %d\n", msg.count);
        count++;
    }
}

void appMain(void * arg)
{
    rcl_allocator_t allocator = rcl_get_default_allocator();
    rclc_support_t support;

    // create init_options
    RCCHECK(rclc_support_init(&support, 0, NULL, &allocator));

    // create node
    rcl_node_t node;
    rcl_node_options_t node_ops = rcl_node_get_default_options();
    node_ops.domain_id = 4;
    RCCHECK(rclc_node_init_with_options(&node, "freertos_int32_publisher", "", &support, &node_ops));

    // create publisher
    rmw_qos_profile_t pub_qos = rmw_qos_profile_default;
    pub_qos.reliability = RMW_QOS_POLICY_RELIABILITY_RELIABLE;
    pub_qos.history = RMW_QOS_POLICY_HISTORY_KEEP_ALL;
    // pub_qos.depth = 1000;
    pub_qos.durability = RMW_QOS_POLICY_DURABILITY_TRANSIENT_LOCAL;

    RCCHECK(rclc_publisher_init(
        &publisher,
        &node,
        ROSIDL_GET_MSG_TYPE_SUPPORT(tutorial_interfaces, msg, Num),
        "freertos_int32_publisher",
        &pub_qos));

    // create timer,
    rcl_timer_t timer;
    const unsigned int timer_timeout = 100.0;
    RCCHECK(rclc_timer_init_default(
        &timer,
        &support,
        RCL_MS_TO_NS(timer_timeout),
        timer_callback));

    // create executor
    rclc_executor_t executor;
    RCCHECK(rclc_executor_init(&executor, &support.context, 1, &allocator));
    RCCHECK(rclc_executor_add_timer(&executor, &timer));

    // msg.data.capacity = 10000;
    // msg.data.data = (int32_t*) malloc(msg.data.capacity * sizeof(int32_t));
    // msg.data.size = 0;
    static int32_t memory[30];
    msg.data.capacity = 30;
    msg.data.data = memory;
    msg.data.size = 0;

    for (int32_t i =0; i < 30; i++){    
        msg.data.data[i] = 1;//rand();
        msg.data.size++;
    }

    rclc_executor_spin(&executor);

    // free resources
    RCCHECK(rcl_publisher_fini(&publisher, &node));
    RCCHECK(rcl_node_fini(&node));
    free(msg.data.data);

    vTaskDelete(NULL);
}

include <std_msgs/msg/int32.h>

include <tutorial_interfaces/msg/num.h>

include

include

ifdef ESP_PLATFORM

include "freertos/FreeRTOS.h"

include "freertos/task.h"

endif

define RCCHECK(fn) { rcl_ret_t temp_rc = fn; if((temp_rc != RCL_RET_OK)){printf("Failed status on line %d: %d. Aborting.\n",LINE,(int)temp_rc);vTaskDelete(NULL);}}

define RCSOFTCHECK(fn) { rcl_ret_t temp_rc = fn; if((temp_rc != RCL_RET_OK)){printf("Failed status on line %d: %d. Continuing.\n",LINE,(int)temp_rc);}}

rcl_subscription_t subscriber; tutorial_interfacesmsgNum msg; int count = 0;

void subscription_callback(const void msgin) { const tutorial_interfacesmsgNum msg = (const tutorial_interfacesmsgNum *)msgin; printf("msg: %d, count: %d\n", msg->count, count); count++; }

void appMain(void * arg) { rcl_allocator_t allocator = rcl_get_default_allocator(); rclc_support_t support;

// create init_options
RCCHECK(rclc_support_init(&support, 0, NULL, &allocator));

// create node
rcl_node_t node;
rcl_node_options_t node_ops = rcl_node_get_default_options();
node_ops.domain_id = 4;
RCCHECK(rclc_node_init_with_options(&node, "int32_subscriber_rclc", "", &support, &node_ops));

// create subscriber
rmw_qos_profile_t sub_qos = rmw_qos_profile_default;
sub_qos.reliability = RMW_QOS_POLICY_RELIABILITY_RELIABLE;
sub_qos.history = RMW_QOS_POLICY_HISTORY_KEEP_ALL;
sub_qos.durability = RMW_QOS_POLICY_DURABILITY_TRANSIENT_LOCAL;
// sub_qos.depth = 1000;

RCCHECK(rclc_subscription_init(
    &subscriber,
    &node,
    ROSIDL_GET_MSG_TYPE_SUPPORT(tutorial_interfaces, msg, Num),
    "freertos_int32_publisher",
    &sub_qos));

// create executor
rclc_executor_t executor;
RCCHECK(rclc_executor_init(&executor, &support.context, 1, &allocator));
RCCHECK(rclc_executor_add_subscription(&executor, &subscriber, &msg, &subscription_callback, ON_NEW_DATA));

// msg.data.capacity = 10000;
// msg.data.data = (int32_t*) malloc(msg.data.capacity * sizeof(int32_t));
// msg.data.size = 0;

static int32_t memory[30];
msg.data.capacity = 30;
msg.data.data = memory;
msg.data.size = 0;

rclc_executor_spin(&executor);

// free resources
RCCHECK(rcl_subscription_fini(&subscriber, &node));
RCCHECK(rcl_node_fini(&node));
free(msg.data.data);

vTaskDelete(NULL);

}


- colcon.meta of Subscriber

{ "names": {

    "microxrcedds_client":{
        "cmake-args": [
            "-DUCLIENT_UDP_TRANSPORT_MTU=2048"
        ]
    },

    "rmw_microxrcedds": {
        "cmake-args": [
            "-DRMW_UXRCE_MAX_NODES=2",
            "-DRMW_UXRCE_MAX_PUBLISHERS=2",
            "-DRMW_UXRCE_MAX_SUBSCRIPTIONS=2",
            "-DRMW_UXRCE_MAX_SERVICES=5",
            "-DRMW_UXRCE_MAX_CLIENTS=0",
            "-DRMW_UXRCE_STREAM_HISTORY=10",
            "-DRMW_UXRCE_MAX_HISTORY=2"
        ]
    }
}

}



-  I used Foxy
-  I create agent based on this tutorial: https://micro.ros.org/docs/tutorials/core/first_application_linux/. This mean I run agent on my Linux PC not from Docker
pablogs9 commented 2 years ago
pablogs9 commented 2 years ago

Some proposals:

scottitran commented 2 years ago
pablogs9 commented 2 years ago
  • What is the difference between the default setup with custom QoS as I used because I want to test all QoS policy types.

Default QoS are defined in rmw: https://github.com/ros2/rmw/blob/master/rmw/include/rmw/qos_profiles.h#L51

  • I only count in the beginning and at the end when I measure. Here is the result: with 128 Bytes I got zero message loss, but with 1 Kilobyte, total message expect to receive is 2656 but total message actual receive only 2653 (during a few minutes) and I also test during 1 hour, total message expect to receive is 41294 and only receive 41244 (around 0.12% message loss rate)

It seems that you are having problems with fragmentation with big messages, see next comment

  • Because I want to increase message size

If you want to increase the message size, do not increase the MTU. I recommend keeping MTU to 512 B and increasing RMW_UXRCE_STREAM_HISTORY to a greater value. This way you will have more slots of 512 B to handle message fragmentation. RMW_UXRCE_STREAM_HISTORY shall be a power of two, so maybe 16 is a good starting value for your proposals.

  • Ros2 topic echo had the same value as the actual receive message that I mentioned above

I do not understand your answer here, ROS2 echo command receives all the messages or also experiments the losses. This is important to determine if your data loss is:

pablogs9 commented 2 years ago

Also, take this into account:

    while(1){
        rclc_executor_spin_some(&executor, RCL_MS_TO_NS(100));
        usleep(10000);
    }

By doing a rclc_spin() you are blocking FreeRTOS tasks with lower priority than the micro-ROS one.

scottitran commented 2 years ago

I tried to change as you mentioned and I still get lost messages. image.

This was captured from ros2 topic echo: Screenshot from 2022-06-08 11-08-48

"msg: 7236" is the total message that I expect to receive and "count: 7233" is the actual message receive

Data that is not being published correctly from the first ESP32 -> Yes, but do you might know what caused this to happen?

pablogs9 commented 2 years ago

Could you explain which of the changes you have tried?

scottitran commented 2 years ago

I changed rclc_executor_spin to rclc_executor_spin_some in the while loop as you mentioned. and RMW_UXRCE_STREAM_HISTORY = 16 and MTU = 512. I also cleaned build before uploading code to ESP32

pablogs9 commented 2 years ago
scottitran commented 2 years ago

Yes, I can try it and I will let you know if message loss still appears but it will be too slow. Besides, did the publish rate affect to message loss rate?

pablogs9 commented 2 years ago

A high pub/sub rate can collapse the internal middleware buffers in micro-ROS

pablogs9 commented 2 years ago

Any update on this?

scottitran commented 2 years ago

Hi sorry, I forgot to update this issue. Yes, when I decrease the publisher rate to 1 Hz, no lost message anymore. However, when I increase publisher rate, message loss appear again.

A high pub/sub rate can collapse the internal middleware buffers in micro-ROS

A question, where I can read about this or this is an issue with Micro-ROS?

pablogs9 commented 2 years ago

It is not an issue, it is just the nature of a full-static middleware. If it has no memory, it cannot store new data.

You can read more about the micro-ROS middleware here: https://micro-xrce-dds.docs.eprosima.com/en/latest/

In order to solve this issue just tune your balance between RMW_UXRCE_STREAM_HISTORY and rates.

scottitran commented 2 years ago

Yes, thank you very much for your support