ros2 / rmw_fastrtps

Implementation of the ROS Middleware (rmw) Interface using eProsima's Fast RTPS.
Apache License 2.0

[BUG] rclcpp publisher publish() call blocks for up to 35 ms with 1 publisher at 100 Hz and 8 subscribers (topic echo). #754

Open · ZhenshengLee opened this issue 2 months ago

ZhenshengLee commented 2 months ago

Bug report

Required Info:

The ROS 2 environment uses Fast RTPS with localhost-only communication:

ROS_VERSION=2
ROS_PYTHON_VERSION=3
ROS_DOMAIN_ID=110
ROS_HOME=/home/zs/ga_ws/ga_ros/colcon/home
ROS_LOCALHOST_ONLY=1
ROS_DISTRO=humble
RMW_FASTRTPS_USE_QOS_FROM_XML=1
RMW_IMPLEMENTATION=rmw_fastrtps_cpp
RCUTILS_COLORIZED_OUTPUT=1
FASTRTPS_DEFAULT_PROFILES_FILE=/home/zs/ga_ws/ga_ros/shm_fastdds.xml

The Fast DDS XML profile enables asynchronous publishing with data sharing:

<?xml version="1.0" encoding="UTF-8"?>
<profiles xmlns="http://www.eprosima.com/XMLSchemas/fastRTPS_Profiles">
    <transport_descriptors>
        <transport_descriptor>
            <transport_id>UDP_transport</transport_id>
            <type>UDPv4</type>
            <maxInitialPeersRange>10</maxInitialPeersRange>
        </transport_descriptor>
    </transport_descriptors>

    <participant profile_name="participant_profile_ros2" is_default_profile="true">
        <rtps>
            <name>profile_for_ros2_context</name>
            <userTransports>
                <transport_id>UDP_transport</transport_id>
            </userTransports>
            <useBuiltinTransports>false</useBuiltinTransports>

            <builtin>
                <metatrafficUnicastLocatorList>
                    <locator />
                </metatrafficUnicastLocatorList>
                <initialPeersList>
                    <locator>
                        <udpv4>
                            <address>127.0.0.1</address>
                        </udpv4>
                    </locator>
                </initialPeersList>
            </builtin>
        </rtps>
    </participant>

    <data_writer profile_name="default publisher profile" is_default_profile="true">
        <qos>
            <publishMode>
                <kind>ASYNCHRONOUS</kind>
            </publishMode>
            <data_sharing>
                <kind>AUTOMATIC</kind>
            </data_sharing>
        </qos>
        <historyMemoryPolicy>PREALLOCATED_WITH_REALLOC</historyMemoryPolicy>
    </data_writer>

    <data_reader profile_name="default subscription profile" is_default_profile="true">
        <qos>
            <data_sharing>
                <kind>AUTOMATIC</kind>
            </data_sharing>
        </qos>
        <historyMemoryPolicy>PREALLOCATED_WITH_REALLOC</historyMemoryPolicy>
    </data_reader>
</profiles>

Steps to reproduce issue

  1. Run the image publisher at 100 Hz.
  2. Run 8 subscribers in parallel, for example ros2 topic echo/hz/bw.
ros2 run shm_msgs image_talker
ros2 topic bw /image
# 5.63 KB/s from 100 messages
# Message size mean: 0.06 KB min: 0.06 KB max: 0.0

# another 7 subscribers
ros2 topic hz /image
ros2 topic hz /image
ros2 topic hz /image
ros2 topic hz /image
ros2 topic hz /image
ros2 topic hz /image
ros2 topic hz /image

The code is a typical image publisher:

#include <chrono>
#include <cstring>
#include <memory>
#include <string>
#include <utility>

#include "rclcpp/rclcpp.hpp"

#include "sensor_msgs/msg/image.hpp"
#include <sensor_msgs/image_encodings.hpp>
#include <cv_bridge/cv_bridge.h>

#include <opencv2/opencv.hpp>

using namespace std::chrono_literals;
using namespace cv_bridge;

class Talker : public rclcpp::Node {
private:
  using Topic = sensor_msgs::msg::Image;

public:
  explicit Talker(const rclcpp::NodeOptions &options)
      : Node("image_talker", options) {

    m_input_cvimage->image = cv::imread("./res/img/205_182.png");
    // m_input_cvimage->image = cv::imread("./res/img/1024_768.jpeg");
    // m_input_cvimage->image = cv::imread("./res/img/1920_1080.jpg");
    m_input_cvimage->header.frame_id = "camera_link";
    m_input_cvimage->encoding = "bgr8";
    // cv::imshow("input image", m_input_cvimage->image);
    // cv::waitKey(0);

    auto publishMessage = [this]() -> void {
      m_input_cvimage->header.stamp = now();

      auto msg = std::make_shared<Topic>();
      // msg->data is a std::vector<uint8_t> filled by cv_bridge's toImageMsg()
      m_input_cvimage->toImageMsg(*msg);

      RCLCPP_INFO_ONCE(this->get_logger(), "Publishing with ts: %u.%u", m_input_cvimage->header.stamp.sec, m_input_cvimage->header.stamp.nanosec);

      m_start_ts = now();
      m_publisher->publish(std::move(*msg));
      auto time_offset_ms = (now() - m_start_ts).nanoseconds() / 1000000.0F;
      // RCLCPP_INFO(this->get_logger(), "publish timeout with %.3f ms", time_offset_ms);
      RCLCPP_INFO_EXPRESSION(this->get_logger(), time_offset_ms > 1.0, "publish timeout with %.3f ms", time_offset_ms);
      // We gave up ownership of *msg via std::move, so it must not be accessed
      // anymore

      m_count++;
    };

    // rclcpp::QoS qos(rclcpp::KeepLast(10));
    // rclcpp::QoS qos(rclcpp::SensorDataQoS());
    rclcpp::QoS custom_qos_profile = rclcpp::QoS(rclcpp::QoSInitialization::from_rmw(rmw_qos_profile_default))
      .history(rmw_qos_history_policy_t::RMW_QOS_POLICY_HISTORY_KEEP_LAST)
      .keep_last(5)
      .reliability(rmw_qos_reliability_policy_t::RMW_QOS_POLICY_RELIABILITY_BEST_EFFORT)
      .durability(rmw_qos_durability_policy_t::RMW_QOS_POLICY_DURABILITY_VOLATILE)
      .avoid_ros_namespace_conventions(false);

    m_publisher = this->create_publisher<Topic>("image", custom_qos_profile);

    // Use a timer to schedule periodic message publishing.
    m_timer = this->create_wall_timer(10ms, publishMessage);
  }

private:
  uint64_t m_count = 1;
  rclcpp::Publisher<Topic>::SharedPtr m_publisher;
  rclcpp::TimerBase::SharedPtr m_timer;
  std::shared_ptr<cv_bridge::CvImage> m_input_cvimage{std::make_shared<cv_bridge::CvImage>()};
  rclcpp::Time m_start_ts{0, 0, RCL_ROS_TIME};
};

int main(int argc, char *argv[]) {
  rclcpp::init(argc, argv);
  rclcpp::NodeOptions options;
  rclcpp::spin(std::make_shared<Talker>(options));
  rclcpp::shutdown();

  return 0;
}
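
As a side check (not part of the original report), here is a minimal stand-alone sketch that times publish() with std::chrono::steady_clock instead of ROS time, to make sure the measured blocking is not an artifact of the clock source. The node name, the SensorDataQoS profile and the empty message payload are placeholder assumptions.

// Hypothetical stand-alone timing check: publishes an empty Image at 100 Hz
// and measures how long publish() blocks using a monotonic clock.
#include <chrono>
#include <memory>

#include "rclcpp/rclcpp.hpp"
#include "sensor_msgs/msg/image.hpp"

class SteadyClockTalker : public rclcpp::Node {
public:
  SteadyClockTalker() : Node("steady_clock_talker") {
    // SensorDataQoS is close to the custom profile above: KEEP_LAST(5), BEST_EFFORT, VOLATILE.
    m_publisher = create_publisher<sensor_msgs::msg::Image>("image", rclcpp::SensorDataQoS());
    m_timer = create_wall_timer(std::chrono::milliseconds(10), [this]() {
      sensor_msgs::msg::Image msg;  // empty payload is enough for timing the call itself
      msg.header.stamp = now();

      const auto start = std::chrono::steady_clock::now();
      m_publisher->publish(msg);
      const auto stop = std::chrono::steady_clock::now();

      const double blocked_ms =
          std::chrono::duration<double, std::milli>(stop - start).count();
      RCLCPP_INFO_EXPRESSION(get_logger(), blocked_ms > 1.0,
                             "publish blocked for %.3f ms", blocked_ms);
    });
  }

private:
  rclcpp::Publisher<sensor_msgs::msg::Image>::SharedPtr m_publisher;
  rclcpp::TimerBase::SharedPtr m_timer;
};

int main(int argc, char *argv[]) {
  rclcpp::init(argc, argv);
  rclcpp::spin(std::make_shared<SteadyClockTalker>());
  rclcpp::shutdown();
  return 0;
}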

Expected behavior

The publish-timeout log should never appear, because the typical publish() blocking time is 0.023 ms.

RCLCPP_INFO_EXPRESSION(this->get_logger(), time_offset_ms > 1.0, "publish timeout with %.3f ms", time_offset_ms);

Actual behavior

In a 1-minute test run, the publish-blocking log appears many times.

[INFO] [1712730413.187229684] [image_talker]: Publishing with ts: 1712730413.187221484
[INFO] [1712730455.772330021] [image_talker]: publish timeout with 35.392 ms
[INFO] [1712730458.092792171] [image_talker]: publish timeout with 5.929 ms
[INFO] [1712730461.802849221] [image_talker]: publish timeout with 9.390 ms
[INFO] [1712730528.413606018] [image_talker]: publish timeout with 4.516 ms
[INFO] [1712730554.978534268] [image_talker]: publish timeout with 4.702 ms
[INFO] [1712730568.429477942] [image_talker]: publish timeout with 1.766 ms
[INFO] [1712730579.609595818] [image_talker]: publish timeout with 2.998 ms
[INFO] [1712730579.637526415] [image_talker]: publish timeout with 1.362 ms
[INFO] [1712730705.313449333] [image_talker]: publish timeout with 8.309 ms
[INFO] [1712730705.337222862] [image_talker]: publish timeout with 1.668 ms
[INFO] [1712730706.387663607] [image_talker]: publish timeout with 2.402 ms
[INFO] [1712730712.672717914] [image_talker]: publish timeout with 7.183 ms
[INFO] [1712730760.985228138] [image_talker]: publish timeout with 5.793 ms
[INFO] [1712730785.781393856] [image_talker]: publish timeout with 26.846 ms
[INFO] [1712730798.727746541] [image_talker]: publish timeout with 2.861 ms
[INFO] [1712730806.782316055] [image_talker]: publish timeout with 27.989 ms
[INFO] [1712730864.781304626] [image_talker]: publish timeout with 5.097 ms
[INFO] [1712730865.357020724] [image_talker]: publish timeout with 4.860 ms
[INFO] [1712730887.712930304] [image_talker]: publish timeout with 5.525 ms
[INFO] [1712730899.765927433] [image_talker]: publish timeout with 1.542 ms
[INFO] [1712731161.108500411] [rclcpp]: signal_handler(signum=2)

Additional information

Blocking behavior
From https://www.eprosima.com/index.php/resources-all/performance/dds-asynchronous-vs-synchronous-publishing:

The DDS standard states that the write operation may block in some specific situations, especially when the History is completely filled. Therefore, it is important to correctly configure the History management related Quality of Service (QoS) policies to prevent this from happening. These QoS policies are:

  1. History QoS Policy: how many samples are kept within the history. It can be set to KEEP ALL the samples (within the available resource limits explained next) or to KEEP LAST, which keeps only a given number of samples (also known as the History Depth).
  2. Resource Limits QoS Policy: controls the maximum resources available.
  3. Durability QoS Policy: how long the samples are kept within the History in case a late joiner wants to access them.
  4. Reliability QoS Policy: specifies the delivery of the samples. If set to BEST EFFORT, the sample is sent only once and no confirmation of reception is expected, whereas if set to RELIABLE, a confirmation for all samples is expected. The latter configuration may be troublesome when sending large samples that need fragmentation, because the sample's lost fragments would have to be delivered again (with the associated network load). Without receiving the acknowledgement, the RELIABLE History is not allowed to dispose of samples.

It is more common to fill the History when the publication mode is ASYNCHRONOUS, as the data is sent in a separate thread. For example, if the user wants to send large samples and has selected the RELIABLE Reliability QoS Policy but the network is lossy, there is a high probability that the application will block for some time (the write operation keeps adding samples to the History, which is not being emptied because the data is being lost in the network). In this case it is important to know the limits of your network, set up a proper flow controller, and tune the resource limits of your application correctly. Configuring the QoS of your DataWriters and DataReaders is therefore paramount to prevent this blocking behavior.

Finally, the SYNCHRONOUS publication mode also blocks during callback execution. This is important because it can cause a thread deadlock. This is why it is not advisable to call any of the Fast DDS APIs within the callback methods (except for the read and take operations): it is undefined behavior due to race conditions and can easily cause a deadlock.
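
As a rough illustration (a sketch only, not a fix for this issue) of how the History, Reliability and Durability policies from that excerpt map onto rclcpp calls; the Resource Limits policy and flow controllers mentioned there are Fast DDS-level settings that rclcpp does not expose, so in this setup they could only come from the XML profile pointed to by FASTRTPS_DEFAULT_PROFILES_FILE:

// Sketch: the QoS policies discussed above expressed with rclcpp.
// These match the settings already used in the reproducer; shown here only to
// map each policy name from the eProsima text onto the corresponding call.
#include "rclcpp/rclcpp.hpp"
#include "sensor_msgs/msg/image.hpp"

rclcpp::Publisher<sensor_msgs::msg::Image>::SharedPtr
make_image_publisher(rclcpp::Node & node)
{
  rclcpp::QoS qos(rclcpp::KeepLast(5));  // History QoS: KEEP_LAST, depth 5
  qos.best_effort();                     // Reliability QoS: BEST_EFFORT, no ACK-driven history blocking
  qos.durability_volatile();             // Durability QoS: VOLATILE, nothing kept for late joiners
  // Resource Limits QoS and flow controllers are not part of the rclcpp API;
  // they can only be set through the Fast DDS XML profile.
  return node.create_publisher<sensor_msgs::msg::Image>("image", qos);
}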

This issue may be related: https://github.com/ros2/rmw_fastrtps/issues/338

Questions

  1. Does changing the transport layer from SHM to UDP affect the writer blocking issue?
  2. Are there any best practices or guides for setting QoS to avoid this writer blocking issue?
ZhenshengLee commented 2 months ago

@MiguelCompany @Barry-Xu-2018 Friendly Ping.

ZhenshengLee commented 2 months ago

Related to this performance test topic:

the latency scaling with the number of topic subscribers
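
For reference, a minimal hypothetical sketch (not from the report) that creates eight no-op subscriptions to /image inside a single process, approximating the eight ros2 topic hz/echo instances used above when checking how latency scales with the subscriber count. Co-locating all subscriptions in one process is only an approximation of eight separate processes, especially with data sharing enabled.

// Hypothetical fan-out helper: N trivial best-effort subscriptions on /image.
#include <memory>
#include <vector>

#include "rclcpp/rclcpp.hpp"
#include "sensor_msgs/msg/image.hpp"

int main(int argc, char *argv[])
{
  rclcpp::init(argc, argv);
  auto node = std::make_shared<rclcpp::Node>("image_fanout_listener");

  const int n_subs = 8;  // assumption: same subscriber count as in the report
  std::vector<rclcpp::Subscription<sensor_msgs::msg::Image>::SharedPtr> subs;
  for (int i = 0; i < n_subs; ++i) {
    subs.push_back(node->create_subscription<sensor_msgs::msg::Image>(
        "image", rclcpp::SensorDataQoS(),
        [](sensor_msgs::msg::Image::ConstSharedPtr) { /* drop the message */ }));
  }

  rclcpp::spin(node);
  rclcpp::shutdown();
  return 0;
}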

MichaelOrlov commented 2 months ago

@MiguelCompany @EduPonz Could you please handle/triage this issue? The 35 ms delay looks unreasonably long for a simple publish() call with the following QoS:

 rclcpp::QoS custom_qos_profile = rclcpp::QoS(rclcpp::QoSInitialization::from_rmw(rmw_qos_profile_default))
      .history(rmw_qos_history_policy_t::RMW_QOS_POLICY_HISTORY_KEEP_LAST)
      .keep_last(5)
      .reliability(rmw_qos_reliability_policy_t::RMW_QOS_POLICY_RELIABILITY_BEST_EFFORT)
      .durability(rmw_qos_durability_policy_t::RMW_QOS_POLICY_DURABILITY_VOLATILE)
      .avoid_ros_namespace_conventions(false);

@ZhenshengLee What is the delay for the same setup with CycloneDDS?

ZhenshengLee commented 2 months ago

What is the delay for the same setup with CycloneDDS? @MichaelOrlov

I didn't test with CycloneDDS. Below is the result with RTI Connext DDS; the publish blocking time is around 3-5 ms.

[attached screenshot: 17131515405694]