ros-realtime / community

WG governance model & list of projects

Consolidate Performance Test Tools for ROS 2 #6

Closed dejanpan closed 2 years ago

dejanpan commented 4 years ago

We currently have the following selection of tools available:

  1. https://discourse.ros.org/t/ros-2-real-time-working-group-online-meeting-18-may-26-2020-meeting-minutes/14302/14?u=dejan_pangercic
    1. [g1] performance_test (Apex) https://gitlab.com/ApexAI/performance_test
    2. [g2] ros2-performance (iRobot) https://github.com/irobot-ros/ros2-performance
    3. [g3] buildfarm_perf_tests (Open Robotics) https://github.com/ros2/buildfarm_perf_tests
  2. Every DDS vendor has a perf testing tool, e.g.
    1. https://github.com/rticommunity/rtiperftest

Acceptance Criteria

  1. [x] Consolidate the requirements from:
    1. https://discourse.ros.org/t/ros-2-real-time-working-group-online-meeting-18-may-26-2020-meeting-minutes/14302/14?u=dejan_pangercic (at the tail)
    2. https://drive.google.com/file/d/15nX80RK6aS8abZvQAOnMNUEgh7px9V5S/view => Standardized experiment setup
  2. [x] Compare above tools
  3. [x] Enhance one of the tools => performance_test was improved, and reference_system was created to complement other use cases
fadi-labib commented 4 years ago

Based on https://discourse.ros.org/t/ros-2-real-time-working-group-online-meeting-18-may-26-2020-meeting-minutes/14302/14 and https://gist.github.com/y-okumura-isp/8c03fa6a59ce57533159c7e3e7917999, Apex.AI analyzed the comparison criteria and the three tools covered in this analysis:

  1. CI: buildfarm_perf_tests + Apex.AI performance_test
  2. iRobot ros2-performance
  3. pendulum_control

[1] CI: buildfarm_perf_tests + Apex.AI performance_test

Test 1

In this test we are running the performance_test tool provided by Apex.AI. Right now we have our own fork because there are some pending pull requests in the official GitLab repository.

Proposal for next steps: Apex.AI will try to solve this problem as follows:

  1. Apex.AI will allocate more resources to maintain the performance_test tool; a newcomer has already joined who will focus mainly on this topic
  2. If this doesn't fix the problem, Apex.AI will open up the tool for more maintainers from the community

Test 2

Apex.AI would like to understand how this test differs from its own performance_test tool, in order to identify the gaps in the tool:

  1. We understand this is probably because the test can use two different RMW implementations, so it required some tweaks to run two different instances
  2. Also, we would like to understand whether the metrics measured in the test (average round trip, CPU usage (read from the filesystem), total lost packets, received/sent packets per second, physical memory, resident anonymous memory, virtual memory) are calculated differently than in Apex.AI's performance testing. Apex.AI uses https://gitlab.com/ApexAI/performance_test/-/blob/master/performance_test/src/experiment_execution/analysis_result.cpp, and performance_test supports all metrics available through https://linux.die.net/man/2/getrusage (see the getrusage sketch below). We would like more clarification on the additional metrics, how they are calculated, and whether the common metrics (e.g. CPU) are calculated differently
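
As an illustration of the getrusage-based approach, here is a minimal sketch (illustrative only, not code from performance_test or buildfarm_perf_tests) of sampling CPU time, peak resident memory, and page faults for the current process:

```cpp
// Minimal sketch: sampling process resource usage with getrusage(2).
// Illustrative example, not code from performance_test or buildfarm_perf_tests.
#include <sys/resource.h>
#include <cstdio>

int main()
{
  rusage usage{};
  if (getrusage(RUSAGE_SELF, &usage) != 0) {
    perror("getrusage");
    return 1;
  }
  // User and system CPU time consumed so far by this process.
  const double user_s = usage.ru_utime.tv_sec + usage.ru_utime.tv_usec / 1e6;
  const double sys_s = usage.ru_stime.tv_sec + usage.ru_stime.tv_usec / 1e6;
  std::printf("user CPU: %.6f s, system CPU: %.6f s\n", user_s, sys_s);
  // Peak resident set size (KiB on Linux) and page-fault counts, two of the
  // metrics discussed above.
  std::printf("max RSS: %ld KiB, minor faults: %ld, major faults: %ld\n",
              usage.ru_maxrss, usage.ru_minflt, usage.ru_majflt);
  return 0;
}
```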

Hello @cottsay, @dejanpan mentioned that you can help with buildfarm_perf_tests, so please comment on the above if you can.

Test 3

Apex.AI shall consider supporting the use case of measuring node overhead.

Proposal for next steps:

  1. Apex.AI shall work with Open Robotics on consolidating buildfarm_perf_tests [g3] and performance_test [g1]. This appears feasible and would bring the benefit of a single standard evaluation platform, increase the utilization of both tools, and bring them to maturity across larger use cases

[2] iRobot ros2-performance

The tool is mature: it has around 290 commits and around 21 forks. It is inspired by the performance_test tool.

From iRobot's documentation: "ApexAI provides an alternative valid performance evaluation framework, which allows testing different type of messages. Our implementation is inspired by their work."

In addition to the already open-source performance_test tool, Apex.AI internally has another performance testing tool, called test_bench, for testing a running system that scales to the dimensions of a real application, i.e. about one hundred nodes with various loads of messages being passed between them. The tool is very similar to iRobot's ros2-performance.

The tool is still under evaluation and Apex.AI is planning to release the tool in 2021.

Apex.AI's test_bench doesn't support:

  1. Measuring service discovery time
  2. Using services
  3. Building the topology manually https://github.com/irobot-ros/ros2-performance/tree/master/performances/performance_test_factory#manually-create-ros2-nodes

Apex.AI's tool supports the following features over iRobot's tool:

  1. Persisting the results in a database using ODB
  2. Configuration uses YAML files, which seem easier to manage than the JSON configurations in iRobot's tool (the JSON files contain much more repetition than Apex.AI's YAML files)
  3. iRobot's tool doesn't appear to support a simulated work time; its publishers run at a fixed frequency only. Apex.AI's test_bench allows publishers to run at a fixed frequency, but also has the option to publish once after each subscriber in the node receives a message.
  4. Apex.AI's tool supports changing the QoS settings and the threading settings; it is not clear whether iRobot's tool does (see the QoS sketch below for the kind of settings involved)
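
To illustrate the kind of QoS configuration meant in point 4, here is a minimal rclcpp sketch (illustrative only, not code from test_bench or ros2-performance) that creates a publisher with explicit history, reliability, and durability settings:

```cpp
// Minimal sketch: creating a publisher with explicit QoS settings in rclcpp.
// Illustrative only; not code from test_bench or ros2-performance.
#include <memory>

#include <rclcpp/rclcpp.hpp>
#include <std_msgs/msg/string.hpp>

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  auto node = std::make_shared<rclcpp::Node>("qos_demo");

  // Keep the last 10 samples, best-effort delivery, volatile durability:
  // the kind of settings one would vary in a performance experiment.
  rclcpp::QoS qos(rclcpp::KeepLast(10));
  qos.best_effort().durability_volatile();

  auto pub = node->create_publisher<std_msgs::msg::String>("chatter", qos);

  rclcpp::spin(node);
  rclcpp::shutdown();
  return 0;
}
```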

iRobot's definition for classifying message latencies is:

Messages are classified by their latency:
- too_late: the latency is greater than min(period, 50ms), where period is the publishing period of that particular topic
- late: not too_late, but the latency is greater than min(0.2 * period, 5ms)
- lost: the message never arrived

The idea is that a real system could still work with a few late messages, but not with too_late messages.

Apex.AI's definition is:

The latency must be within the publishing period (i.e. less than 1/frequency).
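
To make the two definitions concrete, here is a minimal sketch (illustrative only; the thresholds follow the definitions quoted above) of both classification rules:

```cpp
// Minimal sketch of the two latency criteria described above.
// Illustrative only; the thresholds follow the definitions quoted in this
// thread. A message that never arrives is counted separately as "lost".
#include <algorithm>
#include <chrono>

enum class Classification {on_time, late, too_late};

// iRobot ros2-performance rule:
//   too_late : latency > min(period, 50 ms)
//   late     : not too_late, but latency > min(0.2 * period, 5 ms)
Classification classify_irobot(
  std::chrono::nanoseconds latency, std::chrono::nanoseconds period)
{
  using namespace std::chrono_literals;
  const auto too_late_threshold =
    std::min<std::chrono::nanoseconds>(period, 50ms);
  const auto late_threshold =
    std::min<std::chrono::nanoseconds>(period / 5, 5ms);  // 0.2 * period
  if (latency > too_late_threshold) {return Classification::too_late;}
  if (latency > late_threshold) {return Classification::late;}
  return Classification::on_time;
}

// Apex.AI criterion: the latency must stay within one publishing period.
bool within_period_apex(
  std::chrono::nanoseconds latency, std::chrono::nanoseconds period)
{
  return latency < period;
}
```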

Proposal for next steps:

  1. RTWG shall come up with a common definition for latency
  2. After Apex.AI's evaluation of the test_bench, Apex.AI shall consider merging the test_bench with performance_test to have a single common performance evaluation tool
  3. Besides merging test_bench with performance_test, Apex.AI will evaluate test_bench internally, compare its usability with iRobot's performance tool, and then decide how to proceed (planned in 2021)
  4. Based on the analysis, Apex.AI shall discuss the different possibilities with iRobot; the available options are merging the two tools, keeping the two tools independent, adding the missing features to one of them, or other options to be discussed

[3] pendulum_control

  1. It seems the tool doesn't add much value over the other tools
  2. It seems the tool hasn't had major updates since 2016, so it is a bit aged
  3. It tracks malloc usage and prints stack backtraces, which is similar to the approach used by https://github.com/osrf/osrf_testing_tools_cpp
  4. The page-fault collection seems to use the same approach as performance_test (performance_test, pendulum_control/rttest)
  5. In https://gist.github.com/y-okumura-isp/8c03fa6a59ce57533159c7e3e7917999#metrics-comparison-table-resource, there is a jitter metric in pendulum_control; this can be calculated easily from the latencies collected by [1] & [2] (see the jitter sketch below)
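
Regarding point 5, a minimal sketch of deriving a jitter metric from already collected latency samples; using the standard deviation here is an assumption for illustration, not pendulum_control's definition:

```cpp
// Minimal sketch: deriving a jitter metric from a series of latency samples.
// Illustrative only; the exact jitter definition (standard deviation here)
// is an assumption, not taken from pendulum_control.
#include <cmath>
#include <cstdio>
#include <numeric>
#include <vector>

double mean(const std::vector<double> & v)
{
  return std::accumulate(v.begin(), v.end(), 0.0) / v.size();
}

// Jitter as the standard deviation of the latency samples (same unit as the
// samples, e.g. microseconds).
double jitter_stddev(const std::vector<double> & latencies)
{
  const double m = mean(latencies);
  double sq_sum = 0.0;
  for (double x : latencies) {sq_sum += (x - m) * (x - m);}
  return std::sqrt(sq_sum / latencies.size());
}

int main()
{
  const std::vector<double> latencies_us = {102.0, 98.5, 110.2, 99.7, 105.3};
  std::printf("jitter (stddev): %.2f us\n", jitter_stddev(latencies_us));
  return 0;
}
```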

Conclusion: pendulum_control shall not be considered.

y-okumura-isp commented 4 years ago

@fadi-labib Thank you for your comment and consideration. I said I was going to post a follow-up article on Discourse, and this is it. I'm sorry for being late. Since I originally drafted this for ROS Discourse, please forgive me for the long comment.

Preface

We are surveying ROS 2 measurement tools, especially from a real-time perspective. We have compared some existing tools and want to share the results. As described below, we found some differences between the tools. We hope this gives hints for choosing measurement conditions and settings.

For the complete comparison table, please see https://gist.github.com/y-okumura-isp/8c03fa6a59ce57533159c7e3e7917999. "No1" etc. in this post refers to the row number in that table. This post is a follow-up to my 2020/09/01 Real-Time WG talk.

In our comparison table, the function comparison part is quite large, so we mainly describe that table here.

Target Projects

We survey the following projects:

  1. [1] CI: buildfarm_perf_tests + Apex.AI performance_test
  2. [2] iRobot ros2-performance
  3. [3] pendulum_control

Each tool has at least one publisher and one subscriber. The publisher wakes up periodically, sends a message on a topic, and sleeps again (as sketched below). The tools measure program performance such as topic trip time, and OS resources such as CPU usage.
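
As a rough illustration of this structure (not code from any of the surveyed tools), a periodic publisher can stamp each message and a subscription can compute the one-way trip time from that stamp:

```cpp
// Minimal sketch of the structure described above: a periodic publisher
// stamps each message, and the subscription derives the one-way trip time.
// Illustrative only; not code from any of the surveyed tools.
#include <chrono>
#include <memory>

#include <rclcpp/rclcpp.hpp>
#include <std_msgs/msg/header.hpp>

using std_msgs::msg::Header;

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  auto node = std::make_shared<rclcpp::Node>("trip_time_demo");

  auto pub = node->create_publisher<Header>("ping", 10);

  // Publisher side: wake every 10 ms, stamp and send a message, sleep again.
  auto timer = node->create_wall_timer(
    std::chrono::milliseconds(10),
    [node, pub]() {
      Header msg;
      msg.stamp = node->now();
      pub->publish(msg);
    });

  // Subscription side: trip time = receive time - publish stamp.
  auto sub = node->create_subscription<Header>(
    "ping", 10,
    [node](Header::ConstSharedPtr msg) {
      const auto trip = node->now() - rclcpp::Time(msg->stamp);
      RCLCPP_INFO(node->get_logger(), "trip time: %.3f us",
                  trip.nanoseconds() / 1e3);
    });

  rclcpp::spin(node);
  rclcpp::shutdown();
  return 0;
}
```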

Program structure

We describe the functional similarities and differences of each tool in this section. Since ROS has the following layer structure, we organize the description according to it.

+----------------------------+
|  Publisher / Subscription  |
|  rclcpp(Executor/Nodes)    |
|  DDS                       |   ROS2 layer
+----------------------------+

+----------------------------+
|  Process and RT-setting    |
+----------------------------+

+----------------------------+
|  HW / OS                   |
+----------------------------+

We describe how to read the table, followed by the explanations of each layer.

How to read the table

We explain how to read the comparison table. This table has the following columns.

| Column name | About |
| --- | --- |
| Category | The layer of the structure, such as "HW / OS" and "Process", from bottom to top. |
| Subcategory | Divides a category into a few subcategories. |
| name | Concrete items. |
| [1] Test1, [1] Test2 | About [1]. As [1] has two types of test, we split it into two columns. |
| [2] | About [2]. |
| [3] | About [3]. |

And we use the following notations:

We describe a summary of each category below.

HW / OS

Process

RT setting

DDS

rclcpp

Communication detail

ROS2 communication optimization

There are several communication optimizations, and each requires a specific situation; for example, we cannot use intra_process_comms for inter-process communication. There are at least four patterns of relationship between a publisher and its subscribers, and we have to check which optimizations can be used in each situation.

- There are many types of relations between Pub & Subs. The figure below shows:
  - Sub1: same process and same Node as Pub.
          Several optimizations are possible: Node intra_process_comms, sharing pointers, DDS zero-copy, and so on.
  - Sub2: same process but a different Node from Pub.
          We may use the same optimizations as for Sub1.
  - Sub3: different process but the same host as Pub.
          Some DDS implementations may optimize this type of communication.
  - Sub4: a different host from Pub.
          Communication goes over the network.

 +-----+  +------+ +------+ +----------+  +----------+
 | Pub |  | Sub1 | | Sub2 | |   Sub3   |  |   Sub4   |
 +-----+  +------+ +------+ +----------+  +----------+
 +---------------+ +------+ +----------+  +----------+
 |     Node      | | Node | |   Node   |  |   Node   |  <- rclcpp has intra_process_comms_
 +---------------+ +------+ +----------+  +----------+
 +------------------------+ +----------+  +----------+
 |        Executor        | | Executor |  | Executor |
 +------------------------+ +----------+  +----------+
 +------------------------+ +----------+  +----------+
 |          DDS           | |   DDS    |  |   DDS    |  <- some DDS support efficient communication such as intra process or shm
 +------------------------+ +----------+  +----------+
 +------------------------+ +----------+  +----------+
 |        Process         | | Process  |  | Process  |
 +------------------------+ +----------+  +----------+
 +-------------------------------------+  +----------+
 |              Host1                  |  |  Host2   |
 +-------------------------------------+  +----------+
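
For the Sub1/Sub2 cases, rclcpp's intra-process option is enabled per node. Here is a minimal configuration sketch (illustrative only; publishing and measurement are omitted):

```cpp
// Minimal configuration sketch: enabling rclcpp intra-process communication
// for the Sub1/Sub2 cases above (publisher and subscription sharing a
// process). Illustrative only.
#include <memory>

#include <rclcpp/rclcpp.hpp>
#include <std_msgs/msg/string.hpp>

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);

  // Both nodes opt in to intra-process communication, so messages on
  // "chatter" can bypass the DDS layer when publisher and subscription
  // live in the same process.
  auto options = rclcpp::NodeOptions().use_intra_process_comms(true);
  auto pub_node = std::make_shared<rclcpp::Node>("pub_node", options);
  auto sub_node = std::make_shared<rclcpp::Node>("sub_node", options);

  auto pub = pub_node->create_publisher<std_msgs::msg::String>("chatter", 10);
  auto sub = sub_node->create_subscription<std_msgs::msg::String>(
    "chatter", 10,
    [](std_msgs::msg::String::ConstSharedPtr msg) {
      // Delivered through the intra-process pipeline when available.
      (void)msg;
    });

  // Run both nodes in one executor, i.e. one process (the Sub2 case).
  rclcpp::executors::SingleThreadedExecutor executor;
  executor.add_node(pub_node);
  executor.add_node(sub_node);
  executor.spin();

  rclcpp::shutdown();
  return 0;
}
```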

Publisher / callback

Subscription / callback

Other functions

Measurement

JanStaschulat commented 4 years ago

Testbench to generate use cases:

Generation of use cases: frontend for the rclc Executor (Executor with C API), from Bosch

Frontend for the Static Executor in ROS 2 (Executor with C++ API), from Nobleo

carlossvg commented 3 years ago

I will summarize the discussion up to this point:

Define metrics:

  1. timer precision (see the timer sketch after this list)
  2. communication quality
    1. topic trip-time (1-way, 2-way) stats
    2. total sent / total recv / losses
  3. Program latency
    1. PDP/EDP discovery
    2. timer jitter
    3. callback jitter
  4. Resource usage (CPU, memory, page faults, network)
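
For metrics 1 and 3.2, here is a minimal sketch (plain C++, illustrative only, not code from any of the listed tools) of measuring how far each periodic wake-up drifts from its intended release time:

```cpp
// Minimal sketch: measuring timer precision / jitter of a periodic loop.
// The wake-up error is the difference between the actual and the intended
// wake-up time. Illustrative only; not code from any of the listed tools.
#include <chrono>
#include <cstdio>
#include <thread>
#include <vector>

int main()
{
  using clock = std::chrono::steady_clock;
  const auto period = std::chrono::milliseconds(10);
  const int iterations = 1000;

  std::vector<double> wakeup_error_us;
  wakeup_error_us.reserve(iterations);

  auto next_wakeup = clock::now() + period;
  for (int i = 0; i < iterations; ++i) {
    std::this_thread::sleep_until(next_wakeup);
    const auto now = clock::now();
    // Positive values mean we woke up late; this is one jitter sample.
    wakeup_error_us.push_back(
      std::chrono::duration<double, std::micro>(now - next_wakeup).count());
    next_wakeup += period;
  }

  double max_error = 0.0;
  double sum = 0.0;
  for (double e : wakeup_error_us) {
    sum += e;
    if (e > max_error) {max_error = e;}
  }
  std::printf("mean wake-up error: %.2f us, max: %.2f us\n",
              sum / wakeup_error_us.size(), max_error);
  return 0;
}
```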

Open questions:

  1. How to measure timer precision?
  2. Communication quality => This is covered by performance test tools
    • Do we need to add new features to the tools for missing metrics?
  3. Program latency. Is it possible to measure this with performance test? Does it make sense to create a new sub-project to measure this?
  4. Resource usage. There are several applications using different approaches (see the /proc sketch after this list); it would be interesting to aggregate all these utilities in one single package.
    1. performance test => getrusage
    2. ros2 tracing => https://gitlab.com/ros-tracing/tracetools_analysis/-/blob/master/tracetools_analysis/launch/memory_usage.launch.py
    3. tooling code inside the package
      1. https://github.com/ros2/buildfarm_perf_tests/blob/master/src/linux_cpu_system_measurement.cpp
      2. https://gitlab.com/ApexAI/performance_test/-/blob/master/performance_test/src/utilities/cpu_usage_tracker.hpp
      3. https://gitlab.com/ApexAI/performance_test/-/blob/master/performance_test/src/utilities/qnx_res_usage.hpp
    4. https://github.com/ros-tooling/system_metrics_collector
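
As an example of the "read from the filesystem" style of measurement listed under 3, here is a minimal sketch (illustrative only, not code from any of the packages above) that reads virtual and resident memory from /proc/self/statm:

```cpp
// Minimal sketch: reading process memory usage from the filesystem
// (/proc/self/statm), the "read from the filesystem" style of measurement
// referenced above. Illustrative only; not code from any listed package.
#include <unistd.h>
#include <cstdio>
#include <fstream>

int main()
{
  // /proc/self/statm reports sizes in pages: total program size, resident
  // set size, shared pages, text, lib, data+stack, dirty pages.
  std::ifstream statm("/proc/self/statm");
  long size_pages = 0;
  long resident_pages = 0;
  if (!(statm >> size_pages >> resident_pages)) {
    std::fprintf(stderr, "failed to read /proc/self/statm\n");
    return 1;
  }

  const long page_size = sysconf(_SC_PAGESIZE);
  std::printf("virtual memory: %ld KiB, resident memory: %ld KiB\n",
              size_pages * page_size / 1024,
              resident_pages * page_size / 1024);
  return 0;
}
```
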
carlossvg commented 2 years ago

For the moment we will use the following benchmarking tools: