Based on https://discourse.ros.org/t/ros-2-real-time-working-group-online-meeting-18-may-26-2020-meeting-minutes/14302/14 and https://gist.github.com/y-okumura-isp/8c03fa6a59ce57533159c7e3e7917999, Apex.AI analyzed the comparison criteria and the three different tools covered in this analysis.
In this test we are running the `performance_test` tool provided by Apex.AI. Right now we have our own fork because there are some pending pull requests in the official GitLab repository.
Proposal for next steps: For the future, Apex.AI will try to solve this problem: in the `performance_test` tool, there is already a newcomer who will be focusing mainly on this topic.

Apex.AI would like to understand better how this test differs from Apex.AI's `performance_test` tool, to identify the gaps in the tool: `performance_test` supports all metrics in https://linux.die.net/man/2/getrusage, so we would like clarification on the additional metrics, how they are calculated, and whether the common metrics (e.g. CPU) are calculated differently.

Hello @cottsay, @dejanpan mentioned that you can support regarding `buildfarm_perf_tests`, so please comment on the above if you can.

Apex.AI shall consider supporting the use case of measuring a node's overhead.
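As background for the getrusage discussion above, here is a minimal, hedged sketch (not code from `performance_test` or `buildfarm_perf_tests`) of reading the getrusage(2) counters that typically back CPU and memory metrics on Linux:

```cpp
#include <sys/resource.h>

#include <cstdio>

int main()
{
  // Query resource usage counters for the calling process.
  rusage usage{};
  if (getrusage(RUSAGE_SELF, &usage) != 0) {
    std::perror("getrusage");
    return 1;
  }

  // CPU time, split into user and system time.
  std::printf("user CPU:   %ld.%06ld s\n",
              static_cast<long>(usage.ru_utime.tv_sec),
              static_cast<long>(usage.ru_utime.tv_usec));
  std::printf("system CPU: %ld.%06ld s\n",
              static_cast<long>(usage.ru_stime.tv_sec),
              static_cast<long>(usage.ru_stime.tv_usec));

  // Memory- and scheduling-related counters.
  std::printf("max RSS: %ld kB\n", usage.ru_maxrss);
  std::printf("page faults (major/minor): %ld / %ld\n", usage.ru_majflt, usage.ru_minflt);
  std::printf("context switches (voluntary/involuntary): %ld / %ld\n",
              usage.ru_nvcsw, usage.ru_nivcsw);
  return 0;
}
```

How each tool derives, for example, a CPU percentage from these raw counters is exactly the kind of difference that needs clarification.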
Proposal for next steps: consider merging `buildfarm_perf_tests` [3] & `performance_test` [1]. It seems it is possible to do so, and that would bring the benefit of having a single standard evaluation platform, increasing the utilization of both tools and bringing them to maturity with larger use cases.

The tool is mature: it has around 290 commits and around 21 forks. It is inspired by the `performance_test` tool.

"ApexAI provides an alternative valid performance evaluation framework, which allows testing different types of messages. Our implementation is inspired by their work."
In addition to the already open-source `performance_test` tool, Apex.AI has internally another performance testing tool called `test_bench`, used to test a running system that scales to the dimensions of a real application, meaning about one hundred nodes with various loads of messages being passed between them. The tool is very similar to iRobot's ros2-performance.
The tool is still under evaluation, and Apex.AI is planning to release it in 2021.
Apex.AI's `test_bench` doesn't support:
Apex.AI supports the following features over iRobot's tool:
- `yml` files, which seem easier to manage than the `json` configurations in iRobot's tool (lots more repetition than Apex.AI's `yml` files)
- `test_bench` allows publishers to run at a fixed frequency, but also has the option to publish once after each subscriber in the node receives a message.

iRobot's definition for testing latencies is:
Message classification by latency: A message is classified as too_late when its latency is greater than min(period, 50ms), where period is the publishing period of that particular topic. A message is classified as late if it is not classified as too_late but its latency is greater than min(0.2*period, 5ms). The idea is that a real system could still work with a few late messages but not with too_late messages. A lost message is a message that never arrived.
Apex.AI's definition is: allowing the latency to be within the period (less than 1/frequency).
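To make the two definitions concrete, here is a small, hedged C++ sketch (not code from either tool; the names and the `Duration` type are illustrative) of how a measured latency could be classified:

```cpp
#include <algorithm>
#include <chrono>

using namespace std::chrono_literals;
using Duration = std::chrono::nanoseconds;

enum class MessageClass { OnTime, Late, TooLate };

// iRobot-style classification as described above: too_late if the latency
// exceeds min(period, 50 ms), late if it exceeds min(0.2 * period, 5 ms).
// A message that never arrives is counted separately as lost.
MessageClass classify_irobot(Duration latency, Duration period)
{
  if (latency > std::min<Duration>(period, 50ms)) {
    return MessageClass::TooLate;
  }
  if (latency > std::min<Duration>(period / 5, 5ms)) {  // 0.2 * period
    return MessageClass::Late;
  }
  return MessageClass::OnTime;
}

// Apex.AI-style criterion as described above: the latency must stay within
// one publishing period (i.e. less than 1/frequency).
bool within_period(Duration latency, Duration period)
{
  return latency < period;
}
```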
Proposal for next steps: Apex.AI shall consider merging `test_bench` with `performance_test` to have a single common performance evaluation tool. Apex.AI will evaluate `test_bench` internally, compare its usability with iRobot's performance tool, and then decide how to proceed (planned in 2021).

Conclusion: pendulum_control shall not be considered.
@fadi-labib Thank you for your comment and consideration. I said I was going to post a follow-up article on Discourse, and this is the one. I'm sorry to be late. As I first tried to post it on ROS Discourse, please forgive me for the long comment.
We are surveying ROS 2 measurement tools, especially in terms of real-time behavior. We have compared some existing tools and want to share the results. As described below, we found some differences between the tools. We hope this gives hints for choosing measurement conditions and settings.
For the complete comparison table, please see https://gist.github.com/y-okumura-isp/8c03fa6a59ce57533159c7e3e7917999. "No1" etc. in this post refers to the row number of that table. This post is a follow-up to my 2020/09/01 real-time-wg talk.
In our comparison table, the function comparison is the largest part, so we mainly describe that here.
We survey the following projects:
Each tool has at least one publisher and one subscriber. The publisher wakes up periodically, publishes a message on a topic, and sleeps again. The tools measure program-level performance, such as topic trip time, and OS resources, such as CPU usage. A minimal sketch of this pattern is shown below.
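The following is a minimal, hedged sketch of that common pattern (not code from any of the surveyed tools; node and topic names are made up): a publisher stamps each message with the send time, and a subscription computes the trip time on arrival.

```cpp
#include <chrono>
#include <memory>

#include <rclcpp/rclcpp.hpp>
#include <std_msgs/msg/header.hpp>

using namespace std::chrono_literals;

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  auto node = std::make_shared<rclcpp::Node>("trip_time_demo");

  auto pub = node->create_publisher<std_msgs::msg::Header>("ping", 10);

  // Publisher side: wake up every 10 ms, stamp the message with the current
  // time, publish, and sleep until the next timer tick.
  auto timer = node->create_wall_timer(10ms, [&]() {
      std_msgs::msg::Header msg;
      msg.stamp = node->now();
      pub->publish(msg);
    });

  // Subscriber side: trip time = receive time - stamped send time.
  auto sub = node->create_subscription<std_msgs::msg::Header>(
    "ping", 10,
    [&](std_msgs::msg::Header::SharedPtr msg) {
      const rclcpp::Duration trip_time = node->now() - rclcpp::Time(msg->stamp);
      RCLCPP_INFO(node->get_logger(), "trip time: %.3f ms", trip_time.seconds() * 1e3);
    });

  rclcpp::spin(node);
  rclcpp::shutdown();
  return 0;
}
```

OS-level metrics such as CPU usage are collected separately, e.g. via getrusage or /proc, which is where the tools differ most.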
We describe the functional similarities and differences between the tools in this section. As ROS has the following layer structure, we describe the tools according to this structure.
+----------------------------+
| Publisher / Subscription |
| rclcpp(Executor/Nodes) |
| DDS | ROS2 layer
+----------------------------+
+----------------------------+
| Process and RT-setting |
+----------------------------+
+----------------------------+
| HW / OS |
+----------------------------+
We first describe how to read the table, followed by explanations of each layer.
The comparison table has the following columns:
Column name | About |
---|---|
Category | The layer of the structure, such as "HW / OS" and "Process", from bottom to top. |
Subcategory | Divides a category into a few subcategories. |
name | Concrete items. |
[1] Test1, [1] Test2 | About [1]. As [1] has two types of tests, we split it into two columns. |
[2] | About [2]. |
[3] | About [3]. |
And we use the following notations:
We describe a summary of each category below.
- `use_intra_process_comms_`: [1] and [2] enable `NodeOptions.use_intra_process_comms`. But communication optimizations are also implemented in other layers, for example zero-copy in some DDS implementations. It would be good for programmers to know what to use in each situation (so we need tests for each situation).
- `rclcpp::Publisher::publish(std::unique_ptr<MessageT, MessageDeleter> msg)` (a minimal sketch follows this list).
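Below is a minimal, hedged sketch (names are illustrative, not taken from [1] or [2]) of enabling intra-process communication via `NodeOptions` and publishing a `std::unique_ptr` message so rclcpp can pass ownership without a copy:

```cpp
#include <memory>

#include <rclcpp/rclcpp.hpp>
#include <std_msgs/msg/string.hpp>

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);

  // Opt this node into intra-process communication.
  auto options = rclcpp::NodeOptions().use_intra_process_comms(true);
  auto node = std::make_shared<rclcpp::Node>("intra_process_demo", options);

  auto pub = node->create_publisher<std_msgs::msg::String>("chatter", 10);

  // Publishing a unique_ptr hands ownership to rclcpp; with intra-process
  // communication enabled, a subscription in the same process can receive
  // the message without an extra copy.
  auto msg = std::make_unique<std_msgs::msg::String>();
  msg->data = "hello";
  pub->publish(std::move(msg));

  rclcpp::shutdown();
  return 0;
}
```

Whether this path, a DDS-level zero-copy path, or plain inter-process communication applies depends on where the subscriber lives, as discussed below.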
ROS2 communication optimization
There are some communication optimizations, and each needs a specific situation. For example, we cannot use `intra_process_comms` for inter-process communication. There are at least four patterns of relationship between a publisher and its subscribers, and we have to check which optimizations we can use in each situation.
- There are many types of relations between Pub & Subs. The figure below shows:
- Sub1: Same process and same Node as Pub.
There are some optimizations: Node intra_process_comm, sharing pointers, DDS zero-copy, and so on.
- Sub2: Same process but a different Node from Pub.
We may use the same optimizations as for Sub1.
- Sub3: Different process but same host as Pub.
Some DDS implementations may optimize this type of communication.
- Sub4: Different host from Pub.
Communication goes over the network.
+-----+ +------+ +------+ +----------+ +----------+
| Pub | | Sub1 | | Sub2 | | Sub3 | | Sub4 |
+-----+ +------+ +------+ +----------+ +----------+
+---------------+ +------+ +----------+ +----------+
| Node | | Node | | Node | | Node | <- rclcpp has intra_process_comms_
+---------------+ +------+ +----------+ +----------+
+------------------------+ +----------+ +----------+
| Executor | | Executor | | Executor |
+------------------------+ +----------+ +----------+
+------------------------+ +----------+ +----------+
| DDS | | DDS | | DDS | <- some DDS support efficient communication such as intra process or shm
+------------------------+ +----------+ +----------+
+------------------------+ +----------+ +----------+
| Process | | Process | | Process |
+------------------------+ +----------+ +----------+
+-------------------------------------+ +----------+
| Host1 | | Host2 |
+-------------------------------------+ +----------+
`rclcpp::Publisher::publish`: there are other variants, such as `rcl_serialized_message` and `can_loan_messages()`. If they are appropriate for real-time, we have to test them.
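As a hedged illustration of the loaned-message path (not code from any of the tools; names are made up), a publisher can check `can_loan_messages()` and borrow memory from the middleware before publishing:

```cpp
#include <rclcpp/rclcpp.hpp>
#include <std_msgs/msg/float64.hpp>

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  auto node = std::make_shared<rclcpp::Node>("loaned_msg_demo");
  auto pub = node->create_publisher<std_msgs::msg::Float64>("data", 10);

  if (pub->can_loan_messages()) {
    // The message memory is owned by the middleware; publishing the loaned
    // message returns it without an extra copy.
    auto loaned = pub->borrow_loaned_message();
    loaned.get().data = 42.0;
    pub->publish(std::move(loaned));
  } else {
    // Fall back to a regular (copying) publish.
    std_msgs::msg::Float64 msg;
    msg.data = 42.0;
    pub->publish(msg);
  }

  rclcpp::shutdown();
  return 0;
}
```

Note that this only avoids copies when the RMW implementation actually supports loaning for the message type (typically plain-old-data messages).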
Testbench to generate use-cases:
- Generation of use-cases
- Frontend for rclc-Executor (Executor with C API) from Bosch
- Frontend for Static Executor ROS 2 (Executor with C++ API) from Nobleo
I will summarize the discussion up to this point:
Define metrics:
Open questions:
For the moment we will use the following benchmarking tools:
We currently have the following selection of tools available:
Acceptance Criteria