ros2 / ros2cli

ROS 2 command line interface tools
Apache License 2.0
183 stars 164 forks source link

ros2 topic echo consumes more CPU resource than rostopic echo #795

Closed fujitatomoya closed 1 year ago

fujitatomoya commented 1 year ago

Bug report

Required Info:

Steps to reproduce issue

ROS version Publisher(pub) Subscription(echo) Daemon
Rolling(with daemon) 30.7% 29.3% 75.7%
Rolling 33.9% 51.5% N.A
Noetic 23.1% 35.9% N.A
root@tomoyafujita:~/ros2_ws/colcon_ws# ros2 topic pub -r 1000 /chatter std_msgs/msg/String "data: Hello ROS!" &> /dev/null

root@tomoyafujita:~/ros2_ws/colcon_ws# ros2 topic hz /chatter
average rate: 999.958
        min: 0.001s max: 0.001s std dev: 0.00003s window: 1001
average rate: 999.987
        min: 0.001s max: 0.001s std dev: 0.00003s window: 2002
average rate: 999.985
        min: 0.001s max: 0.001s std dev: 0.00003s window: 3002
average rate: 999.989
        min: 0.001s max: 0.001s std dev: 0.00003s window: 4003
average rate: 999.991
        min: 0.001s max: 0.001s std dev: 0.00003s window: 5004

root@tomoyafujita:~/ros2_ws/colcon_ws# ros2 topic echo /chatter &> /dev/null
...

top - 15:52:44 up 9 days,  4:40,  0 users,  load average: 1.86, 1.27, 0.75
Tasks:  10 total,   3 running,   7 sleeping,   0 stopped,   0 zombie
%Cpu(s):  3.2 us,  4.0 sy,  0.0 ni, 92.5 id,  0.2 wa,  0.0 hi,  0.1 si,  0.0 st
MiB Mem :  31774.9 total,   1527.6 free,   6430.1 used,  23817.3 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  24698.3 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 212035 root      20   0 2771124  65328  34344 R  75.7   0.2   3:29.36 python3 (ros2 daemon)
 211973 root      20   0 2771912  66360  33972 S  30.7   0.2   3:46.87 ros2 (ros2 topic pub)
 212148 root      20   0 2773084  67272  34516 R  29.3   0.2   1:09.38 ros2 (ros2 topic echo)

root@tomoyafujita:~/ros2_ws/colcon_ws# ros2 topic echo --no-daemon /chatter &> /dev/null

top - 15:55:46 up 9 days,  4:43,  0 users,  load average: 1.07, 1.17, 0.82
Tasks:  10 total,   1 running,   9 sleeping,   0 stopped,   0 zombie
%Cpu(s):  6.5 us,  0.0 sy,  0.0 ni, 92.8 id,  0.6 wa,  0.0 hi,  0.1 si,  0.0 st
MiB Mem :  31774.9 total,   1541.9 free,   6419.0 used,  23814.0 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  24712.2 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 212194 root      20   0 2771924  66468  34144 S  51.5   0.2   0:26.38 ros2 (ros2 topic echo)
 211973 root      20   0 2771912  66296  33908 S  33.9   0.2   4:54.01 ros2 (ros2 topic pub)
root@4b5f2b7e1de8:~# rostopic pub -r 1000 /chatter std_msgs/String "data: Hello ROS!" &> /dev/null

root@4b5f2b7e1de8:~# rostopic hz /chatter
subscribed to [/chatter]
average rate: 1000.097
        min: 0.001s max: 0.002s std dev: 0.00009s window: 992
average rate: 1000.084
        min: 0.001s max: 0.002s std dev: 0.00009s window: 1994
average rate: 1000.082
        min: 0.000s max: 0.002s std dev: 0.00009s window: 2997
average rate: 1000.058
        min: 0.000s max: 0.002s std dev: 0.00009s window: 3999
average rate: 1000.037
        min: 0.000s max: 0.002s std dev: 0.00009s window: 5002

root        1171    1048 14 00:18 pts/4    00:00:11 /usr/bin/python3 /opt/ros/noetic/bin/rostopic pub -r 1000 /chatter
root        1259     676 41 00:19 pts/3    00:00:04 /usr/bin/python3 /opt/ros/noetic/bin/rostopic echo /chatter

top - 00:20:00 up 9 days,  5:07,  0 users,  load average: 0.90, 0.49, 0.57
Tasks:  10 total,   1 running,   9 sleeping,   0 stopped,   0 zombie
%Cpu(s):  4.9 us,  2.2 sy,  0.0 ni, 92.3 id,  0.0 wa,  0.0 hi,  0.6 si,  0.0 st
MiB Mem :  31774.9 total,   1820.6 free,   7141.9 used,  22812.5 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  23994.4 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   1259 root      20   0  508272  44512  14704 S  35.9   0.1   0:15.96 rostopic (rostopic pub)
   1171 root      20   0  360676  44240  14816 S  23.1   0.1   0:16.86 rostopic (rostopic echo)

Expected behavior

Actual behavior

I believe there are 2 problems we can see here,

Additional information

fujitatomoya commented 1 year ago

ros2 daemon process consumes much CPU resource more than 75% when ros2 topic echo is being issued.

The root cause is that ros2 topic echo keeps calling system.listMethods API to XMLRPC server.

https://github.com/ros2/ros2cli/blob/1a4766098f917819508ce8b06ca44bdd041f5fc3/ros2cli/ros2cli/node/strategy.py#L59

If NodeStrategy node with daemon running, it will call __getattr__ via any methods to check if it can be managed by daemon. This causes many calls to XMLRPC server(daemon), and consumes CPU resource to response those incoming request.

enabling LocalXMLRPCServer.logRequests and ServerProxy.verbose, we can see many incoming request from ros2 topic echo to ros2 daemon XMLRPC server. (tcpdump -i lo -s 0 -A 'tcp dst port 11511 and tcp[((tcp[12:1] & 0xf0) >> 2):4] = 0x504F5354' can be used to monitor all http POST to server port as well.)

I think it needs to call system.listMethods technically only once after connection is established, we can keep this method list in client side.

addressed by https://github.com/ros2/ros2cli/pull/796, which brings significant performance improvement for ros2 daemon, in this test case 75% to almost 0% CPU consumption.

fujitatomoya commented 1 year ago

ros2 topic echo w/o daemon consumes 20% more CPU resource than rostopic echo

@iuhilnehc-ynos @Barry-Xu-2018 any thoughts?

iuhilnehc-ynos commented 1 year ago

In the ROS Noetic section, from

root        1171    1048 14 00:18 pts/4    00:00:11 /usr/bin/python3 /opt/ros/noetic/bin/rostopic pub -r 1000 /chatter
root        1259     676 41 00:19 pts/3    00:00:04 /usr/bin/python3 /opt/ros/noetic/bin/rostopic echo /chatter

, I think the process ID of rostopic pub is 1171.

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  1259 root      20   0  508272  44512  14704 S  35.9   0.1   0:15.96 rostopic (rostopic pub)
  1171 root      20   0  360676  44240  14816 S  23.1   0.1   0:16.86 rostopic (rostopic echo)

should be updated with (Press 'c' after top command)

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  1259 root      20   0  508272  44512  14704 S  35.9   0.1   0:15.96 rostopic (rostopic echo)
  1171 root      20   0  360676  44240  14816 S  23.1   0.1   0:16.86 rostopic (rostopic pub)

so the test results about ros1(noetic) in CPU Consumption Result need to update as well.

ROS version Publisher(pub) Subscription(echo) Daemon
Noetic 35.9% 23.1% N.A
ROS version Publisher(pub) Subscription(echo) Daemon
Noetic 23.1% 35.9% N.A

Compared with ROS1, not only does the subscriber consume more CPU resources, but also the publisher.

fujitatomoya commented 1 year ago

@iuhilnehc-ynos good eye! just in case, i will be double-check the result on my local machine and update the note.

fujitatomoya commented 1 year ago
root@4b5f2b7e1de8:~# ps -ef | grep rostopic
root         167      60 15 18:50 pts/3    00:00:14 /usr/bin/python3 /opt/ros/noetic/bin/rostopic pub -r 1000 /chatter std_msgs/String data: Hello ROS!
root         255      53 37 18:51 pts/2    00:00:07 /usr/bin/python3 /opt/ros/noetic/bin/rostopic echo /chatter

top - 18:53:23 up 12 days, 23:40,  0 users,  load average: 0.56, 0.50, 0.38
Tasks:  10 total,   1 running,   9 sleeping,   0 stopped,   0 zombie
%Cpu(s):  3.6 us,  0.4 sy,  0.0 ni, 96.0 id,  0.1 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  31774.9 total,    594.0 free,   7026.9 used,  24154.0 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  24075.6 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                    
    255 root      20   0  508272  44636  14764 S  36.7   0.1   0:35.31 rostopic                                                   
    167 root      20   0  360676  44116  14696 S  20.3   0.1   0:29.29 rostopic                
...
fujitatomoya commented 1 year ago

Compared with ROS1, not only does the subscriber consume more CPU resources, but also the publisher.

true, this has been reconfirmed. pub and echo can consume 10-15% CPU resource more than rostopic.

considering (de)serialization and QoS support with this high frequency, probably we can expect this additional consumption would be required for ROS 2? any thoughts?

Barry-Xu-2018 commented 1 year ago

The architecture of ROS1 is simpler than ROS2.
Using complex architecture (such as based on DDS) must introduce extra resource consumption.
Due to the difference in the architecture, it is difficult to give an accurate estimate of additional consumption for ROS2 comparing ROS1.
We have to analyze ROS2 process by performance tool to find performance bottlenecks for improving ROS2 performance.

fujitatomoya commented 1 year ago

ros2 daemon process consumes much CPU resource more than 75% when ros2 topic echo is being issued.

this is addressed by https://github.com/ros2/ros2cli/pull/796

Barry-Xu-2018 commented 1 year ago

There is a big difference of python implementation in ROS1 and ROS2.
For ROS1 python implementation, it doesn't bind to C/C++. It is completely implemented in python.
For ROS2, it uses pybind11 to bind C++ code and C code (rcl layer). When data flows in python and C/C++, copy actions may exist. Besides, pybind11 introduce long call path.

The below diagram is for rclcpp
image

The diagram for rclpy
image

Compare rcl_take, you will find rclpy has long call path. This increases CPU consumption.
image

So this leads to extra CPU consumption in ROS2.

fujitatomoya commented 1 year ago

I did recheck the CPU consumption using https://github.com/ros2/ros2/commit/07c27c934a6a1269fec2828f4c1a4ff40430d44d

ROS version Publisher(pub) Subscription(echo) Daemon
Rolling(with daemon) 38.3% 55.3% 0.0%(*)
Rolling 37.9% 53.2% N.A
Noetic 23.1% 35.9% N.A

(*)... ros2cli::NodeStragegy bug has been fixed, https://github.com/ros2/ros2cli/pull/796

root@tomoyafujita:~/ros2_ws/colcon_ws# ps -ef
UID          PID    PPID  C STIME TTY          TIME CMD
root         433       1  1 22:00 pts/3    00:00:01 /usr/bin/python3 -c from ros2cli.daemon.daemonize import main; main() --name ros2-daemon --ros-domain-id 0 --rmw-implementation rmw_fastrtps_cpp
root         460     149 38 22:00 pts/2    00:00:35 /usr/bin/python3 /root/ros2_ws/colcon_ws/install/ros2cli/bin/ros2 topic pub -r 1000 /chatter std_msgs/msg/String data: Hello ROS!
root         530     215 55 22:00 pts/3    00:00:39 /usr/bin/python3 /root/ros2_ws/colcon_ws/install/ros2cli/bin/ros2 topic echo /chatter
root         559     233  0 22:01 pts/4    00:00:00 ps -ef

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
    530 root      20   0 2772988  67420  34884 R  55.3   0.2   1:30.34 ros2
    460 root      20   0 2771928  66364  33984 R  38.3   0.2   1:09.16 ros2
    433 root      20   0 2769404  63168  34056 S   0.0   0.2   0:01.44 python3
root@tomoyafujita:~/ros2_ws/colcon_ws# ps -ef
UID          PID    PPID  C STIME TTY          TIME CMD
root         577     149 39 22:05 pts/2    00:00:34 /usr/bin/python3 /root/ros2_ws/colcon_ws/install/ros2cli/bin/ros2 topic pub -r 1000 /chatter std_msgs/msg/String data: Hello ROS!
root         604     215 53 22:05 pts/3    00:00:42 /usr/bin/python3 /root/ros2_ws/colcon_ws/install/ros2cli/bin/ros2 topic echo --no-daemon /chatter

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
    604 root      20   0 2771372  66804  34252 R  53.2   0.2   0:34.39 ros2
    577 root      20   0 2771268  66428  34092 R  37.9   0.2   0:29.18 ros2
fujitatomoya commented 1 year ago

We conclude this performance difference between ROS and ROS 2 is reasonable at this moment to have more features QoS, Interfaces such as RMW , pybind and so on. I will go ahead to close this issue.