ros2clifecycle_set and ros2service_call_change_state in lifecycle management is too slow.

ZhenshengLee commented 2 years ago

Bug report

Required Info:

Operating System:
- ubuntu22.04
Installation type:
- binary
Version or commit hash:
- Humble
DDS implementation:
- rmw_fastrtps_cpp
Client library (if applicable):
- ros2cli

Steps to reproduce issue

the simple lifecycle node can be seen in https://github.com/micro-ROS/system_modes/tree/master/system_modes_examples

time ros2 lifecycle set /drive_base activate
Transitioning successful
real    0m2.030s
user    0m0.334s
sys     0m0.113s

time ros2 lifecycle set /drive_base deactivate
Transitioning successful
real    0m2.089s
user    0m0.419s
sys     0m0.083s

time ros2 service call /drive_base/change_state lifecycle_msgs/ChangeState "{transition: {id: 3, label: activate}}"
waiting for service to become available...
requester: making request: lifecycle_msgs.srv.ChangeState_Request(transition=lifecycle_msgs.msg.Transition(id=3, label='activate'))
response:
lifecycle_msgs.srv.ChangeState_Response(success=True)
real    0m2.128s
user    0m0.379s
sys     0m0.082s

time ros2 service call /drive_base/change_state lifecycle_msgs/ChangeState "{transition: {id: 4, label: deactivate}}"
waiting for service to become available...
requester: making request: lifecycle_msgs.srv.ChangeState_Request(transition=lifecycle_msgs.msg.Transition(id=4, label='deactivate'))

response:
lifecycle_msgs.srv.ChangeState_Response(success=True)

real    0m2.776s
user    0m0.343s
sys     0m0.096s

Additional information

repeated test of the same cmd can make it quicker, see the complete test log below

zs@zs-vm-2204:~/zs_ws/ga_ros$ time ros2 service call /drive_base/change_state lifecycle_msgs/ChangeState "{transition: {id: 3, label: activate}}"
waiting for service to become available...
requester: making request: lifecycle_msgs.srv.ChangeState_Request(transition=lifecycle_msgs.msg.Transition(id=3, label='activate'))

response:
lifecycle_msgs.srv.ChangeState_Response(success=True)

real    0m2.128s
user    0m0.379s
sys     0m0.082s
zs@zs-vm-2204:~/zs_ws/ga_ros$ time ros2 service call /drive_base/change_state lifecycle_msgs/ChangeState "{transition: {id: 3, label: activate}}"
waiting for service to become available...
requester: making request: lifecycle_msgs.srv.ChangeState_Request(transition=lifecycle_msgs.msg.Transition(id=3, label='activate'))

response:
lifecycle_msgs.srv.ChangeState_Response(success=False)

real    0m1.231s
user    0m0.536s
sys     0m0.117s
zs@zs-vm-2204:~/zs_ws/ga_ros$ time ros2 service call /drive_base/change_state lifecycle_msgs/ChangeState "{transition: {id: 4, label: deactivate}}"
waiting for service to become available...
requester: making request: lifecycle_msgs.srv.ChangeState_Request(transition=lifecycle_msgs.msg.Transition(id=4, label='deactivate'))

response:
lifecycle_msgs.srv.ChangeState_Response(success=True)

real    0m2.776s
user    0m0.343s
sys     0m0.096s
zs@zs-vm-2204:~/zs_ws/ga_ros$ time ros2 service call /drive_base/change_state lifecycle_msgs/ChangeState "{transition: {id: 3, label: activate}}"
waiting for service to become available...
requester: making request: lifecycle_msgs.srv.ChangeState_Request(transition=lifecycle_msgs.msg.Transition(id=3, label='activate'))

response:
lifecycle_msgs.srv.ChangeState_Response(success=True)

real    0m1.017s
user    0m0.354s
sys     0m0.075s
zs@zs-vm-2204:~/zs_ws/ga_ros$ time ros2 service call /drive_base/change_state lifecycle_msgs/ChangeState "{transition: {id: 4, label: deactivate}}"
waiting for service to become available...
requester: making request: lifecycle_msgs.srv.ChangeState_Request(transition=lifecycle_msgs.msg.Transition(id=4, label='deactivate'))

response:
lifecycle_msgs.srv.ChangeState_Response(success=True)

real    0m2.737s
user    0m0.560s
sys     0m0.113s
zs@zs-vm-2204:~/zs_ws/ga_ros$ time ros2 service call /drive_base/change_state lifecycle_msgs/ChangeState "{transition: {id: 3, label: activate}}"
waiting for service to become available...
requester: making request: lifecycle_msgs.srv.ChangeState_Request(transition=lifecycle_msgs.msg.Transition(id=3, label='activate'))

response:
lifecycle_msgs.srv.ChangeState_Response(success=True)

real    0m3.112s
user    0m0.370s
sys     0m0.076s
zs@zs-vm-2204:~/zs_ws/ga_ros$

Thanks.

ZhenshengLee commented 2 years ago

May be the right way to manage lifecycle of a node is to write cpp lifecycle manager client?

fujitatomoya commented 2 years ago

May be the right way to manage lifecycle of a node is to write cpp lifecycle manager client?

I am not sure if this is right for everyone, it depends on the use case and requirement.

I am using 1th Gen Intel(R) Core(TM) i9-11900K @ 3.50GHz, and lifecycle_talker(w/o this line) can respond approximately 1 secnod.

root@tomoyafujita:~/ros2_ws/colcon_ws# time ros2 lifecycle set /lc_talker activate
Transitioning successful

real    0m1.034s
user    0m0.592s
sys 0m0.976s
root@tomoyafujita:~/ros2_ws/colcon_ws# time ros2 lifecycle set /lc_talker deactivate
Transitioning successful

real    0m1.030s
user    0m0.547s
sys 0m0.997s
root@tomoyafujita:~/ros2_ws/colcon_ws# time ros2 lifecycle set /lc_talker activate
Transitioning successful

real    0m1.043s
user    0m0.583s
sys 0m0.916s
root@tomoyafujita:~/ros2_ws/colcon_ws# time ros2 lifecycle set /lc_talker deactivate
Transitioning successful

real    0m1.040s
user    0m0.588s
sys 0m0.971s

I do not really know the requirement or expectation in performance perspective for this issue, but 1 second is reasonable for me. And this will also depends on the network configuration and node graph.

the reason that ros2 lifecycle set takes time is that it does many things under the hood.

get all node name from node graph to make sure that specified node exists in the graph.
get available transitions for that node to make sure user request is appropriate.
call change_state service for that node, and wait for the response.

ZhenshengLee commented 2 years ago

@fujitatomoya thanks for your quick reply.

I do not really know the requirement or expectation in performance perspective for this issue, but 1 second is reasonable for me.

My use case is that a manager of state machine would start a thread executing a launch.py to configure, activate, and deactivate some lifecycle nodes(with executing process ros2 lifecycle set), and in my case, this thread takes around 7-10 sesonds to finish the management of 6 lifecycle nodes, if run into error, the launch thread even stucks and cannot return. see https://github.com/robosoft-ai/SMACC2/issues/330 if you like

And the launch thread often run into the error with node not found, see #582 and you've already known

the reason that ros2 lifecycle set takes time is that it does many things under the hood.

get all node name from node graph to make sure that specified node exists in the graph.

get available transitions for that node to make sure user request is appropriate.

call change_state service for that node, and wait for the response.

I see,

I think all cmds of ros2cli depend on the node graph, or at least ros2 lifecycle set, ros2 service call and other cmds taking operations on nodes, how about directly writing rclcpp service clients to call service in cpp? Do you think it will bypass the node list issue?
I think ros2 service call /node/change_state would be quicker than ros2 lifecycle set, but there is no evidence in my test log, see logs in the top comment.

ZhenshengLee commented 2 years ago

It has been tested that the rclcpp::client<lifecycle_msgs::srv::ChangeState> is faster enough to manage lifecycle of the node than ros2 lifecycle set and ros2 service call /node/change_state

There may be a system pitfall in the ros2cli and/or rclpy.

fujitatomoya commented 2 years ago

how about directly writing rclcpp service clients to call service in cpp?

This could be quicker for initialization, and i think sometimes we do this kind of init performance improvement like replacing scripts into compiled code. (e.g replace bash script into C compiled binary.) probably this is something makes your performance requirement.

but i am not inclined to take this path for mainline, since this will increases burden for code maintenance. I think current ros2cli is really flexible to insert the 3rd party command and maintain.

Do you think it will bypass the node list issue?

No i do not think so, it is not about how fast it is, but where it is implemented. Node graph comes from underlying rmw implementation, so this will not bypass the node list issue, i believe.

There may be a system pitfall in the ros2cli and/or rclpy.

it just takes some time to check all the arguments are correct, such as node, topic and service names.

CC: @clalancette @wjwwood @ivanpauno

ZhenshengLee commented 2 years ago

No i do not think so, it is not about how fast it is, but where it is implemented. Node graph comes from underlying rmw implementation, so this will not bypass the node list issue, i believe.

from https://github.com/ros2/ros2cli/issues/582#issuecomment-776539070

Since rclcpp doesn't depend on daemon(graph cache), if rclcpp service client will bypass the issue(evidence from my test in https://github.com/ros2/ros2cli/issues/779#issuecomment-1315117834), the root cause would be in the daemon-graph-cache rather than rmw-graph.

fujitatomoya commented 2 years ago

Since rclcpp doesn't depend on daemon(graph cache)

neither rclpy. ros2cli has NodeStrategy to rely on ros2 daemon to reduce the time to get the connectivity map, but this can be opt-out.

the root cause would be in the daemon-graph-cache rather than rmw-graph.

i guess that this could be, it would be worth to try.

ros2 / ros2cli