Check for existence of Node in verbs can cause false failures when calling

ros2 / ros2cli

ROS 2 command line interface tools

Apache License 2.0

172 stars 161 forks source link

Check for existence of Node in verbs can cause false failures when calling #799

Closed destogl closed 1 year ago

destogl commented 1 year ago

Hi,

in ros2_control we were using CLI calls to load parameters and has issues, especially with FastDDS that calls were failing because the node could not be discovered on time. We have overcome this in the referenced PR, still it would be good to make CLI more robust. The issue is partially caused by a single check of node existence done in these lines:

https://github.com/ros2/ros2cli/blob/bcb1414ea87ba164ea62c918f23b2b761f4c5a4a/ros2param/ros2param/verb/load.py#L45-L50

This check depends on strongly how fast discovery of DDS implementation is working. There are two possible ways to make this more robust:

Remove this check because it is actually not needed – since the method called has already timeout if service cannot be found. Service should time out in both cases, if target node does not exist and also if service does not exist. https://github.com/ros2/ros2cli/blob/bcb1414ea87ba164ea62c918f23b2b761f4c5a4a/ros2param/ros2param/verb/load.py#L53
Add timeout to this check, repeating it multiple times until the timeout. This would make this check much more robust against slower DDS discovery capabilities.

This issue is repeated on multiple verbs in ros2param library, and it would make sense to adjust it everywhere.

Which of the proposed solution do you think is preferred?

fujitatomoya commented 1 year ago

@destogl thanks for posting issue.

using NodeStrategy, there is an assumption that ros 2 daemon can response quickly the node availability based on cached node graph if ros 2 daemon is running the that host machine. Do you have this problem when ros 2 daemon is running?

Which of the proposed solution do you think is preferred?

I would keep node check before service call, since node is not available and service is not available is not the same meaning for user. I am inclined to take an improvement 2 that makes sense. we can add an timeout option until it times out or node and service becomes available.

destogl commented 1 year ago

@destogl thanks for posting issue.

using NodeStrategy, there is an assumption that ros 2 daemon can response quickly the node availability based on cached node graph if ros 2 daemon is running the that host machine. Do you have this problem when ros 2 daemon is running?

yes, we could observe the issue with and without daemon running. The issue that we were starting plenty of nodes at once. I know that our approach was not well-designed, so we changed it because of that. The idea here is to increase robustness.

Which of the proposed solution do you think is preferred?

I would keep node check before service call, since node is not available and service is not available is not the same meaning for user. I am inclined to take an improvement 2 that makes sense. we can add an timeout option until it times out or node and service becomes available.

This sounds good! Thanks for the fast answer

fujitatomoya commented 1 year ago

I see, that sounds really discovery latency in rmw implementation. As an information, how many nodes are you guys trying to instantiate at the same time? and those nodes are connected via network like wireless network?

destogl commented 1 year ago

It is about 20 nodes, everything on one computer connected with cable. The nodes are literary running inside the same machine or container.

fujitatomoya commented 1 year ago

@destogl it would be nice if you could try https://github.com/ros2/ros2cli/pull/802 !

fujitatomoya commented 1 year ago

addressed by https://github.com/ros2/ros2cli/pull/802, closing.