Closed: mjeronimo closed this issue 2 years ago
@orduno See #808 for some background about this issue and a quick code sample.
@mjeronimo is this still an issue?
It is still an issue. Having too many nodes impacts startup times and seems to overwhelm the current DDS implementations.
Possible fixes to this issue are:
@crdelsey can you provide a few examples in the stack where we use "client nodes", other than costmap spinning? I had a hard time finding any.
I think the big one is here in ServiceClient. https://github.com/ros-planning/navigation2/blob/7652518af52f257d097269309986943771d98683/nav2_util/include/nav2_util/service_client.hpp#L38
Then, there's another use in LifecycleManagerClient (https://github.com/ros-planning/navigation2/blob/a095038da9f4e3729679e5d20b39e1d95582fb16/nav2_lifecycle_manager/src/lifecycle_manager_client.cpp#L30).
There may be other places.
Got it, I didn't see that hidden in there. "Fix DDS" feels like a never-ending battle; I'd say options 2 or 3 seem like good general long-term fixes, but option 4 seems more practically in our control. I'll need to do more reading on these topics.
Any objection to using multithreaded executors? That way we could have all of these running at once with one node. Do you think there are applications here sensitive enough (or otherwise not well structured) for multithreaded processing? Most places where we might want to use a multithreaded spinner already have mutexes. I wouldn't think it would be too much of a hassle to update the few places required.
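For reference, a minimal sketch of what that would look like (the node name here is a placeholder, not an actual Nav2 server): instead of spinning the node directly, we hand it to a MultiThreadedExecutor so callbacks from different callback groups can run concurrently.

```cpp
#include <memory>

#include "rclcpp/rclcpp.hpp"

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  auto node = std::make_shared<rclcpp::Node>("some_nav2_server");  // placeholder name

  // One node, several threads: callbacks may now run concurrently,
  // which is why the mutexes mentioned above matter.
  rclcpp::executors::MultiThreadedExecutor executor;
  executor.add_node(node);
  executor.spin();

  rclcpp::shutdown();
  return 0;
}
```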
@mjeronimo Do you have any updates on this? Let me know even if you have any unfinished work on this.
@mjeronimo Shivang is taking a look at this topic. Carl mentioned that you had a branch and had started working on it, which may either be done or have made good progress. Can you give us a rundown on where that branch is and what still needs to be completed?
@shivaang12 @SteveMacenski Sorry guys, I haven't been on GitHub recently (we're using GitLab internally for my current project). Anyways, @orduno did the most exploration on this topic. He prototyped at least one alternative approach that, I believe, used multi-threaded executors. I've also brought up the topic directly with folks at Open Robotics and they agree that this is an issue (but I don't think there are any plans to allow re-entrant spinning on the current node anytime soon). In general, I think it's a good idea not to proliferate nodes like this because of the discovery traffic introduced and the possibility of dropping messages.
this is an issue (but I don't think there are any plans to allow re-entrant spinning on the current node anytime soon)
Can confirm on the TSC side, doesn't seem likely for Foxy (or G-turtle... bigger fish to fry). On my wish list of things for OR people to work on that affect navigation users, this isn't high at the moment.
Awesome, hopefully Carlos can chime in. Ignoring Foxy, I'd like a 1 server to 1 node mapping in the next LTS distribution. Multithreaded executors seem to be the lowest-effort, clearest way to handle this, though that does create issues with the components API.
The question then becomes: do we value using components or minimizing nodes more? Currently, I think the answer is minimizing nodes, given how inefficient it is to have multiple. However, that's improving by the week, so I think the long-term answer is components.
I am currently making a PR to use callback groups for service clients instead of nodes #2216.
On the Rolling distribution, there are callback groups that can be attached to a node and added to executors. This allows spin_until_future_complete() to be used while the node is still spinning, effectively making the node multithreaded.
It can be used to execute synchronous service/action calls in timer/subscriber callbacks.
https://github.com/ros-planning/navigation2/pull/2216#issuecomment-793069766
I think this comment gives a good summary of what should be done for each case.
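Roughly, the pattern looks like the sketch below. This is a simplified approximation of the approach in #2216, not the actual patch; the class name and service name are placeholders, and it assumes the Rolling/Galactic-era callback group APIs mentioned above. The client gets its own callback group that is not picked up by the node's main executor, and a small dedicated executor spins only that group until the future is ready.

```cpp
#include <chrono>
#include <memory>

#include "rclcpp/rclcpp.hpp"
#include "std_srvs/srv/empty.hpp"

// Simplified sketch: the service client lives in its own callback group that
// is NOT added to the node's default executor, so a dedicated executor can
// spin just that group while the rest of the node keeps spinning elsewhere.
class CallbackGroupClient
{
public:
  explicit CallbackGroupClient(const rclcpp::Node::SharedPtr & node)
  : node_(node)
  {
    callback_group_ = node_->create_callback_group(
      rclcpp::CallbackGroupType::MutuallyExclusive,
      false /* do not add to the node's default executor */);
    callback_group_executor_.add_callback_group(
      callback_group_, node_->get_node_base_interface());
    client_ = node_->create_client<std_srvs::srv::Empty>(
      "demo_service", rmw_qos_profile_services_default, callback_group_);
  }

  bool call(std::chrono::seconds timeout = std::chrono::seconds(5))
  {
    auto request = std::make_shared<std_srvs::srv::Empty::Request>();
    auto future = client_->async_send_request(request);
    // Spin only this client's callback group; no throwaway client node needed.
    return callback_group_executor_.spin_until_future_complete(future, timeout) ==
           rclcpp::FutureReturnCode::SUCCESS;
  }

private:
  rclcpp::Node::SharedPtr node_;
  rclcpp::CallbackGroup::SharedPtr callback_group_;
  rclcpp::executors::SingleThreadedExecutor callback_group_executor_;
  rclcpp::Client<std_srvs::srv::Empty>::SharedPtr client_;
};
```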
Unfortunately, on deeper inspection, each new executor creates a new context (unless one is given on construction). Therefore, with the new design https://design.ros2.org/articles/Node_to_Participant_mapping.html, each context now maps to a DDS participant, so this is really no different from just having multiple client nodes.
Open question then: if we gave the node's context to a new separate executor_ on construction, is that the same DDS participant, and can it also be processed in parallel?
I think this probably deserves a dedicated redesign session to think about the executor and threading options we have available and to go through our server list to see what makes the most sense in order to reduce the number of DDS participants and nodes. We should also analyze what actions, services, and subscribers exist in each server and add them to separate callback groups if necessary, so spinning has a deterministic output and processes the "right" information in each context (not a ROS context, a software one). Right now, everything is just automatically added to one. Especially for the planner / controller / waypoint servers, it may make sense to separate these callback groups and either process them all in parallel or spin a particular one.
The right answer is probably a mix of multi-threaded executors and multiple single-threaded executors, depending on the server's needs. We also need to settle the context question above: whether we can share a context across multiple executor objects to run in parallel without adding additional participants on the network.
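As a starting point for that design session, here is a rough sketch of the direction. The server class, topics, and callback bodies are made up (not an actual Nav2 server): latency-sensitive work gets its own callback group inside a single node, and a multi-threaded executor runs the groups in parallel without adding nodes or DDS participants.

```cpp
#include <chrono>
#include <memory>

#include "rclcpp/rclcpp.hpp"
#include "std_msgs/msg/string.hpp"

// Hypothetical server: one node, two callback groups, no extra client nodes.
class HypotheticalServer : public rclcpp::Node
{
public:
  HypotheticalServer()
  : rclcpp::Node("hypothetical_server")
  {
    control_group_ = create_callback_group(rclcpp::CallbackGroupType::MutuallyExclusive);
    sensor_group_ = create_callback_group(rclcpp::CallbackGroupType::MutuallyExclusive);

    // Subscriptions are assigned to a group through SubscriptionOptions;
    // otherwise everything lands in the node's default group.
    rclcpp::SubscriptionOptions sensor_options;
    sensor_options.callback_group = sensor_group_;
    sensor_sub_ = create_subscription<std_msgs::msg::String>(
      "sensor_topic", rclcpp::SensorDataQoS(),
      [](std_msgs::msg::String::SharedPtr /*msg*/) {/* process sensor data */},
      sensor_options);

    control_timer_ = create_wall_timer(
      std::chrono::milliseconds(100),
      []() {/* run the control loop in its own group */},
      control_group_);
  }

private:
  rclcpp::CallbackGroup::SharedPtr control_group_;
  rclcpp::CallbackGroup::SharedPtr sensor_group_;
  rclcpp::Subscription<std_msgs::msg::String>::SharedPtr sensor_sub_;
  rclcpp::TimerBase::SharedPtr control_timer_;
};

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  rclcpp::executors::MultiThreadedExecutor executor;
  auto node = std::make_shared<HypotheticalServer>();
  executor.add_node(node);
  executor.spin();
  rclcpp::shutdown();
  return 0;
}
```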
I spoke with Chris from OR @clalancette and he gave me the key to understanding this: while a new context appears to be constructed for a new executor, when none is given as an input it actually uses the default context, which is stored as a static variable (https://github.com/ros2/rclcpp/blob/master/rclcpp/src/rclcpp/contexts/default_context.cpp#L23). Therefore, any new executor, node, etc. will use that same context unless you give a different context to the object on construction.
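In other words (a small illustrative sketch, not code from the tree, with a placeholder node name): a default-constructed executor and a default-constructed node both end up on the same global default context, and explicitly handing the node's context to the executor gives the same result.

```cpp
#include <memory>

#include "rclcpp/rclcpp.hpp"

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  auto node = std::make_shared<rclcpp::Node>("demo_node");  // placeholder name

  // A default-constructed executor falls back to the shared global default
  // context, so it does not create an extra DDS participant.
  rclcpp::executors::SingleThreadedExecutor default_executor;

  // Explicitly handing the executor the node's own context gives the same
  // result, because the node was also created on the global default context.
  rclcpp::ExecutorOptions options;
  options.context = node->get_node_options().context();
  rclcpp::executors::SingleThreadedExecutor explicit_executor(options);

  rclcpp::shutdown();
  return 0;
}
```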
Thus, we can continue this work, but the question remains how necessary it is. Essentially, this is a refactoring effort now: we're working on turning N nodes into 1 node with N threads using a multi-threaded executor. "What's better: N single-threaded executors or 1 multi-threaded executor with N threads?" There's not a clear answer to that, but there shouldn't be insane performance differentials now.
It would be good to do this refactoring to make the best use of ROS 2, but it is no longer a critical requirement. I believe we now have sufficient information to make decisions and execute on this task when someone has time to work on it.
I see several nodes that can still be removed.
- client_node: easy, need to use a callback group instead
- bond_client_node_: easy if we don't do waitUntilFormed
- client_node_: needed but not used by layers and filters
- rclcpp_node_: difficult, needed to pub/sub TF

I have searched for all internal nodes that need to be removed, and I will do this work this summer. Any suggestions are welcome.

Remove rclcpp_node_ in LifecycleNode:
- rclcpp_node_ in class AmclNode: https://github.com/ros-planning/navigation2/pull/2483
- rclcpp_node_ in class Costmap2DROS: https://github.com/ros-planning/navigation2/pull/2489
- rclcpp_node_ in class MapSaver: https://github.com/ros-planning/navigation2/pull/2454
- rclcpp_node_ in class PlannerServer: https://github.com/ros-planning/navigation2/pull/2459, https://github.com/ros-planning/navigation2/pull/2480
- rclcpp_node_ in class ControllerServer: https://github.com/ros-planning/navigation2/pull/2459, https://github.com/ros-planning/navigation2/pull/2479
- use_rclcpp_node in the constructor of LifecycleNode, update all derived classes: https://github.com/ros-planning/navigation2/pull/2993

Other independent cases:
- client_node_ in class WaypointFollower: https://github.com/ros-planning/navigation2/pull/2441
- client_node_ in class Costmap2DROS: https://github.com/ros-planning/navigation2/pull/2489
- node_ in class LifecycleManagerClient: https://github.com/ros-planning/navigation2/pull/2469
- bond_client_node_ in class LifecycleManager: https://github.com/ros-planning/navigation2/pull/2456

Internal nodes that need to remain:
- client_node_ in class BtActionServer
- client_node_ in class Nav2Panel
@gezp I saw another of your PRs was merged, is this now good to move forward on the AMCL / costmap PRs?
@SteveMacenski the upstream PR https://github.com/ros2/geometry2/pull/447 is still blocked.
Final PR imminent to merge
There are several nodes in the system that are created and used solely for interfacing with services. These "client nodes" are used because spin_until_future_complete cannot accept a node that is already on an executor. Instead of spin_until_future_complete, we can use future.wait_until. This will allow us to remove several of these client nodes in the system, which will hopefully ease discovery congestion.
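For illustration, a minimal sketch of that pattern (the service type, function name, and names are placeholders, not from the Nav2 tree; it uses wait_for with a duration, the sibling of the wait_until deadline form mentioned above): because the owning node is already spinning on its executor, the response callback completes the future there, and the caller only blocks on the future.

```cpp
#include <chrono>
#include <memory>

#include "rclcpp/rclcpp.hpp"
#include "std_srvs/srv/empty.hpp"

// Sketch: call a service from a node that is already being spun by an
// executor elsewhere. That executor services the response callback, so the
// caller just waits on the future; no spin_until_future_complete and no
// throwaway client node.
bool call_service(
  const rclcpp::Client<std_srvs::srv::Empty>::SharedPtr & client,
  std::chrono::seconds timeout)
{
  auto request = std::make_shared<std_srvs::srv::Empty::Request>();
  auto future = client->async_send_request(request);
  return future.wait_for(timeout) == std::future_status::ready;
}
```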