ros2 / rclcpp

rclcpp (ROS Client Library for C++)
Apache License 2.0
563 stars 427 forks source link

Simulated clock subscriber is built for each node #2659

Open roncapat opened 1 month ago

roncapat commented 1 month ago

While profiling a big application of mine, made by static composition of 20+ nodes in the same executable, enabling use_sim_time makes 20+ threads spawn, each one with a clock subscriber.

https://github.com/ros2/rclcpp/blob/9b654942f99f17850e0e95480958abdbb508bc00/rclcpp/src/rclcpp/time_source.cpp#L396-L402

Why wouldn't be possible to share among all nodes the clock subscription / the time source? This would bring down the count of clock subscriber threads to just one.

fujitatomoya commented 4 weeks ago

Currently TimeSource is managed under Node object, so that clock subscription entity can be created to support simulation time for each Node object. (some nodes use simulation time, some do not.) besides, different QoS for the clock subscription can be set based on node's requirement. Maybe using composition would be different story, but having TimeSource shared between Node objects would generate the coupling issue, that means if the node sharing TimeSource goes offline, all of the other nodes losing clock time source. to be honest, i am not sure if this is good design.

btw, we can disable the clock thread via NodeOption.

https://github.com/ros2/rclcpp/blob/9b654942f99f17850e0e95480958abdbb508bc00/rclcpp/include/rclcpp/node_options.hpp#L303-L309

roncapat commented 4 weeks ago

@fujitatomoya thank you for your feedback.

I have a couple more questions though...

About the possibility of having a mixed configuration, with some nodes using sim time and others not, I think it is totally doable that the ones with sim time enabled share the clock sub if they are in the same process space, while the others that take system time don't.

fujitatomoya commented 4 weeks ago

If a node "fails", in a composed application, wouldn't the whole application fail too?

you mean more like crashed by fails? if so, yes. composed nodes are in the same process space, if the process goes down because of one of the node crashes or coredumps, everyone in the same process space will be gone.

Wasn't instead the Intra-Process manager shared across all nodes?

I do not think IntraProcessManager object is shared between nodes, but data communication passes the the virtual address in the process space. so memory is shared within the same process map. (this is still maintained under experimental namespace.)

Finally, if I disable the clock thread, depending on the executor I choose for the node, may the performance be impacted?

this is good question. and yes. if clock thread is disabled, you need to spin this node with executor to update the simulation clock data.

I think it is totally doable that the ones with sim time enabled share the clock sub if they are in the same process space, while the others that take system time don't.

I am not against this, we always can explore the possibility for better user experience. but i am just not sure enough how we can design this in detail. for example, there could be multiple contexts with different domain, in that case we just cannot bind the process global time source to every nodes in the process. and maybe i am missing more here...

clalancette commented 4 weeks ago

I do not think IntraProcessManager object is shared between nodes,

It is. There is one IntraProcessManager per Context, which is how we end up being able to manage all of this between Nodes (you cannot do intra-process between contexts, which also implies you cannot do intra-process between different domain IDs).

I am not against this, we always can explore the possibility for better user experience. but i am just not sure enough how we can design this in detail. for example, there could be multiple contexts with different domain, in that case we just cannot bind the process global time source to every nodes in the process. and maybe i am missing more here...

I don't think we should try to design this sharing around multiple contexts; that is too hard. We could consider having all Nodes within a single Context share a clock thread, i.e. the clock thread would live in the Context (much like the IntraProcessManager). Then the Nodes in that Context could pull the time out of the Context. However, the problem with that design is that the Context has no notion whatsoever about subscriptions, so something would need to "feed" that Clock thread with data. That could be one of the Nodes, but which one? And what if that one goes away? (there may be other problems with that design as well)

mjcarroll commented 3 weeks ago

We discussed this in the triage meeting yesterday. The suggested solution would be to add a way to pass a TimeSource via the NodeOptions to allow users to override the internal timesource on a per-node basis.

For our generic composition containers, we could add a global flag such that all components either share or don't share a single time source.

For more complicated layouts (some nodes using sime, but not all), then these could be manage via manual composition like you are currently doing.

Would this be a feature that you are interested in adding for the rolling/kilted release?

roncapat commented 2 weeks ago

The manual override solution via NodeParameters is something I would like very much.

I further propose to split implementation in two subsequent steps/smaller PRs:

In this moment, I do not know if I may have available bandwidth to implement this; maybe I can start at least to prototype the first step and share with you the draft PR for further discussion if you are ok with this.