Open roncapat opened 1 month ago
Currently TimeSource
is managed under Node
object, so that clock subscription entity can be created to support simulation time for each Node
object. (some nodes use simulation time, some do not.) besides, different QoS for the clock subscription can be set based on node's requirement. Maybe using composition would be different story, but having TimeSource
shared between Node
objects would generate the coupling issue, that means if the node sharing TimeSource
goes offline, all of the other nodes losing clock time source. to be honest, i am not sure if this is good design.
btw, we can disable the clock thread via NodeOption
.
@fujitatomoya thank you for your feedback.
I have a couple more questions though...
About the possibility of having a mixed configuration, with some nodes using sim time and others not, I think it is totally doable that the ones with sim time enabled share the clock sub if they are in the same process space, while the others that take system time don't.
If a node "fails", in a composed application, wouldn't the whole application fail too?
you mean more like crashed by fails
? if so, yes. composed nodes are in the same process space, if the process goes down because of one of the node crashes or coredumps, everyone in the same process space will be gone.
Wasn't instead the Intra-Process manager shared across all nodes?
I do not think IntraProcessManager
object is shared between nodes, but data communication passes the the virtual address in the process space. so memory is shared within the same process map. (this is still maintained under experimental
namespace.)
Finally, if I disable the clock thread, depending on the executor I choose for the node, may the performance be impacted?
this is good question. and yes. if clock thread is disabled, you need to spin this node with executor to update the simulation clock data.
I think it is totally doable that the ones with sim time enabled share the clock sub if they are in the same process space, while the others that take system time don't.
I am not against this, we always can explore the possibility for better user experience. but i am just not sure enough how we can design this in detail. for example, there could be multiple contexts with different domain, in that case we just cannot bind the process global time source to every nodes in the process. and maybe i am missing more here...
I do not think
IntraProcessManager
object is shared between nodes,
It is. There is one IntraProcessManager
per Context
, which is how we end up being able to manage all of this between Nodes
(you cannot do intra-process between contexts, which also implies you cannot do intra-process between different domain IDs).
I am not against this, we always can explore the possibility for better user experience. but i am just not sure enough how we can design this in detail. for example, there could be multiple contexts with different domain, in that case we just cannot bind the process global time source to every nodes in the process. and maybe i am missing more here...
I don't think we should try to design this sharing around multiple contexts; that is too hard. We could consider having all Nodes within a single Context share a clock thread, i.e. the clock thread would live in the Context (much like the IntraProcessManager
). Then the Nodes in that Context could pull the time out of the Context. However, the problem with that design is that the Context has no notion whatsoever about subscriptions, so something would need to "feed" that Clock thread with data. That could be one of the Nodes, but which one? And what if that one goes away? (there may be other problems with that design as well)
We discussed this in the triage meeting yesterday. The suggested solution would be to add a way to pass a TimeSource
via the NodeOptions
to allow users to override the internal timesource on a per-node basis.
For our generic composition containers, we could add a global flag such that all components either share or don't share a single time source.
For more complicated layouts (some nodes using sime, but not all), then these could be manage via manual composition like you are currently doing.
Would this be a feature that you are interested in adding for the rolling
/kilted
release?
The manual override solution via NodeParameters
is something I would like very much.
I further propose to split implementation in two subsequent steps/smaller PRs:
TimeSource
via NodeParameters
In this moment, I do not know if I may have available bandwidth to implement this; maybe I can start at least to prototype the first step and share with you the draft PR for further discussion if you are ok with this.
While profiling a big application of mine, made by static composition of 20+ nodes in the same executable, enabling
use_sim_time
makes 20+ threads spawn, each one with a clock subscriber.https://github.com/ros2/rclcpp/blob/9b654942f99f17850e0e95480958abdbb508bc00/rclcpp/src/rclcpp/time_source.cpp#L396-L402
Why wouldn't be possible to share among all nodes the clock subscription / the time source? This would bring down the count of clock subscriber threads to just one.