ros2 / rclcpp

rclcpp (ROS Client Library for C++)
Apache License 2.0
514 stars 410 forks source link

fix the execution order of entites in executors to favor insertion order #2537

Open wjwwood opened 1 month ago

wjwwood commented 1 month ago

In specific circumstances, multiple entities (of the same kind) in an executor can be ready at the same time and the executor must arbitrarily decide which to execute first, and in Iron and before, this fell back to the order in which they were added, but in Jazzy we (inadvertently) changed this and it became related to the hash order, subtly changing the order of execution that some folks had come to rely on, see: https://github.com/ros2/rclcpp/issues/2532

While this ordering was not guaranteed, this pr aims to restore the previous behavior for consistency and to provide a less surprising result, but the order is still not a guarantee nor is it the same for all executor types (e.g. the EventsExecutor is consistent, but is instead based on the trigger order of the entities, not their insertion/addition order).

The new test is also in a separate pr here: https://github.com/ros2/rclcpp/pull/2536 so it can be more easily back ported to previous versions. I think it should pass on them, but I still need to test it. See that pr for updates.

This approach will not work for backporting as it breaks ABI, but I believe I could make a version that doesn't break ABI (it will be as ugly as past non-ABI breaking changes). I just want to make sure we want to follow through with this before committing resources to that attempt.

This pr does incur some additional overhead, but I believe it should be relatively low, considering what we're storing. It does, however, slow down insertion and erasure from the collections which could be a concern in the tight loop of spin. I will do some more experimentation related to this, but the memory overhead of storing the entity pointers of the collection twice (once as an unordered map for fast random access and once in a vector for maintaining insertion order) should be relatively small. It does increase the memory footprint by 2x, but the footprint was small to begin with (three pointers per entity).

I looked at using existing "ordered maps" but all of them (that I could see) either provided the desired API but lost the fast lookup of the hash or did what I did here and maintained the information twice in two different data structures.

So I think this is a reasonable approach, but I'm up for discussion on alternatives.

thomasmoore-torc commented 1 month ago

As I've been looking into how all of this works, it seems to me that the bulk of the complexity associated with the ExecutorEntitiesCollection is to allow the Executor class to handle callback groups. So far as I can tell, the ExecutorEntitiesCollection essentially serves two purposes:

  1. To act as a staging area for entities to be added to the waitset
  2. To associate a callback group with an entity

As the SingleThreadedExecutor only needs the first purpose and the StaticSingleThreadedExecutor needs neither, would it be worth considering the "don't pay for what you don't use" principal and refactoring some of the purpose-specific functionality out of the base Executor class and either directly into the executor implementations that require that functionality or into intermediate implementations that can be used by multiple executors as needed?

wjwwood commented 1 month ago

As the SingleThreadedExecutor only needs the first purpose and the StaticSingleThreadedExecutor needs neither

I don't think that's actually true. The SingleThreadedExecutor does actually consider the callback groups, but in a very limited fashion. There's actually a test that disables individual callback groups explicitly, and checks that executors do not execute from them. Now whether or not that's useful isn't clear to me, but as it stands CallbackGroups are a non-optional part of the executor related APIs, and for better worse we cannot just ignore them. Now we could change that, but it would require changing some of the semantics, but that's not something we can do in a stable release, in my opinion. But we can consider it for the future.

From a technical angle, there's nothing that requires that any executor, including the two you mentioned, use the Entity Collection classes, it was just convenient. However, in my experience you cannot remove much if anything that's there and still handle ownership mechanics correctly, e.g. nodes hold pointers to callback groups which holds pointers to entities and you have to handle gracefully removing them when the user lets any go out of scope, but you have to hold ownership in the executor while using them, etc... So feel free to try and optimize them, but just be aware that there be dragons, lol.

Also, despite the name, the StaticSingleThreadedExecutor does actually need to do both 1 and 2 that you listed. And in fact after the recent refactoring there's little to no practical difference between it and the SingleThreadedExecutor, we just didn't manage to get it deprecated in the chaos of getting everything else done before the API freeze this time. (someone disagree with me if you think otherwise)

alsora commented 1 month ago

As the SingleThreadedExecutor only needs the first purpose and the StaticSingleThreadedExecutor needs neither

As discussed in the client library WG, an important feature (that is currently achieved through callback groups) is to allow to add only some of the node's entities to an executor

This pr does incur some additional overhead, but I believe it should be relatively low, considering what we're storing. It does, however, slow down insertion and erasure from the collections which could be a concern in the tight loop of spin

We usually decided to not focus too much on the overhead of insertion/deletion (compared to the overhead of the fast-loop: selection and execution of entities). Most of the ROS 2 systems will setup their nodes and then just run with a static graph,

wjwwood commented 1 month ago

We usually decided to not focus too much on the overhead of insertion/deletion (compared to the overhead of the fast-loop: selection and execution of entities). Most of the ROS 2 systems will setup their nodes and then just run with a static graph,

I'm not sure if we only ever insert/delete when the user adds or removes something, e.g. copying or updating a collection can happen even when the user hasn't added or removed anything, but maybe I'm wrong about that. We have two collections in the executor base class which are swapped (sort of) at specific intervals:

https://github.com/ros2/rclcpp/blob/3bc364a10bd1bfe710307147174b37ef62562abd/rclcpp/src/rclcpp/executor.cpp#L683-L717

Anyway, maybe @mjcarroll can expand on that.

This is still something I'm going to investigate before pushing to merge this.

alsora commented 1 month ago

Well, rebuilding the collection only when something changes was the purpose of the "notifier waitable" and the "entities collection" classes, and that's how they were used in the events executor and static single threaded executor. Unless we did something wrong while expanding them to the single threaded executor, i would expect the same behavior.

wjwwood commented 2 weeks ago

Well, rebuilding the collection only when something changes was the purpose of the "notifier waitable" and the "entities collection" classes, and that's how they were used in the events executor and static single threaded executor. Unless we did something wrong while expanding them to the single threaded executor, i would expect the same behavior.

That was my hope, but I wanted to confirm which I am going to do soon hopefully.

mjcarroll commented 2 weeks ago

Unless we did something wrong while expanding them to the single threaded executor, i would expect the same behavior.

That is the behavior unless something was done incorrectly.

If the set of the entities in the collection stays constant, the waitset should not be rebuilt.

wjwwood commented 2 weeks ago

I closed https://github.com/ros2/rclcpp/pull/2536 in favor of this pr (it was just the test, with no fix), but cross-posting here for visibility since there were some good discussions on that pr too.

alsora commented 1 week ago

@wjwwood did you have a chance to check if the collection was correctly functioning? i.e. it's rebuilt only when something changes. P.S. this is something we should be able to easy unit-test.

wjwwood commented 1 week ago

I can confirm that the insertions only occur when entities are created or destroyed, so the overhead of this fix should be minimal.

I just added print statements during insertion as well as some to show what steps of the test we're in:

FINDME >> start executor test, before executor creation
FINDME >> before adding callback group
insert(0xac797083e690) insert(0xac7970886c00) insert(0xac79708aeb30) insert(0xac79708c2780) insert(0xac79708c1970) insert(0xac79708d45d0) insert(0xac79708fcbe0) insert(0xac79708fbd30) insert(0xac79708fc410) insert(0xac7970934f10) insert(0xac7970947cf0) insert(0xac797095e870) insert(0xac7970972220) insert(0xac79709706a0) insert(0xac7970997dd0) insert(0xac7970996ca0) insert(0xac79709a9c00) insert(0xac79709bcc50) insert(0xac79709e3910) insert(0xac79709f6800) insert(0xac797089b4a0) insert(0xac797089b300) insert(0xac79708ae480) insert(0xac7970629a60) insert(0xac79708c2050) insert(0xac7970629820) insert(0xac79708d4cc0) insert(0xac79706295e0) insert(0xac79708e7840) insert(0xac7970629310) insert(0xac79708fc570) insert(0xac79708fc440) insert(0xac797090f270) insert(0xac797090f140) insert(0xac7970921d30) insert(0xac79708fc220) insert(0xac7970934770) insert(0xac7970934640) insert(0xac7970947660) insert(0xac7970628770) insert(0xac797095e280) insert(0xac79706284a0) insert(0xac7970971c30) insert(0xac79706267e0) insert(0xac79709847d0) insert(0xac79706265e0) insert(0xac7970997740) insert(0xac79708adb90) insert(0xac79709aa780) insert(0xac79709aa5c0) insert(0xac79709bd650) insert(0xac79709bd490) insert(0xac79709d05b0) insert(0xac79709d03f0) insert(0xac79709e3280) insert(0xac79709e30c0) insert(0xac79709f6170) insert(0xac79709f5fb0) insert(0xac7970a09170) insert(0xac7970a08fb0) insert(0xac79707ff8a0) insert(0xac797083e690) insert(0xac7970886c00) insert(0xac79708aeb30) insert(0xac79708c2780) insert(0xac79708c1970) insert(0xac79708d45d0) insert(0xac79708fcbe0) insert(0xac79708fbd30) insert(0xac79708fc410) insert(0xac7970934f10) insert(0xac7970947cf0) insert(0xac797095e870) insert(0xac7970972220) insert(0xac79709706a0) insert(0xac7970997dd0) insert(0xac7970996ca0) insert(0xac79709a9c00) insert(0xac79709bcc50) insert(0xac79709e3910) insert(0xac79709f6800) insert(0xac797089b4a0) insert(0xac797089b300) insert(0xac79708ae480) insert(0xac7970629a60) insert(0xac79708c2050) insert(0xac7970629820) insert(0xac79708d4cc0) insert(0xac79706295e0) insert(0xac79708e7840) insert(0xac7970629310) insert(0xac79708fc570) insert(0xac79708fc440) insert(0xac797090f270) insert(0xac797090f140) insert(0xac7970921d30) insert(0xac79708fc220) insert(0xac7970934770) insert(0xac7970934640) insert(0xac7970947660) insert(0xac7970628770) insert(0xac797095e280) insert(0xac79706284a0) insert(0xac7970971c30) insert(0xac79706267e0) insert(0xac79709847d0) insert(0xac79706265e0) insert(0xac7970997740) insert(0xac79708adb90) insert(0xac79709aa780) insert(0xac79709aa5c0) insert(0xac79709bd650) insert(0xac79709bd490) insert(0xac79709d05b0) insert(0xac79709d03f0) insert(0xac79709e3280) insert(0xac79709e30c0) insert(0xac79709f6170) insert(0xac79709f5fb0) insert(0xac7970a09170) insert(0xac7970a08fb0) insert(0xac79707ff8a0) FINDME >> spinning to complete executor test: 0 / 20
FINDME >> spinning to complete executor test: 1 / 20
FINDME >> spinning to complete executor test: 2 / 20
FINDME >> spinning to complete executor test: 3 / 20
FINDME >> spinning to complete executor test: 4 / 20
FINDME >> spinning to complete executor test: 5 / 20
FINDME >> spinning to complete executor test: 6 / 20
FINDME >> spinning to complete executor test: 7 / 20
FINDME >> spinning to complete executor test: 8 / 20
FINDME >> spinning to complete executor test: 9 / 20
FINDME >> spinning to complete executor test: 10 / 20
FINDME >> spinning to complete executor test: 11 / 20
FINDME >> spinning to complete executor test: 12 / 20
FINDME >> spinning to complete executor test: 13 / 20
FINDME >> spinning to complete executor test: 14 / 20
FINDME >> spinning to complete executor test: 15 / 20
FINDME >> spinning to complete executor test: 16 / 20
FINDME >> spinning to complete executor test: 17 / 20
FINDME >> spinning to complete executor test: 18 / 20
FINDME >> spinning to complete executor test: 19 / 20
FINDME >> stop executor test

You can see that it does some insertions when the callback group is added, but not between spins, which was expected.

wjwwood commented 1 week ago

P.S. this is something we should be able to easy unit-test.

I tested it with print statements, and I started to add a unit test, but checking it as a unit test became pretty difficult, since it's not just testing a unit, but rather two working together (with different results for each pair), e.g. STE with the EntityCollection vs the MTE with the EntityCollection vs the EventsExecutor with the EntityCollection. I tried to add a test that just used some methods of the EntityCollection, but it's not set up for introspection, so I kind of ran out of time to make the changes needed to properly test this. Probably with a derived class only used in the test or perhaps some hooks that can be installed just for testing.

So unfortunately I'll have to leave that for others or future me to investigate.