ros2 / geometry2

A set of ROS packages for keeping track of coordinate transforms.
BSD 3-Clause "New" or "Revised" License
110 stars 193 forks source link

[cache_unittest] Add direct implementation testing on ordering, pruning #678

Closed EricCousineau-TRI closed 1 month ago

EricCousineau-TRI commented 1 month ago

Towards #676 This changes nothing pertaining to the implementation. It only adds testing. Some of it is implementation-specific, but towards the ends of performance checks.

From https://github.com/ros2/geometry2/issues/676#issuecomment-2102512107

  1. It needs to hold an arbitrary number of entries, because we don't know the rate at which we will get data.
  2. The items needs to be in sorted order by timestamp, from the newest entry to the oldest.
  3. New items need to be inserted into arbitrary positions, as timestamps may come in out-of-order.
  4. Duplicates shouldn't be allowed.
  5. Old items need to be pruned out.

To answer them:

  1. There appears to be no limit on size. Not sure how to test this in a bounded fashion.
  2. This was not tested. Now explicitly tested via ImplSortedDescendingUniqueEntries
  3. This was not tested. Now explicitly tested via ImplSortedDescendingUniqueEntries
  4. This was already tested, but is also explicitly tested via ImplSortedDescendingUniqueEntries
  5. This was not tested (I believe). Now explicitly tested via ImplSortedDescendingUniqueEntries
EricCousineau-TRI commented 1 month ago

Gotcha. Is there a way to very explicitly denote a header file as internal? (i.e., it should not be considered as any part of stable API, and thus can be changed at a whim?)

Perhaps placing it in include/tf2/impl/?

EricCousineau-TRI commented 1 month ago

Also, I can't tell what happened with CI; is this failure expected / unrelated to this PR? https://build.ros2.org/job/Rpr__geometry2__ubuntu_noble_amd64/66/console#console-section-9

13:03:08 Removing intermediate container 9a8e02c29f4e
13:03:08 failed to get digest sha256:61b7bdff7301f10eefa4a8cc831d7e4303af3338862f459c215a078c480284c6: open /var/lib/docker/image/overlay2/imagedb/content/sha256/61b7bdff7301f10eefa4a8cc831d7e4303af3338862f459c215a078c480284c6: no such file or directory
13:03:08 Build step 'Execute shell' marked build as failure
13:03:08 [CMakeGNU C Compiler (gcc)Clang-Tidy] Skipping execution of recorder since overall result is 'FAILURE'
clalancette commented 1 month ago

Also, I can't tell what happened with CI; is this failure expected / unrelated to this PR?

Unrelated, it happens every once in a while in the infrastructure. When you push next it should hopefully ( :crossed_fingers: ) work.

clalancette commented 1 month ago

Perhaps placing it in include/tf2/impl/?

We could do that. But because of our ABI guarantees, it won't matter much anyway; we wouldn't be able to change the embedded data structure anyway. We could do a PIMPL thing here, but that has its own downsides. At this point I'm just happy enough leaving it in the public headers.

EricCousineau-TRI commented 1 month ago

Gotcha, so changing the underlying class, even if using same exact content / ordering, might affect ABI? I ask because I'm very interested in preserving ABI, because I would like to ultimately backport the related tests and fixes to humble. Also, just a quick glance tells me that decoupling these specific implementation details from TimeCacheInterface itself may effectively duplicate much of the functionality, aside from interpolation, so not only is this ABI change but also API change.

This is a bit more than I'd prefer to bite off, at least for what could be a minimal, targeted performance improvement.

Suggested options: (a) If we are going to change the ABI, then perhaps it is better to just outright write a much more performant TimeCache. (\cc @calderpg-tri) (b) The simple path is to simply stay with this current approach. I would assume that there is a way to avoid having TestFriend from affecting ABI, no?

I prefer (b), because I would like the minimum-effort / quickest solution that is a strict improvement.

Can I ask your opinions?

clalancette commented 1 month ago

Can I ask your opinions?

In general, my opinion is that we should always do what is best on rolling, without worrying about ABI breaks, because that is the code we are going to have to support going forward. Then we can worry about ABI when we backport to the other distributions.

In this specific case, I just really dislike having test fixtures in the public headers. As a downstream consumer of the header, it has no meaning, and it is testing/implementation "leaking" through the public headers. Instead, given that what we are trying to get access to is the "full" linked list to run tests, we could:

  1. Make the TimeCache::storage_ a protected member, which would allow us to subclass it in the tests and get access to it that way.
  2. Add in a method TimeCache::getAllItems, which would return a copy of all items in the list.

Of the two, I kind of prefer making TimeCache::storage_ protected, as it is the least impact. But I don't feel strongly about it.

One thing to note is that we probably will not be able to backport a change from private -> protected into Humble. However, I think it is less important that we backport the tests there, and instead it will be more important that we just backport the eventual performance fixes (which shouldn't require an ABI break).

EricCousineau-TRI commented 1 month ago

Gotcha, those are all fair points, and thanks!

Make the TimeCache::storage_ a protected member, [...]

I like making storage protected API as a minimum impact change, but I don't think it's "best" for rolling in terms of downstream user impact. Someone could understandably subclass TimeCache and gain access to private data that is intended only for testing. To me, having something like proected: /* THIS IS FOR TESTING ONLY */ my_private_storage_ seems more confusing than having friend class TestFriend.

If the concern is the existence of namespace impl { class TestFriend; } confusing users, then I think I could simply eliminate that. I will push that just for demonstration.

Add in a method TimeCache::getAllItems [...]

I like this most in terms of "testable but straightforward API". I'm not sure if there's any use case beyond testing, but at least this would not be as much of a foot-gun as other approaches.

However, I think it is less important that we backport the tests there [...]

Hm... I don't love not having tests relevant to backported changes.

Overall, given that above points, my opinion is that TestFriend is the best in terms of having fully tested and consistent code across the relevant platforms, and IMO, is not that much of a drawback.

However, I will defer to you, and try out a second commit that exposes getAllItems().

Does this sound good?

EricCousineau-TRI commented 1 month ago

Trying out the options:

Tidying up naming / minimizing excess in public header: f5c22ac7

Using getAllItems(): ec65bf64 My spidey-sense are tingling on this one - this now seems like a bad idea because the ordering may be implementation-dependent, and it seems unwise to promise anything else.

Demoting getAllItems() to protected: cc7431d9 This seems slightly better, but still user-accessible given that TimeCache is not final.

clalancette commented 1 month ago

My spidey-sense are tingling on this one - this now seems like a bad idea because the ordering may be implementation-dependent, and it seems unwise to promise anything else.

Yes, it is 100% implementation dependent. But the implementation is in this package, so anything that changes the implementation will have to update the tests as well. That seems like a reasonable thing in my opinion.

Note that I also ended up pushing 39371e8 , which cleans up some things. In particular, it switches getAllItems to return a const-reference, since that is what it was doing anyway. This also allows us to move the implementation into the .cpp file.

EricCousineau-TRI commented 1 month ago

I think it's great to update the tests, and I think any of the methods (including friend class) can enable that.

My concern is more about backwards-compatibility with consumers external to this package. By adding this method in a fashion that is accessible, it is exposing this package to perhaps over-burdensome constraints. (this is where friend class, helps, because it's very clearly not going to be in any stability promise)

Is there guidance in ROS 2 dev documentation regarding how aggressive stability promises are (or aren't) for API?

For example, in Drake, we have https://drake.mit.edu/stable.html We specifically exclude things such as functions marked internal use only or experimental, even if it is accessible via public API.

clalancette commented 1 month ago

Is there guidance in ROS 2 dev documentation regarding how aggressive stability promises are (or aren't) for API?

Yes, we never break API during a stable release (or, at least, we have to have a severe bug to warrant it): http://docs.ros.org/en/rolling/The-ROS2-Project/Contributing/Developer-Guide.html#versioning . However, we do have an exception for things that are internal (though that is not documented).

I'm totally on board with marking getAllItems as internal, though the honest truth is that I don't see us ever changing this within a stable distribution.

EricCousineau-TRI commented 1 month ago

Sweet, thanks! I think getAllItems() could change due to implementation, but I see that fits within the "outside of a stable distribution" criteria.

For this PR, is there anything else necessary for it to land?

clalancette commented 1 month ago

For this PR, is there anything else necessary for it to land?

We just need to run CI, particularly on Windows. Here it is:

EricCousineau-TRI commented 1 month ago

Per https://github.com/ros2/geometry2/pull/680#issuecomment-2110482680, looks like there are downstream tests that are sensitive the specific ordering of insertion. Not sure if there is anything that should be done here, aside from perhaps updating the test to state "insertion ordering is actually important for testing in test_tf2/buffer_core_test/BufferCore_lookupTransform/ring_45_configuration, which seems like a mouthful.

clalancette commented 1 month ago

Windows (correctly) warned about loss of precision in one of the calls. I fixed that in cff49f6, here is CI again:

ahcorde commented 1 month ago

https://github.com/Mergifyio backport jazzy humble iron

mergify[bot] commented 1 month ago

backport jazzy humble iron

✅ Backports have been created

* [#687 [cache_unittest] Add direct implementation testing on ordering, pruning (backport #678)](https://github.com/ros2/geometry2/pull/687) has been created for branch `jazzy` * [#688 [cache_unittest] Add direct implementation testing on ordering, pruning (backport #678)](https://github.com/ros2/geometry2/pull/688) has been created for branch `humble` but encountered conflicts * [#689 [cache_unittest] Add direct implementation testing on ordering, pruning (backport #678)](https://github.com/ros2/geometry2/pull/689) has been created for branch `iron` but encountered conflicts