locusrobotics / fuse

The fuse stack provides a general architecture for performing sensor fusion live on a robot. Some possible applications include state estimation, localization, mapping, and calibration.
Other
700 stars 115 forks source link

ROS2 port in progress - was: use of ros::time vs std::chrono #257

Closed BrettRD closed 1 year ago

BrettRD commented 2 years ago

I'm working on a ros2 port, and I'm seeing a lot of files that only depend on ROS for its time class.

I really appreciate how Fuse is built as a standalone algorithm with clearly defined bindings for ROS, and I think this can go a little deeper.

I have a multi-agent application where no system has access to an accurate time, so I need to use the optimiser to establish relationships between clocks.

Is a switch from ros::time to std::chrono something that you would like back-ported?

svwilliams commented 2 years ago

I'd consider it. I'd just want to make sure I understand the ramifications for API compatibility. If you can point me at a branch with the time changes in it (either ROS1 or ROS2), I'll take a look.

BrettRD commented 2 years ago

I'm still figuring out the ramifications, I think the biggest deviation will be the way timestamps are packed for uuid generation. (I think boost::serialise has builtins for std::chrono)

I'll be posting here when it happens: https://github.com/BrettRD/fuse/tree/galactic

I don't think my multiple-clocks thing will appear in the core api at all; in the extreme case I might make a plugin to override TimstampManager, but I doubt I'll need precise history windowing.

SteveMacenski commented 2 years ago

@BrettRD how far are you on a ROS 2 port? I'm starting to make plans for 2022 and wondering if I could assume this was going to be available in ROS 2 cira-March timeframe? We're looking for a robot_localization replacement in the Nav2 stack since Tom nor I have had a great deal of time to maintain it. It's better to use something newer and that the core maintaining company actually uses in production to keep up with it.

I think I could also dedicate some resources to helping with the ROS 2 migration.

@svwilliams a concern I have is around maintenance. Do you have a plan (or could create a plan) in order to keep ROS 1and ROS 2 in sync (e.g. requirements that PRs have both counterparts). It would be important to use in the Nav2 community that these don't get critically out of sync (for instance like grid_maps is, which has prevented us from replacing costmap_2d for the immediate future).

BrettRD commented 2 years ago

@SteveMacenski I have fuse_msgs and most of fuse_core building under galactic, barring parameter server stuff and whatever pub/sub operations I could isolate. fuse_constraints, fuse_variables , fuse_graphs, fuse_loss need only trivial changes fuse_models, fuse_optimizers, and fuse_publishers need regular node translation and parameter handling changes.

The major sticking point (as @svwilliams mentioned elsewhere) is the route used to shim a callback into the ROS spin thread. It's used in three main places, and rclcpp doesn't provide anything like it (yet, planned since 2018). There are a number of ways of dealing with it, but precious few of them seem sane, safe, or maintainable.

I'm doing this as a hobby at the moment, I'd love to be able to focus on it.

svwilliams commented 2 years ago

The AsyncXXX base classes were intended to make it easier for external developers to create sensors and motion models. The design did two things:

  1. It hide the threading details so a new developer didn't need to know the minutia of what threads in the optimizer were calling which methods.
  2. It makes a sensor or motion model behave similar to a standard node or nodelet.

However, these classes are not strictly necessary. One option is to remove them entirely and construct all of the motion models, sensor models, and publishers on top of the non-async base classes instead (fuse_core::MotionModel, fuse_core::SensorModel, and fuse_core::Publisher). It will just require more care and attention to precise which thread is doing what.

Another possibility (which I'm not sure I like) is to build the async base classes using a different event loop. For example, Boost ASIO allows you to easily create a thread pool of N threads, and post arbitrary function calls to be executed. Assuming ROS2 has some equivalent of ros::spin_once(), the ROS callback queue could be serviced from within the ASIO thread pool as well. This would achieve a very similar result -- all ROS callbacks and derived Fuse methods would execute within a single thread.

I'm completely open to other ideas. Those are just my first thoughts.

BrettRD commented 2 years ago

Just last night I found this: https://answers.ros.org/question/322815/ros2-how-to-create-custom-waitable/ which should allow me to make something functionally equivalent to callback_wrapper.h and get on with the port.

I do like the async publisher concept, I've turned them off in the build so I could focus on the nuts and bolts, but I intend to get them going again when I get up to converting node handles etc I think ditching the callback queue interaction would require a different (or additional) event loop almost everywhere. rclpp does seem to be committed to allowing custom callbacks to be inserted into CallbackGroups like ros.h does, it's just not on the priority list.

rclcpp has some great features for multi-threading within a node, but leaning on those might force a divergence between code bases. I'm all in favour of the compact node-like threading model you have already, and extending that with the ros components system requires that it supports a plain spin().

ayrton04 commented 2 years ago

I started down this path independently (and also found that Waitables could solve the callback queue manipulation issue), but I don't want to duplicate effort. What's the status of your effort, @BrettRD?

BrettRD commented 2 years ago

@ayrton04 it's good to see someone else on the effort

I have it building up to and including fuse constraints and fuse variables, I'm most of the way through fuse graph. I might draw a dependency diagram for the docs

I'm looking forward to reading your take on waitables, I think my callback wrapper is ok, but I've only seen one other example online

I'll have a look at your commits today and see what I can merge, a brief glance seems like we're focusing on different things

BrettRD commented 2 years ago

There are a few differences of approach, and you've found some APIs that I missed

I need to pick up your ros_console patch.

I made node-like classes such as the optimizers inherit Node directly, this makes ROS2 threading easier to configure, and might allow interaction with composable_nodes later on I haven't done much to the node-like classes from fuse_core, they seemed like they could wait. I should be able to merge yours in pretty cleanly.

I approached ceres parameters from the optimiser side where it was difficult to pass the Node to the param helpers, so I'm passing the NodeParametersInterface and collecting the params on that I had a go at the structure of some of the parameter files, the forward declarations and per-node name-spacing did require a few changes to how the parameters would be arranged. I was planning on adding an on-param-change callback when I ported the parameter helpers, but I think I'll merge yours and back-log dynamic parameters for now

Your CallbackWrapper seems to be blocking by default, this is likely to slow down the sensor_models passing measurements to the optimiser, you might like to pick up mine. I've added a CallbackAdapter class to implement Waitable and host a queue of CallbackWrappers more like the ROS1 callback_queue does. non-blocking example CallbackAdapter can be initialised in a parent class constructor such as optimizer

I've also tried to reduce the total ROS dependencies across Fuse; A lot of Fuse seems like it could be built to run ROS1 and ROS2 sensor_models at the same time.

I made a lot of my changes while still trying to learn how Fuse hangs together, so a lot of it reads like scattered proof-of-concept rather than planned systematic changes. I know what I'm doing now, and I'm expecting to have a lot more time to commit to this starting next month

ayrton04 commented 2 years ago

@BrettRD would it make sense to:

And then we can all work together?

BrettRD commented 2 years ago

I think those things make sense,

I prefer the branch name to refer to a particular ROS2 version until we can get osrf matrix build tests running I have at least one project where I maintain an upstream commit for a back-port to Dashing

I'll keep my fork open until I can clear out my issue tracker

ayrton04 commented 2 years ago

OK, I've been working off of rolling. We can either finish the migration against that and then see how far back the API is supported and make branches accordingly, or we can target a specific release for our migration and work from that. If we go that route, though, I'd suggest something like Galactic.

@svwilliams, thoughts?

svwilliams commented 2 years ago

It sounds like targeting rolling is the suggested method:

Since new stable distributions will be created from snapshots of the Rolling distribution, package maintainers who want to make their packages available in future ROS 2 distributions can do so by releasing their packages into the Rolling distribution.

But I'm a little concerned that the constant churn of the rolling release would get in the way of doing the port. I'm inclined to target the port for galactic or humble (allegedly being released today), and then transition that code to the rolling release.

ayrton04 commented 2 years ago

Seconded.

BrettRD commented 2 years ago

My work so far has been aimed at Galactic, and I think I've used APIs that support Rolling through Foxy but not Dashing. I'm keen to run automated tests against Rolling through Dashing once the port is complete.

ayrton04 commented 2 years ago

OK, so then let's start by focusing on galactic. We'll get the port wrapped up and then move on to supporting other versions.

ayrton04 commented 2 years ago

Invite sent, and galactic-devel branch created. @svwilliams, how would you like to proceed? PR one commit at a time from @BrettRD's work? Have him PR everything he has into that branch?

svwilliams commented 2 years ago

I'm not too picky; whatever works best for Brett. But Brett should probably drive the PR creation. He knows the history of his changes, so he'll know what changes should get moved over.

ayrton04 commented 2 years ago

@BrettRD, whenever you have time, feel free to start PR-ing commits. No rush.

BrettRD commented 2 years ago

I'm going to open four PRs, This should break my work up enough to discuss the changes

Just so that I don't have to re-base these each time, Can someone set the galactic-devel branch to point to 82c284e0fd3d54b0d39ee4e34a2416771da5a098?

BrettRD commented 2 years ago

I've opened a PR with the ROS2 port into galactic-devel, I think we ought to open new issues to discuss the feature changes

ayrton04 commented 2 years ago

Sorry, I haven't forgotten about this. A bit slammed this week, but will get moving again shortly.

ayrton04 commented 2 years ago

Sorry for the huge delay here, @BrettRD. For full transparency, Locus has decided to work directly with Open Robotics on this port. I think using your work will still make sense as a starting point, however, and we'd welcome your continued involvement.

The kick-off is nominally set for September, so your PR may remain in open state until then. Thanks for your work (and patience!).

BrettRD commented 2 years ago

That's very exciting! I'd love to remain involved when Open Robotics rolls in

I'll keep chipping at the PR anyway, I'm nearing a point where I can start running tests on the packages

My motivation when I started the port was to completely learn the approach, and then think about what's missing for my applications, I'd be delighted to share my findings

methylDragon commented 1 year ago

Greetings! This is Open Robotics rolling in :eyes: :wave:

I'll be working on the port targeting rolling, but basing my work partly on the incredible work done so far. I'll try to cherry-pick as much as possible to preserve credit.

To this end I'll be making rolling branches on this repo and on tf2_2d and sending PRs against it, and I plan to be writing a new issue to track the progress of the port. I plan to be working up the dependency tree bit by bit (so, msgs -> core -> loss + variables + graphs -> constraints, and so on...), though I expect to be doubling back on some stuff like core if needed.

I'd like to ask if there's a preference for which commit to base the porting branch off of? If there's no preference I'll probably be branching off the same commit that galactic-devel was branched off of.


In the meantime, I've been spending some time getting familiarized with the code base and looking at the current progress from both Tom and Brett's branches.

I think one issue I can think of currently, is the issue of time. I see Brett's proposal for the TimeStamp class and think it's a nice way to support both ROS time constructs, as well as std::chrono time, but there currently isn't any consideration for supporting the ROS clock yet (RCL_ROS_TIME), which will be important for supporting sim-time (as dictated by the /clock topic). (Note that the ROS clock defaults to system time if sim-time isn't being called for.)

I have an idea for how to solve this, which will mean I'll be extending the TimeStamp class, though I'll note that the inclusion of sim time support would mean that there has to be judgement calls made about whether something like the graph callback echos should refer to sim time or system time.

I intend to default to the ROS clock time as much as possible, including in that graph case I mentioned above. (The only exceptions will be where it makes less sense to (likely only console throttling (e.g. in the DelayedThrottleFilter)))

BrettRD commented 1 year ago

Great to have you on!

Confirming that galactic-devel is the branch you want, I cherry-picked through Tom's work and re-based my port onto galactic-devel in the process.

Time is a big deal on this code base, and my hacks to the fuse_core time class are a chewing-gum-and-string version of what it needs to be.

I'm not attached to std::chrono, it just seemed like a sensible API where conversion helpers were available in ROS1 and ROS2.

I think it's going to become extremely important to be able to track the name of the clock-source a timestamp was measured against; Fuse is a very small API change away from being able to time-align asynchronous sensors within a platform, and synchronise clocks across a fleet of robots. A named clock as part of the fuse_core timestamp class is possibly simpler, and certainly more powerful than using the ROS2 clock enum.

There are surprisingly few cases where time is actually sampled in fuse, this reduces complexity for off-line and sim-time uses. Most of it is log throttling, and it should be fairly easy to make the throttle clock configurable through params

The publisher package (the last package you'll have to compile) makes use of the ROS1 time-zero constant which no longer makes sense in ROS2. I don't have a solution for that aside from eliminating its use from the publisher package, which is tricky.

methylDragon commented 1 year ago

Yep the clock-source tracking is precisely what I'm intending to extend the TimeStamp class with. (Well, more precisely clock type (and I'll be associating the std::chrono::system_clock with RCL_SYSTEM_TIME).. I'm not intending to be adding a member for clock name, though I could see it trivially being added in an alternate constructor.)

I'll have to look into the publisher package when I get to it.. Amongst the many other things

methylDragon commented 1 year ago

I ended up refactoring the TimeStamp class to be a subclass of rclcpp:Time and adding methods that support conversion to and from (and arithmetic with) chrono's time_point and duration classes. I'm also making use of RCL_CLOCK_UNINITIALIZED to differentiate between zero-initialization and un-initialization. Hopefully that feature comes in handy when I'm tacking publishers down the line.

The only thing that meaningfully changes behavior wise now from the prior implementation is that the base clock type when constructing from a ROS timestamp will be (usually) RCL_ROS_TIME, and otherwise (e.g. if constructing from a chrono time point with a system clock), RCL_SYSTEM_TIME.

The previous implementation was a little bit more naive (since it'd forcefully convert everything to system time), so some assumptions and operations might fail at runtime now (since rclcpp::Time's operators require operations between times with identical clock types, which, in my opinion is the "correct" sort of behavior to be expecting.)

https://github.com/methylDragon/fuse/commit/5e605b0ac15a25329abde6430ad994aa2c4c8029

methylDragon commented 1 year ago

With the completion of https://github.com/locusrobotics/fuse/issues/276 , let's close this issue!

(Note: We ended up going back to ros time in most cases)