ros / geometry2

A set of ROS packages for keeping track of coordinate transforms.

Allow Multiple Parents at Top Level of TF #437

Open matt-attack opened 4 years ago

matt-attack commented 4 years ago

It is possible to want to represent the position of a robot with respect to several "parent" reference frames, such as different maps, geocoordinates, or odometry. Right now this is usually done through chaining the frames (Map -> Odom -> Robot). However, this is rather constraining and requires you to generate these chained transforms at the same rate. If you start referencing the position of your robot against more than one unconnected "map" space, this pattern starts to fall apart, especially if there is no physical link between your maps (Map -> Other Unrelated Map? -> Odom -> Robot). One example of this is if you want to be able to estimate the position of your robot with respect to a different robot's odometry frame, which isn't linked to any sort of global map frame.

So far we've been working around this by having these "parent" frames as children of the robot; however, this feels rather strange and doesn't allow for multiple robots in the same tree. It would be nice if it were allowed/possible to have multiple parent frames for a TF item at the top level so we don't have to resort to such "hacks".

tfoote commented 4 years ago

However, this is rather constraining and requires you to generate these chained transforms at the same rate.

This assertion is not true. As long as you have an update rate that's at a higher frequency than the characteristic frequency of the system, interpolation will work just fine.
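A minimal sketch of that interpolation behaviour, assuming ROS 1 tf2, with frame names, timestamps, and the authority string made up for illustration:

```cpp
#include <tf2/buffer_core.h>
#include <geometry_msgs/TransformStamped.h>
#include <ros/time.h>

int main()
{
  tf2::BufferCore buffer;

  geometry_msgs::TransformStamped t;
  t.header.frame_id = "odom";
  t.child_frame_id = "base_link";
  t.transform.rotation.w = 1.0;  // identity rotation

  // Two odometry samples one second apart: x goes from 0 m to 1 m.
  t.header.stamp = ros::Time(10.0);
  t.transform.translation.x = 0.0;
  buffer.setTransform(t, "example_authority");

  t.header.stamp = ros::Time(11.0);
  t.transform.translation.x = 1.0;
  buffer.setTransform(t, "example_authority");

  // Querying halfway between the samples interpolates: x is ~0.5 m.
  geometry_msgs::TransformStamped mid =
      buffer.lookupTransform("odom", "base_link", ros::Time(10.5));
  (void)mid;
  return 0;
}
```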

One example of this is if you want to be able to estimate the position of your robot with respect to a different robot's odometry frame, which isn't linked to any sort of global map frame.

How do you make the reference to a disconnected tree if there's no physical link? If the robots are in the same space there is a shared reference of the world. If you have multiple maps there is by definition a transform between them because they are in the same physical space. The challenge is that you have to estimate the offset between them.

The standard way to do this is that if you make an observation between two trees, you should compute the difference between their roots and then add the transform (or transforms) at the higher level with the residual between the roots. This is done regularly by all the localization algorithms: localization detects offsets to landmarks as the vehicle drives around, and based on that motion you subtract out the motion of the base in the odometric frame to compute the transform from the map frame to the odometric frame. If you have different maps, the same logic lets you bridge between multiple maps. See REP 105 for a recommended structure.
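As a sketch of that bookkeeping, using tf2's transform math (the function and variable names here are illustrative only): given the localized pose of the base in the map and the pose of the base in the odom frame, the residual published as map -> odom is

```cpp
#include <tf2/LinearMath/Transform.h>

// T_map_base: where localization estimates the robot is in the map frame.
// T_odom_base: where odometry reports the robot is in its odom frame.
// The residual broadcast on tf as map -> odom is then:
//   T_map_odom = T_map_base * inverse(T_odom_base)
tf2::Transform computeMapToOdom(const tf2::Transform& T_map_base,
                                const tf2::Transform& T_odom_base)
{
  return T_map_base * T_odom_base.inverse();
}
```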

Fundamentally you cannot add multiple parents and keep the nice closed-form solution for tree traversal used in tf. Multiple parents allow loops, and loops are ambiguous to resolve as well as requiring much more complicated algorithms to query. If you'd like to understand more about the traversal logic please see the paper: http://wiki.ros.org/Papers/TePRA2013_Foote

The results published to tf should be a best estimate of a fused result based off of as many inputs as are available so that other components in the system can rely upon the results.

So far we've been working around this by having these "parent" frames as children of the robot; however, this feels rather strange and doesn't allow for multiple robots in the same tree. It would be nice if it were allowed/possible to have multiple parent frames for a TF item at the top level so we don't have to resort to such "hacks".

Why do multiple parents feel better? Why are the other ones "parents"? If you're doing something like observing another robot with a sensor, that gives you one measurement of where that robot is in relation to you. If you have two sensors detecting that other robot then we now need to fuse those measurements and estimate where the robot actually is. You could use either of those measurements to bridge between the trees. But how much uncertainty is in each of those measurements? To connect the trees reliably you then need to run it through an estimator or something else that will work to resolve the uncertainty to a best estimate. When doing this it's common to create semi-temporary coordinate frames as leaves to make the observations available and easy to query. So you'll have robot_1' and robot_1'' observations. If you have any sort of localization you can augment it by leveraging the projections of the robot into the other robot's tree to see how they line up.

But in general, as soon as you bridge the observations across the trees, there are now redundant cycles in the tree, and something as generic as the tf tree cannot resolve that redundant information in a generic way. You need to understand the uncertainties in each of the measurements and then compute a best estimate. Once you compute that best estimate, you can subtract off the local transform to the root of your tree and then publish only the delta to bridge the trees. This communicates the same information and keeps the tree, which can be queried in closed form by any component of the system.

The observation of the robot versus the reported position of the robot have notably different use cases. If you're doing hand-eye control, you don't want to rely on the self-reported position of the robot; you want to operate based on the observed position of the robot in the eye. But if you're driving to a landmark observed by the other robot in the past, you'll want to do the full transformation through the common map to the other robot's position. Theoretically these two pieces of information are the same, but due to uncertainty in the system they are unlikely to agree, and as such need to be kept track of independently; that's why the "children" frames are actually valuable.

matt-attack commented 4 years ago

However, this is rather constraining and requires you to generate these chained transforms at the same rate.

This assertion is not true. As long as you have an update rate that's at a higher frequency than the characteristic frequency of the system, interpolation will work just fine.

That is true, but it results in needing to update and generate TFs at a rate higher than strictly necessary. Also, chaining these transforms results in increased processing and reduced accuracy of the resulting lookups.

The standard way to do this is that if you make an observation between two trees, you should compute the difference between their roots and then add the transform (or transforms) at the higher level with the residual between the roots.

That is actually what we are doing, more or less. We are basically dropping markers that each have a location in another vehicle's odometry frame and measuring our position relative to them (then fusing this and other information in an EKF to get a good estimate over time). However, this is all done in the odometry frame of that robot, so there is no fixed link between it and any "global" earth-fixed frame. This doesn't really fit in the REP105 layout and results in the separate trees that you mentioned.

Fundamentally you cannot add multiple parents and keep the nice closed-form solution for tree traversal used in tf. Multiple parents allow loops, and loops are ambiguous to resolve as well as requiring much more complicated algorithms to query.

In an arbitrary graph this is true, but it is possible to allow non-looping graphs. For instance, a single robot with a parent odometry and then a parent map frame. Here's an example of a non-looping multi-parent structure that I would imagine in our case, with some changes to the REP105 structure that multiple parents would allow.

[image: proposed non-looping multi-parent TF structure]

It is possible to work around this by doing what we did and just inverting these extra parent frames to be children. Something just feels off about that though and it breaks our existing state estimators that assume REP105.

Why do multiple parents feel better? Why are the other ones "parents"?

I would say they can be defined as parents because they are all frames which you actually measure/estimate the robot's position (/base_link) relative to. Moving the robot (/base_link) does not actually result in these reference frames moving along with it (ignoring drift caused by odometry error), so defining them as children doesn't quite make sense.

As for why they both should be parents rather than chained, the story is a bit more nuanced. Perhaps it would be good to just talk about the REP105 /map and /odom frames since those seem to exhibit the same kind of structure.

At least in my experience, the robot's position is estimated in each of these frames (/map and /odom) independently. Only then is the REP105 /map->/odom->/base_link structure made in TF by linking /map to /odom through /base_link (the vehicle's position in each frame). This suggests /base_link should sit somewhere in the middle of the two in the TF tree. I will admit this point alone is as much of an argument for /odom to be a child of /base_link as a parent, but being a separate parent at least feels like it describes the relationship more accurately to me.

In fact REP105 seems to confirm this, but admits that the TF tree structure is what prevents this:

Although intuition would say that both map and odom should be attached to base_link, this is not allowed because each frame can only have one parent.

tfoote commented 4 years ago

In an arbitrary graph this is true, but it is possible to allow non-looping graphs. For instance, a single robot with a parent odometry and then a parent map frame. Here's an example of a non-looping multi-parent structure that I would imagine in our case, with some changes to the REP105 structure that multiple parents would allow.

Certainly, for the graph that you drew up there are no redundant loops. But how would you enforce that when there are potentially multiple publishers? And what do you do if a loop is made accidentally?

The other major challenge of allowing multiple parents is that every query requires a search for connectivity that's much more complex. It becomes more like the order of the number of links, not the order of the number of layers.

As you point out, you are moving in your own odom and your own map frame. However, I think it makes less sense to think about yourself moving in the estimated odom frame of the other robot. In that case the estimate can and will give updates, etc., and the parent vs. child distinction isn't very compelling.

Inserting the odom frame between the map and the robot actually has significant benefits with respect to your concerns about publishing frequency. Because the transform only represents the drift, it is nowhere near as time sensitive, which means that you can future-date transforms by the latency of your publishing and support immediate querying on high-latency transforms, as long as you're confident that the drift is negligible during your latency period. In this case map -> odom is measuring the drift of the odom and odom -> base_link is representing the physical motion of the vehicle. These are not the things that you're measuring.
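A rough illustration of the future-dating idea (a sketch only; publishMapToOdom and latency_budget are hypothetical names, not anything in tf2, and this assumes a running ROS 1 node):

```cpp
#include <ros/ros.h>
#include <tf2_ros/transform_broadcaster.h>
#include <geometry_msgs/TransformStamped.h>

// Publish the slow map -> odom correction stamped slightly into the future,
// so consumers can query "latest" immediately without waiting on a laggy
// estimator, assuming the drift over the latency budget is negligible.
void publishMapToOdom(tf2_ros::TransformBroadcaster& broadcaster,
                      geometry_msgs::TransformStamped map_to_odom,
                      const ros::Duration& latency_budget)
{
  map_to_odom.header.frame_id = "map";
  map_to_odom.child_frame_id = "odom";
  map_to_odom.header.stamp = ros::Time::now() + latency_budget;  // e.g. 0.1 s
  broadcaster.sendTransform(map_to_odom);
}
```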

When you're estimating odom_robot1', it probably doesn't actually make sense to keep track of it in your current frame, but it should actually be connected to a map or something else unmoving, like some zero-initialized map frame that you're also updating based on your localization algorithms. For example, if you drive forward and spin a wheel, your odom drifts. When your localization snaps you back to correct for the wheel slip, and you didn't observe the other robot's odometric frame, do you expect it to have stayed connected to your base_link, your odom frame, or your map frame? Clearly the base_link has moved, so no, and your odom frame drifted, so no. But their odom frame should still be essentially the same as your map frame, subject to their drift rate.

But stepping back out from your specific structure: it would be "nice" to be able to support more arbitrary graphs, but for computational complexity reasons that's not a trade-off worth making, since you can relatively easily connect all coordinate frames into a tree structure. With the tree structure there is almost zero cost to storing new information and querying is super fast, which are the two most common actions.

matt-attack commented 4 years ago

Certainly, for the graph that you drew up there are no redundant loops. But how would you enforce that when there are potentially multiple publishers? And what do you do if a loop is made accidentally?

One could produce errors if a cycle is detected, much like is already done inside TF2 (somewhat to my surprise). https://github.com/ros/geometry2/blob/melodic-devel/tf2/src/buffer_core.cpp#L1058
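For context, the guard in that code is essentially a bounded walk up the parent chain. A very rough standalone sketch of the same idea (the names and depth limit are illustrative, not tf2's internals):

```cpp
#include <map>
#include <string>

// Walk up the parent chain from a frame. If we never reach a root within
// max_depth hops, treat the chain as looping (mirroring tf2's depth guard).
bool chainLoops(const std::map<std::string, std::string>& parent_of,
                const std::string& start_frame,
                std::size_t max_depth = 1000)
{
  std::string frame = start_frame;
  for (std::size_t depth = 0; depth < max_depth; ++depth)
  {
    auto it = parent_of.find(frame);
    if (it == parent_of.end())
      return false;  // reached a root: no loop along this chain
    frame = it->second;
  }
  return true;  // depth bound exceeded: assume a cycle
}
```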

Inserting the odom frame between the map and the robot actually has significant benefits with respect to your concerns about publishing frequency. Because the transform only represents the drift, it is nowhere near as time sensitive, which means that you can future-date transforms by the latency of your publishing and support immediate querying on high-latency transforms, as long as you're confident that the drift is negligible during your latency period. In this case map -> odom is measuring the drift of the odom and odom -> base_link is representing the physical motion of the vehicle. These are not the things that you're measuring.

Interesting. That is true with some map localization methodologies. I wasn't thinking of cases like that. However in our instance that architecture only results in greater numerical error since both /map and /odom transforms are produced simultaneously. Allowing flexibility in the structure (be it multiple parents or REP105 chaining) would allow users to pick the layout that fits their localization system most accurately.

When you're estimating odom_robot1', it probably doesn't actually make sense to keep track of it in your current frame, but it should actually be connected to a map or something else unmoving, like some zero-initialized map frame that you're also updating based on your localization algorithms.

True, the zero-initialized "map" frame with localization is what we are using for this. Just to make it fit into a single TF tree we are inverting it and then taping it onto our /base_link as a child. Here's the structure:

[image: current TF tree with the other robot's frames inverted and attached under /base_link]

The other major challenge of allowing multiple parents is that every query requires a search for connectivity that's much more complex. It becomes more like the order of the number of links, not the order of the number of layers.

I guess it depends what one desires to optimize for. If the structure of your tree/graph is relatively static (changes much less often than you perform searches), one can cache the results of the graph searches then invalidate them when the structure changes and avoid the additional cost per lookup (possibly making all lookups faster as well).

To mitigate more of the cost, since you are already caching the path through the tree, you could also cache entire static parts (/tf_static) of the transform. Many transforms could be optimized into a map or even an array lookup (though the array would be n_frames*n_frames items long). Static-to-static frame lookups, as well as primarily static chained lookups (such as left_camera_optical->left_camera_mechanical->camera_body->lidar->roof_rack->base_link->odom), are a large fraction of our usage of TF, so the gains could be substantial in many applications. Correct me if this optimization is already being done somewhere.
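A minimal sketch of what such a cache could look like (hypothetical, not part of tf2): composed static chains keyed by (target, source), cleared whenever the set of static transforms changes.

```cpp
#include <tf2/LinearMath/Transform.h>
#include <map>
#include <string>
#include <utility>

// Hypothetical cache of fully composed static chains. lookup() returns true
// on a hit; invalidate() must be called whenever a static transform changes.
class StaticChainCache
{
public:
  bool lookup(const std::string& target, const std::string& source,
              tf2::Transform& out) const
  {
    const auto it = cache_.find({target, source});
    if (it == cache_.end())
      return false;
    out = it->second;
    return true;
  }

  void store(const std::string& target, const std::string& source,
             const tf2::Transform& composed)
  {
    cache_[{target, source}] = composed;
  }

  void invalidate() { cache_.clear(); }

private:
  std::map<std::pair<std::string, std::string>, tf2::Transform> cache_;
};
```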

tfoote commented 4 years ago

One could produce errors if a cycle is detected, much like is already done inside TF2 (somewhat to my surprise). https://github.com/ros/geometry2/blob/melodic-devel/tf2/src/buffer_core.cpp#L1058

This only detects a full cycle, which is clearly invalid. But a redundant path through the graph has no clearly defined response, and is much harder to detect.

Another challenge with supporting multiple parents is that you now have to be able to "unparent" something and that then becomes a stateful change that can't be reliably captured with an unreliable transport.

Interesting. That is true with some map localization methodologies. I wasn't thinking of cases like that. However in our instance that architecture only results in greater numerical error since both /map and /odom transforms are produced simultaneously. Allowing flexibility in the structure (be it multiple parents or REP105 chaining) would allow users to pick the layout that fits their localization system most accurately.

If you're producing them at the same rate at a high rate then there's nothing stopping you from publishing them from the same module simultaneously.

True, the zero-initialized "map" frame with localization is what we are using for this. Just to make it fit into a single TF tree we are inverting it and then taping it onto our /base_link as a child.

As I explained above, you shouldn't be attaching the observed/estimated odometry to your robot frame; you should be attaching it to the zero-initialized map frame. You are observing the coordinate frame from the robot, so you can report the direct observation. But for a better estimate of the location of the other odometric frame you want to be able to estimate its physical model. If you are computing and estimating the link from your robot to the odometric frame, the physical model is only as accurate as your odometry. Whereas if you think about the other odom in terms of the zero-initialized map frame, you know that the expected update is basically zero between timesteps, and your estimator can be tuned much more precisely as it will know that more of the noise is in the observation than in the model.

But in either case the limitation of having only one parent isn't really a problem. You can have the frame hanging off of base_link as you have. Or it can hang off of the map as I'm suggesting.

The other major challenge of allowing multiple parents is that every query requires a search for connectivity that's much more complex. It becomes more like the order of the number of links, not the order of the number of layers.

I guess it depends what one desires to optimize for. If the structure of your tree/graph is relatively static (changes much less often than you perform searches), one can cache the results of the graph searches then invalidate them when the structure changes and avoid the additional cost per lookup (possibly making all lookups faster as well).

To mitigate more of the cost, since you are already caching the path through the tree, you could also cache entire static parts (/tf_static) of the transform. Many transforms could be optimized into a map or even an array lookup (though the array would be n_frames*n_frames items long). Static-to-static frame lookups, as well as primarily static chained lookups (such as left_camera_optical->left_camera_mechanical->camera_body->lidar->roof_rack->base_link->odom), are a large fraction of our usage of TF, so the gains could be substantial in many applications. Correct me if this optimization is already being done somewhere.

We have spent a lot of time optimizing the queries, and as I mentioned in #439, if you have a technique that you can show is more efficient we can look at it. But to make sure it's better in all conditions and that the cache remains accurate, I think it's going to be a challenge to optimize lookups without costing notably more preparing for the lookups. The computational overhead of the lookup is quite low. On my laptop the lookup speed tests average well below 10us and canTransform calls resolve in under 5us on average. This is averaging over 1000000 points for a 10-level tree.
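For anyone who wants to reproduce that kind of number, a rough micro-benchmark sketch against tf2::BufferCore (not the official tf2 speed test; frame names and counts are arbitrary):

```cpp
#include <tf2/buffer_core.h>
#include <geometry_msgs/TransformStamped.h>
#include <ros/time.h>
#include <chrono>
#include <iostream>

int main()
{
  tf2::BufferCore buffer;

  // Build a 10-level chain: frame_0 -> frame_1 -> ... -> frame_10.
  geometry_msgs::TransformStamped t;
  t.transform.rotation.w = 1.0;
  t.header.stamp = ros::Time(1.0);
  for (int i = 0; i < 10; ++i)
  {
    t.header.frame_id = "frame_" + std::to_string(i);
    t.child_frame_id = "frame_" + std::to_string(i + 1);
    buffer.setTransform(t, "bench");
  }

  // Time repeated lookups across the whole chain.
  const int kQueries = 1000000;
  const auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < kQueries; ++i)
    buffer.lookupTransform("frame_0", "frame_10", ros::Time(1.0));
  const auto stop = std::chrono::steady_clock::now();

  std::cout << std::chrono::duration<double, std::micro>(stop - start).count() /
                   kQueries
            << " us per lookupTransform\n";
  return 0;
}
```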

ros-discourse commented 2 years ago

This issue has been mentioned on ROS Discourse. There might be relevant details there:

https://discourse.ros.org/t/map-base-odom-as-alternative-for-rep-105-recommended-frame-order/25095/12