ros / geometry2

A set of ROS packages for keeping track of coordinate transforms.
189 stars 274 forks source link

Lookup transform predictably misses the translation part in multi-node setup #566

Open roshambo919 opened 3 weeks ago

roshambo919 commented 3 weeks ago

This unexpected tf2 lookup behavior has been stumping me for days on ros2 humble.

I have the following frames:

world
agent0
agent0/lidar0
agent1
agent1/lidar0
... # etc up to agent 2

In this particular example, agent0 is a mobile agent, so the frame agent0 to world is published dynamically at a high rate. All other frames (e.g., agent0/lidar0 to agent0, agent1 to world, agent1/lidar0 to agent1, ...) are static and published on tf_static from a rosbag once at the beginning.

I have several nodes that get the transformations in python via buffer.lookup_transform(). One of the nodes ("node1") looks up all of the agenti/lidar0 to world frames and never has any problems. An example logging output would show, e.g.,

agent0 : geometry_msgs.msg.TransformStamped(
     header=std_msgs.msg.Header(
          stamp=builtin_interfaces.msg.Time(sec=0, nanosec=940000000), frame_id='world'),
          child_frame_id='agent0/lidar0',
     transform=geometry_msgs.msg.Transform(
          translation=geometry_msgs.msg.Vector3(x=-87.25453507565153, y=-27.90808621333283, z=1.5932031571520584), 
          rotation=geometry_msgs.msg.Quaternion(x=-2.399310128784853e-05, y=-1.7728866372079825e-05, z=-0.0013892804829969047, w=0.9999990345044129))
)

agent1 : geometry_msgs.msg.TransformStamped(
     header=std_msgs.msg.Header(
          stamp=builtin_interfaces.msg.Time(sec=0, nanosec=940000000), frame_id='world'),
          child_frame_id='agent1/lidar0', 
     transform=geometry_msgs.msg.Transform(
          translation=geometry_msgs.msg.Vector3(x=-67.25457001, y=-27.96375847, z=20.60000038), 
          rotation=geometry_msgs.msg.Quaternion(x=0.00035956799826580756, y=0.25881879533495106, z=-0.0013419260383038315, w=0.9659248941438082))
)
...

Another one of the nodes ("node2") performs what I believe to be the exact same lookup process. Unfortunately, the output is incorrect. While sometimes the very first lookup call produces the expected output, every subsequent lookup (and sometimes even the first) completely neglects the translation component of the static transforms. Thus, an output from node2's lookups would be:

agent0 : geometry_msgs.msg.TransformStamped(
     header=std_msgs.msg.Header(
          stamp=builtin_interfaces.msg.Time(sec=0, nanosec=940000000), frame_id='world'),
          child_frame_id='agent0/lidar0',
     transform=geometry_msgs.msg.Transform(
          translation=geometry_msgs.msg.Vector3(x=-87.25453507565153, y=-27.90808621333283, z=1.5932031571520584), 
          rotation=geometry_msgs.msg.Quaternion(x=-2.399310128784853e-05, y=-1.7728866372079825e-05, z=-0.0013892804829969047, w=0.9999990345044129))
)

agent1:geometry_msgs.msg.TransformStamped(
     header=std_msgs.msg.Header(
          stamp=builtin_interfaces.msg.Time(sec=0, nanosec=940000000), frame_id='world'),
          child_frame_id='agent1/lidar0', 
     transform=geometry_msgs.msg.Transform(
          translation=geometry_msgs.msg.Vector3(x=0.0, y=0.0, z=0.0),  # THIS IS THE PROBLEM!!!!!
          rotation=geometry_msgs.msg.Quaternion(x=0.00035956799826580756, y=0.25881879533495106, z=-0.0013419260383038315, w=0.9659248941438082))
)

where everything is the same except that the translation has been neglected to be populated. The quaternion is correct. Agent0 (being dynamic and on /tf) is correct, HOWEVER, I will note that in node2 I sometimes get an extrapolation error for agent0 that does not appear in node1.

In both cases of node1 and node2, I am using MessageFilters to synchronize multiple topics for a callback. It just so happens that node1 is working while node2 is not. It should be known that the output of node1 is one of the synchronized inputs for node2 (i.e. node2 depends on node 1).

I'm at a loss for how to debug this. Has anyone come across this? @tfoote. My code base is large, but if needed I could spend some time trying to recreate in an MWE.

Addendum:

tfoote commented 3 weeks ago

Unfortunately, the output is incorrect. While sometimes the very first lookup call produces the expected output, every subsequent lookup (and sometimes even the first) completely neglects the translation component of the static transforms.

Inconsistent behaviors like this suggest that there's a different data source, possibly conflicting or a race conditoin? Is there a conflicting static transform in the bag file for world to agent0 perhaps?

All other frames (e.g., agent0/lidar0 to agent0, agent1 to world, agent1/lidar0 to agent1, ...) are static and published on tf_static from a rosbag once at the beginning.

Using a dedicated static_transform_publisher will be more reliable: https://docs.ros.org/en/rolling/Releases/Release-Humble-Hawksbill.html#more-intuitive-command-line-arguments-for-static-transform-publisher https://docs.ros.org/en/rolling/p/tf2_ros/cli_tools.html

If you're cascadng responses it's doing the same lookup at a later time, which further reinforces the likelyhood that there's a race condition with potentially conflicting data sources being interleaved.