ros-visualization / rviz

ROS 3D Robot Visualizer
BSD 3-Clause "New" or "Revised" License
790 stars 459 forks source link

What does "unknown reason for failure" mean when rviz is trying to display Lidar (/scan) data? #1796

Closed pitosalas closed 1 year ago

pitosalas commented 1 year ago

ROS1, Ubuntu 22.04, Noetic.

When displaying LaserScan data in Rviz, I get this error:

Screenshot 2023-05-21 at 12 47 38 PM

The tf and tf tree all look reasonable to me. I am not sure what the problem is with the tf that is causing the problem. I suspect it's the transform from /odom to /base_footprint but that looks ok:

Screenshot 2023-05-21 at 12 54 28 PM

Can someone help me figure out what is causing it? I am happy to look at rviz source code if I knew where to look. Or any hint at all? Thanks!

rhaschke commented 1 year ago

Please provide the full error message. Particularly, I am missing the two involved TF frames.

rhaschke commented 1 year ago

The error originates from here: https://github.com/ros-visualization/rviz/blob/f35dee1b8ebd0dccd225f2b48832dc0036dc6bfc/src/rviz/frame_manager.cpp#L331

This means that this call doesn't report an error anymore, which is strange: https://github.com/ros-visualization/rviz/blob/f35dee1b8ebd0dccd225f2b48832dc0036dc6bfc/src/rviz/frame_manager.cpp#L325

pitosalas commented 1 year ago

Here is some more info: The full message (which is not helpful):

Screenshot 2023-05-21 at 3 06 03 PM

The tf tree in rviz:

Screenshot 2023-05-21 at 3 11 00 PM

Here is the tf tree in rat:

Screenshot 2023-05-21 at 3 20 02 PM

Here is output from tf_echo and tf_monitor:

[real:platform3]~$ cat github_info 
Waiting for transform chain to become available between odom and base_footprint 

RESULTS: for odom to base_footprint
Chain is: odom -> base_footprint
Net delay     avg = 0.00966198: max = 0.0814555

Frames:
Frame: base_footprint published by unknown_publisher Average Delay: 0.0391491 Max Delay: 0.0716842
Node: unknown_publisher 45.1549 Hz, Average Delay: 0.0391491 Max Delay: 0.0716842
Node: unknown_publisher(static) 1e+08 Hz, Average Delay: 0 Max Delay: 0

RESULTS: for odom to base_footprint
Chain is: odom -> base_footprint
Net delay     avg = 0.0170883: max = 0.0821697

Frames:
Frame: base_footprint published by unknown_publisher Average Delay: 0.0380917 Max Delay: 0.0716842
Node: unknown_publisher 44.922 Hz, Average Delay: 0.0380917 Max Delay: 0.0716842
Node: unknown_publisher(static) 1e+08 Hz, Average Delay: 0 Max Delay: 0

RESULTS: for odom to base_footprint
Chain is: odom -> base_footprint
Net delay     avg = 0.0283903: max = 0.112265

Frames:
Frame: base_footprint published by unknown_publisher Average Delay: 0.0439559 Max Delay: 0.0825805
Node: unknown_publisher 43.6603 Hz, Average Delay: 0.0439559 Max Delay: 0.0825805
Node: unknown_publisher(static) 1e+08 Hz, Average Delay: 0 Max Delay: 0

RESULTS: for odom to base_footprint
Chain is: odom -> base_footprint
Net delay     avg = 0.0350159: max = 0.112265

Frames:
Frame: base_footprint published by unknown_publisher Average Delay: 0.0450682 Max Delay: 0.0825805
Node: unknown_publisher 41.5673 Hz, Average Delay: 0.0450682 Max Delay: 0.0825805
Node: unknown_publisher(static) 1e+08 Hz, Average Delay: 0 Max Delay: 0

RESULTS: for odom to base_footprint
Chain is: odom -> base_footprint
Net delay     avg = 0.0387006: max = 0.112265

Frames:
Frame: base_footprint published by unknown_publisher Average Delay: 0.0446184 Max Delay: 0.0825805
Node: unknown_publisher 42.5153 Hz, Average Delay: 0.0446184 Max Delay: 0.0825805
Node: unknown_publisher(static) 1e+08 Hz, Average Delay: 0 Max Delay: 0

RESULTS: for odom to base_footprint
Chain is: odom -> base_footprint
Net delay     avg = 0.0405556: max = 0.112265

Frames:
Frame: base_footprint published by unknown_publisher Average Delay: 0.0430439 Max Delay: 0.0825805
Node: unknown_publisher 42.7204 Hz, Average Delay: 0.0430439 Max Delay: 0.0825805
Node: unknown_publisher(static) 1e+08 Hz, Average Delay: 0 Max Delay: 0

RESULTS: for odom to base_footprint
Chain is: odom -> base_footprint
Net delay     avg = 0.0419136: max = 0.112265

Frames:
Frame: base_footprint published by unknown_publisher Average Delay: 0.0421325 Max Delay: 0.0825805
Node: unknown_publisher 42.0878 Hz, Average Delay: 0.0421325 Max Delay: 0.0825805
Node: unknown_publisher(static) 1e+08 Hz, Average Delay: 0 Max Delay: 0

RESULTS: for odom to base_footprint
Chain is: odom -> base_footprint
Net delay     avg = 0.0463595: max = 0.112265

Frames:
Frame: base_footprint published by unknown_publisher Average Delay: 0.0437379 Max Delay: 0.0825805
Node: unknown_publisher 41.6842 Hz, Average Delay: 0.0437379 Max Delay: 0.0825805
Node: unknown_publisher(static) 1e+08 Hz, Average Delay: 0 Max Delay: 0

RESULTS: for odom to base_footprint
Chain is: odom -> base_footprint
Net delay     avg = 0.0500028: max = 0.112265

Frames:
Frame: base_footprint published by unknown_publisher Average Delay: 0.0441776 Max Delay: 0.0825805
Node: unknown_publisher 42.2377 Hz, Average Delay: 0.0441776 Max Delay: 0.0825805
Node: unknown_publisher(static) 1e+08 Hz, Average Delay: 0 Max Delay: 0

RESULTS: for odom to base_footprint
Chain is: odom -> base_footprint
Net delay     avg = 0.0517838: max = 0.112265

Frames:
Frame: base_footprint published by unknown_publisher Average Delay: 0.0441619 Max Delay: 0.0825805
Node: unknown_publisher 42.216 Hz, Average Delay: 0.0441619 Max Delay: 0.0825805
Node: unknown_publisher(static) 1e+08 Hz, Average Delay: 0 Max Delay: 0

RESULTS: for odom to base_footprint
Chain is: odom -> base_footprint
Net delay     avg = 0.077401: max = 0.284176

Frames:
Frame: base_footprint published by unknown_publisher Average Delay: 0.0441619 Max Delay: 0.0825805
Node: unknown_publisher 42.216 Hz, Average Delay: 0.0441619 Max Delay: 0.0825805
Node: unknown_publisher(static) 1e+08 Hz, Average Delay: 0 Max Delay: 0

RESULTS: for odom to base_footprint
Chain is: odom -> base_footprint
Net delay     avg = 0.138728: max = 0.497438

Frames:
Frame: base_footprint published by unknown_publisher Average Delay: 0.0441619 Max Delay: 0.0825805
Node: unknown_publisher 42.216 Hz, Average Delay: 0.0441619 Max Delay: 0.0825805
Node: unknown_publisher(static) 1e+08 Hz, Average Delay: 0 Max Delay: 0
At time 1684696628.883
- Translation: [0.000, 0.000, 0.000]
- Rotation: in Quaternion [0.000, 0.000, 0.081, 0.997]
            in RPY (radian) [0.000, -0.000, 0.162]
            in RPY (degree) [0.000, -0.000, 9.295]
At time 1684696629.285
- Translation: [0.000, 0.000, 0.000]
- Rotation: in Quaternion [0.000, 0.000, 0.082, 0.997]
            in RPY (radian) [0.000, -0.000, 0.164]
            in RPY (degree) [0.000, -0.000, 9.382]
At time 1684696630.291
- Translation: [0.000, 0.000, 0.000]
- Rotation: in Quaternion [0.000, 0.000, 0.080, 0.997]
            in RPY (radian) [0.000, -0.000, 0.161]
            in RPY (degree) [0.000, -0.000, 9.218]
At time 1684696631.299
- Translation: [0.000, 0.000, 0.000]
- Rotation: in Quaternion [0.000, 0.000, 0.082, 0.997]
            in RPY (radian) [0.000, -0.000, 0.163]
            in RPY (degree) [0.000, -0.000, 9.355]
At time 1684696631.923
- Translation: [0.000, 0.000, 0.000]
- Rotation: in Quaternion [0.000, 0.000, 0.082, 0.997]
            in RPY (radian) [0.000, -0.000, 0.164]
            in RPY (degree) [0.000, -0.000, 9.401]
[real:platform3]~$ 
pitosalas commented 1 year ago

You're saying that discoverFailureReason was called because transformHasProblems had returned true but now it is no longer doing so...

On May 21, 2023, at 2:20 PM, Robert Haschke @.***> wrote:

The error originates from here: https://github.com/ros-visualization/rviz/blob/f35dee1b8ebd0dccd225f2b48832dc0036dc6bfc/src/rviz/frame_manager.cpp#L331

This means that this call doesn't report an error anymore, which is strange: https://github.com/ros-visualization/rviz/blob/f35dee1b8ebd0dccd225f2b48832dc0036dc6bfc/src/rviz/frame_manager.cpp#L325

— Reply to this email directly, view it on GitHub https://github.com/ros-visualization/rviz/issues/1796#issuecomment-1556248657, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAK5CLEQXSPVEUVR2H65MTXHJMIBANCNFSM6AAAAAAYJQF47U. You are receiving this because you authored the thread.

pitosalas commented 1 year ago

If the global frame is changed to anything other than odom, then the Lidar display does what it is supposed to do. Which is why I somehow suspect the tf chain between odom and base_scan (which goes through base_footprint).

Screenshot 2023-05-21 at 3 31 09 PM
rhaschke commented 1 year ago

Could you please record and post a rosbag file including all the topics, i.e. /clock, /tf*, and /scan? 10s should be enough.

pitosalas commented 1 year ago

Let me know if this contains what you can look at. No /clock topic. Is that supposed to be there?

On May 22, 2023, at 2:00 AM, Robert Haschke @.***> wrote:

Could you please record and post a rosbag file including all the topics, i.e. /clock, /tf*, and /scan? 10s should be enough.

— Reply to this email directly, view it on GitHub https://github.com/ros-visualization/rviz/issues/1796#issuecomment-1556582428, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAK5CIABUA7Y3RXYUGTD23XHL6GFANCNFSM6AAAAAAYJQF47U. You are receiving this because you authored the thread.

rhaschke commented 1 year ago

Please upload the file at github. Your email didn't comprise an attachment...

pitosalas commented 1 year ago

2023-05-22-11-30-36.bag.txt

pitosalas commented 1 year ago

In the meanwhile I totally rebuilt the software on my robot. I thought maybe I had corrupted something inadvertently. I also discovered that Noetic is not "officially" supported on Ubuntu 22.04 so I rebuilt the robot on Ubuntu 20.04.

Unfortunately, the problem still appears.

I appreciate your help so far! I have been stumped by this for a week so I hope that you can find a moment to lookl into this.

pitosalas commented 1 year ago

@rhaschke I am using linorobot.org. My Launch file uses a kalman filter to fuse the raw odom with the imu:

    <node pkg="robot_localization" type="ekf_localization_node" name="ekf_localization"> 
        <remap from="odometry/filtered" to="odom" />
        <rosparam command="load" file="$(find linorobot)/param/ekf/robot_localization.yaml" />
    </node>

If the IMU is not physically oriented correctly, could that somehow lead to the odom -> laser transfor to "failing"? The thinking is that somehow a badly callibrated IMU would lead to a somehow invalid odom tf which would lead to the problem. However I have since determined that the IMU is fine as it is.

pitosalas commented 1 year ago

Could you please record and post a rosbag file including all the topics, i.e. /clock, /tf*, and /scan? 10s should be enough.

I did, did you see it?

https://github.com/ros-visualization/rviz/files/11530872/2023-05-22-11-30-36.bag.txt

Thanks!

rhaschke commented 1 year ago

Thanks for providing the bag file. I can reproduce your issue and will look into it right now.

rhaschke commented 1 year ago

After searching for quite a while into the wrong direction, I found the culprit: The problem is in your LaserScan messages: The time_increment and scan_time values are probably way too large (measured in seconds):

time_increment:  153333.328125
scan_time:  110400000.000000

rviz expects TF lookups to be possible within the whole time range from stamp to stamp + scan_time, which is 110400000 secs = 3,5 years in your case! This makes all your TF lookups fail in the message filter. However, error lookup with transformHasProblems() succeeds because it just considers the message time stamp, resulting in the unknown error warning.

Note, scan_time = time_increment * len(ranges).

Please consider donating as I spent a bunch of time hunting a rviz bug that turned out to be a message issue.

pitosalas commented 1 year ago

@rhaschke Great catch, thank you thank you! Your diagnosis makes perfect sense.

I've been tracing through the ydlidar ros code that I am using and I can see where it goes wrong. But I dont understand it well enough to fix it. I have posted on their issues list. If you know anything about ydlidar X4 on ROS Noetic I would be interested.

In the meanwhile I will certainly donate. I can either donate 5/month for a year or donate $75 right now. Do you prefer one or the other? And is that an appropriate amount?

rhaschke commented 1 year ago

I don't know the ydlidar X4, but great that my analysis helps you. I guess the time units are chosen wrongly (e.g. nanoseconds instead of seconds). Feel free to link the source repository here. You can freely choose the amount and type of donation. Thanks for considering!

pitosalas commented 1 year ago

The ydldar repo is: https://github.com/EAIBOT/ydlidar.git I traced through it and there are some details I havent mastered yet but I can see how it goes wrong. Thanks if you take a look!

rhaschke commented 1 year ago

Thanks for supporting me by a donation. See here for a fixup of ydlidar.

pitosalas commented 1 year ago

Thanks! It worked. Your fix was simple… I saw that it could work but didn’t do it myself because I feared that there were other interactions that I didn’t see. Thanks again!Pito Salas Computer ScienceBrandeis University On May 26, 2023, at 3:19 AM, Robert Haschke @.***> wrote: Thanks for supporting me by a donation. See here for a fixup of ydlidar.

—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you authored the thread.Message ID: @.***>