ros-navigation / navigation2

ROS 2 Navigation Framework and System
https://nav2.org/
Other
2.47k stars 1.25k forks source link

Issues setting up robot on Humble in Ignition Gazebo #3586

Closed MarqRazz closed 1 year ago

MarqRazz commented 1 year ago

Steps to reproduce issue

Expected behavior

I have been experiencing several issues with the clock as I setup my robot to run in Gazebo. As you can see in my launch file I have use_sim_time set to True and below have verified through the parameter server that its set properly.

Here is the output from a few of the nodes on the parameter server...

# ros2 param get slam_toolbox use_sim_time
Boolean value is: True
# ros2 param get bt_navigator use_sim_time
Boolean value is: True
# ros2 param get lifecycle_manager_navigation use_sim_time
Boolean value is: True
# ros2 param get robot_state_publisher use_sim_time
Boolean value is: True
# ros2 param get joint_state_broadcaster use_sim_time
Boolean value is: True

Here is what the beginning of my /scan message looks like:

header:
  stamp:
    sec: 3
    nanosec: 420000000
  frame_id: base_laser

and /tf_static data for the base_laser frame:

- header:
    stamp:
      sec: 0
      nanosec: 0
    frame_id: base_laser_mount
  child_frame_id: base_laser

1) When I run the navigation stack with a dummy map that I have previously made I get errors about my LIDAR's frame_id data not being in sync with the system clock.

[component_container_isolated-2] [amcl 1684977727.446646155]: Message Filter dropping message: frame 'base_laser' at time 37.760 for reason 'the timestamp on the message is earlier than all the data in the transform cache'

And a map is never published so I get tf errors:

[component_container_isolated-2] [global_costmap.global_costmap 1684977727.002686044]: Timed out waiting for transform from base_link to map to become available, tf error: Invalid frame ID "map" passed to canTransform argument target_frame - frame does not exist

2) If I try to run SLAM I get spammed with errors in the console saying the message queue is full and if I try to navigate the robot somewhere I can see that transformPoseInTargetFrame node is using the system clock.

[sync_slam_toolbox_node-5] [slam_toolbox 1684848588.260446441]: Message Filter dropping message: frame 'base_laser' at time 59.460 for reason 'discarding message because the queue is full'
[component_container_isolated-2] [transformPoseInTargetFrame 1684848588.293940594]: Extrapolation Error looking up target frame: Lookup would require extrapolation into the future.  Requested time 1684848585.909372 but the latest data is at time 59.498000, when looking up transform from frame [base_link] to frame [map]

Actual behavior

I cant navigate or SLAM in sim.

SteveMacenski commented 1 year ago

To be honest, we have yet to evaluate an Ignition Gazebo setup due to other long-standing problems with it preventing us (e.g. binaries for ros2 control ign didn't work for a long time [maybe still?], missing plugins, etc) so that's not part of our officially supported setup yet. It is planned for later this year all things according to plan.

What version of Ignition Gazebo? I've heard some nasty things about some of the older versions and @bperseghetti suggests a particular version where things are working better. Could it be due to that?

I will say it just looks like you're having some run of the mill TF problems that's not really related to Nav2 itself -- but could either be due to the robot URDF/state publisher or due to your simulator not publishing what's expected.

There's alot you're changing all at once (simulator, custom robot, etc) so the generic but truthful advice would be to start with something more simple like the TB3 demo and slowly build up changes until you see where the problem lies. I'm guessing its with your robot, so using Gazebo classic like we have will at least help you isolate the issue to simulation vs urdf/sdf related to work through the problem. I know there are some users on new-Gazebo, though I haven't spoken to them in detail regarding it

MarqRazz commented 1 year ago

I'm pretty comfortable with Ignition and the issues it has (and specific versions you need to make things work), but your statement does't line up with the fact that nav/slam-toolbox are using the system clock when I have set use_sim_time to true which is why I opened the issue. If you have any suggestions on where I could start looking I would appreciate any pointers!

I will say it just looks like you're having some run of the mill TF problems that's not really related to Nav2 itself

Any specifics here would be helpful. Is the zero time with the static tf not true with GzClassic?

bperseghetti commented 1 year ago

@MarqRazz On a call right now, but I can share a little later how we run ours with humble and gz-garden using sim time including for the ros_gz_bridge, I can share launch as well as examples on some simulated robots.

SteveMacenski commented 1 year ago

My pointers are what I said - simplify to something we know works and build from there. if the issue if TF2/robot specific, there's a hundred ways it can manifest and its a bit of a logical puzzle. If the issue is the simulator, it would be good to know that sooner than later so you don't go down a rabbit hole. And, if you still see the issue with the old simulator + a base robot we know works, then that's something actionable I can also start to look into. So being processed can help us know where we need to focus efforts. For now, I'd be simply guessing.

The static TF looks that way for gazebo too, but that's generated by the URDF & robot state publisher, not the simulator, so I'm not sure why we would expect that to change.

transforms:
- header:
    stamp:
      sec: 0
      nanosec: 0
    frame_id: base_footprint
  child_frame_id: base_link
  transform:
    translation:
      x: 0.0
      y: 0.0
      z: 0.01
    rotation:
      x: 0.0
      y: 0.0
      z: 0.0
      w: 1.0
bperseghetti commented 1 year ago

Might also want to see if you have something like this using sim_time depending on the plugins you are using in sim: https://github.com/CogniPilot/mrbuggy3_simulator/blob/8b26ae67e0003119a5b2a5b14acfcec1b4a46256/mrbuggy3_gz_bringup/launch/ros_gz_bridge_base.launch.py#L77-L92

bperseghetti commented 1 year ago

Also of value: https://github.com/CogniPilot/mrbuggy3_simulator/blob/8b26ae67e0003119a5b2a5b14acfcec1b4a46256/mrbuggy3_gz_bringup/launch/ros_gz_bridge_base.launch.py#L25-L32

Do you have anywhere that you can show your ros_gz_bridge settings?

bperseghetti commented 1 year ago

Here is an example with our posix Cerebri (Zephyr-RTOS) connected to the cmd_vel output of NAV2 to control the vehicle with gz-garden and running full nav2 stack (including SLAM): https://github.com/ros-planning/navigation2/assets/10233412/a8a90426-7c50-4543-9100-abf185708cb9

MarqRazz commented 1 year ago

Sorry @SteveMacenski I misread your post and though you said you could see an issue with tf in my issue report. For now I think I will keep pushing forward with Ignition, I'm so close.

@bperseghetti you rock, THANKS for the tips and pointers!

Here are my bridge settings for the clock.

As for odom to base_link I'm using ros2_control's DiffDriveController and this is what I see if I use tf_echo

# ros2 run tf2_ros tf2_echo odom base_link
At time 25.134000000
- Translation: [0.000, 0.000, 0.000]
- Rotation: in Quaternion [0.000, 0.000, 0.000, 1.000]
- Rotation: in RPY (radian) [0.000, -0.000, 0.000]
- Rotation: in RPY (degree) [0.000, -0.000, 0.000]
- Matrix:
  1.000  0.000  0.000  0.000
  0.000  1.000  0.000  0.000
  0.000  0.000  1.000  0.000
  0.000  0.000  0.000  1.000

If I ask for map to base_laser while running SLAM I get

# ros2 run tf2_ros tf2_echo map base_laser
At time 14.574000000
- Translation: [0.000, 0.000, 0.394]
- Rotation: in Quaternion [0.000, 0.000, 0.000, 1.000]
- Rotation: in RPY (radian) [0.000, -0.000, 0.000]
- Rotation: in RPY (degree) [0.000, -0.000, 0.000]
- Matrix:
  1.000  0.000  0.000  0.000
  0.000  1.000  0.000  0.000
  0.000  0.000  1.000  0.394
  0.000  0.000  0.000  1.000

and here is the output from tf_view_frames. You can see that the Most recent/Oldest transform point to tf using the sim clock. image

If I query for subscribers of the clock everything looks good and all the nodes I expect are there (here's a gist of the output)

SteveMacenski commented 1 year ago

So what's left to check? That seems like most of it

MarqRazz commented 1 year ago

I'm thinking my next step is to build Nav/SlamToolbox from source and confirm the value of use_sim_time and the clock it's using. My guess is I'm missing setting it somewhere when launching.

[component_container_isolated-2] [transformPoseInTargetFrame 1684848588.293940594]: 
Extrapolation Error looking up target frame: Lookup would require extrapolation into the future.  
Requested time 1684848585.909372 but the latest data is at time 59.498000, when looking up 
transform from frame [base_link] to frame [map]
SteveMacenski commented 1 year ago

It doesn't take that long to compile if you ignore nav2_system_tests, that'll take you 20 minutes itself

bperseghetti commented 1 year ago

I think we did have to recreate some of the lower level nav2 bringup script to make sure sim_time was passed throughout. Maybe I'll try to make a "native nav2" launch this weekend and see what all I had to figure out about it previously. Granted not sure if there are also any non-sim_time calls from the actuator arm side from moveit on your setup?

MarqRazz commented 1 year ago

Sorry I got distracted playing with my Isaac simulation. I verified Nav2 is using sim time everywhere I expect but while inspecting the lookup frames sent to nav2_util::transformPoseInTargetFrame I figured out that Rviz is the source of those errors when sending Nav2 Goal's with the plugin. I tried changing the launch file to use_sim_time and that didn't fix it, but if I set the fixed_frame to be map the error goes away and the robot navigates. Unfortunately I don't know of an easy way to inspect the action goal message (like you could in ROS1) from the Rviz plugin but will keep digging.

As you can see I still have issues with the map smearing while SLAMing but progress!

https://github.com/ros-planning/navigation2/assets/25058794/4ecb515d-41a9-4870-8346-0f6d157b8fdf

https://github.com/ros-planning/navigation2/assets/25058794/0a130dc0-643c-485d-a67f-5060fffc94c4

SteveMacenski commented 1 year ago

I tried changing the launch file to use_sim_time and that didn't fix it, but if I set the fixed_frame to be map the error goes away and the robot navigates. Unfortunately I don't know of an easy way to inspect the action goal message (like you could in ROS1) from the Rviz plugin but will keep digging.

Got it - that was actually pertinent information that you were sending it in the non-global frame. That is actually a known artifact of the rviz plugin that you have to send the goals in the global frame. I honestly didn't know that was the specific error given, but that is a quirk I was aware of.

looking at the hidden topics ros2 topic echo /navigate_to_pose/_action/status we can see the status

status_list:
- goal_info:
    goal_id:
      uuid:
      - 152
      - 146
      - 130
      - 181
      - 123
      - 17
      - 33
      - 151
      - 165
      - 220
      - 233
      - 200
      - 122
      - 9
      - 16
      - 223
    stamp:
      sec: 69
      nanosec: 886000000
  status: 2

Which appears to be in the simulation time. So I think the BT navigator is OK.

I took a look from the logs I see and:

[component_container_isolated-5] [ERROR] [1685751532.615056985] [transformPoseInTargetFrame]: Extrapolation Error looking up target frame: Lookup would require extrapolation into the future.  Requested time 1685751532.389336 but the latest data is at time 70.167000, when looking up transform from frame [base_scan] to frame [map]

Leads me to look at transformPoseInTargetFrame. That's only used in costmap2D and RPP, and I'm not currently using RPP.

Within costmap 2D its used in the function transformPoseToGlobalFrame which is only called by the planner server in transformPosesToGlobalFrame.

It appears that the goal is obtained directly from the action's request field geometry_msgs::msg::PoseStamped goal_pose = goal->goal so that tells me that rviz panel isn't using sim time. The rviz panel sets the pose in navigation_goal_.pose = pose; and if you track back where pose is populated you find pose.header.stamp = rclcpp::Clock().now();

That explains that. We're just making a clock willy-nilly which is going to default to system time. Silly goose! I looked to see where else we do that in the stack and the answer is ..... no where else. So yeah, that's really suspect. I'm currently compiling a change to use the rviz panel's internal node instead to get the time, which I assume will fix this issue

SteveMacenski commented 1 year ago

https://github.com/ros-planning/navigation2/pull/3595 resolves!

Rak-r commented 1 year ago

Might also want to see if you have something like this using sim_time depending on the plugins you are using in sim: https://github.com/CogniPilot/mrbuggy3_simulator/blob/8b26ae67e0003119a5b2a5b14acfcec1b4a46256/mrbuggy3_gz_bringup/launch/ros_gz_bridge_base.launch.py#L77-L92

I saw in the launch files you are creating pose bridge but where gazebo is publishing to that pose topic. When I do gz topic-l there is no topic related to robot/pose. I am having /robot/odometry which I am sending to ROS using bridge. I am also using the same ackermann plugin from gazebo. Setting NAV2 and Gazebo garden having similar issues. Could you advice something.