ros-navigation / navigation2

ROS 2 Navigation Framework and System
https://nav2.org/
Other
2.31k stars 1.2k forks source link

Adding Nav2 / Open Navigation Docking #4406

Closed SteveMacenski closed 3 weeks ago

SteveMacenski commented 4 weeks ago

Thanks to NVIDIA for their support to make this possible!

Original location: https://github.com/open-navigation/opennav_docking Tutorials: https://docs.nav2.org/tutorials/docs/using_docking.html Configuration: https://docs.nav2.org/configuration/packages/configuring-docking-server.html Commander API: https://docs.nav2.org/commander_api/index.html

A few known enhancements that would be great to have over time: https://github.com/ros-navigation/navigation2/issues/4405 https://github.com/ros-navigation/navigation2/issues/4404 https://github.com/ros-navigation/navigation2/issues/4403

SteveMacenski commented 3 weeks ago

There's a weird issue where the docking smoke system test is failing by timing out with no logging in CI but I cannot reproduce locally. Working on it... sorry for testing in CI

mikeferguson commented 3 weeks ago

How do you keep your ros2 launch-based tests from conflicting with each other? (like, this test publishes and subscribes stuff - is it maybe conflicting with some other test that is running? Is there some magic in ros2 launch file tests that magically assigns different ROS_DOMAIN_IDs? I haven't built a huge set of ros2 launch tests yet to see what issues arise...

SteveMacenski commented 3 weeks ago

How do you keep your ros2 launch-based tests from conflicting with each other? (like, this test publishes and subscribes stuff - is it maybe conflicting with some other test that is running? Is there some magic in ros2 launch file tests that magically assigns different ROS_DOMAIN_IDs? I haven't built a huge set of ros2 launch tests yet to see what issues arise...

The tests are run sequentially within a given package, but could be run in parallel w.r.t. other packages. In our CI settings, we have parallel-workers set to 1 (https://github.com/ros-navigation/navigation2/blob/main/.circleci/defaults.yaml#L11-L14) so that only one package is tested at a time making sure they don't conflict.

By the way, I found at least the first cause of the issue, the while loops are never exiting because the action_results aren't being populated for some reason, which implies that the result_future.add_done_callback(self.action_result_callback) or future.add_done_callback(self.action_goal_callback) are never called. Its really hard to tell which since CI isn't giving me back any logs so I'm kind of stabbing in the dark narrowing in on it:

        while len(self.action_result) == 0:
            rclpy.spin_once(self.node, timeout_sec=0.1)
            time.sleep(0.1)

I still cannot reproduce this locally but is apparently happening in Rolling on 24.04 or something peculiar in CI. I assumed at the start that may have been the issue (since it timed out rather than crashed or something), but I made sure it wasn't a regression in launch testing first - which are some of the fumbling commits you see above. I'm pretty sure now its either something we're doing or rclpy's spinning / action lib is busted (which I wouldn't put behind it since we did find some issues with Jazzy already with spinning repeating needed to make work the first time and historically we haven't relied on the result/accepted callbacks because they weren't reliable)

SteveMacenski commented 3 weeks ago

@mikeferguson in case you're curious about the outcome, basically I just swapped out all the done callbacks for the goal / result with spinning until complete and that resolved it. There seems to be some problem in spinning in Jazzy/Rolling I think, this isn't the first time I've seen something like this since working on the 24.04 migration. Putting all the action work inline and blocking seems to resolve the issue instead of using callback functions. Ex.

        future = self.undock_action_client.send_goal_async(goal)
        rclpy.spin_until_future_complete(self.node, future)
        self.goal_handle = future.result()
        assert self.goal_handle.accepted
        result_future = self.goal_handle.get_result_async()
        rclpy.spin_until_future_complete(self.node, result_future)
        self.action_result.append(result_future.result())

Now it all works as expected https://github.com/ros-navigation/navigation2/pull/4406/commits/95c547bb6e52c75eca270357c359f1a9e976ee7e