ros2 / rclcpp

rclcpp (ROS Client Library for C++)
Apache License 2.0
517 stars 412 forks source link

Several Action Client Regressions Iron, Rolling #2369

Open SteveMacenski opened 8 months ago

SteveMacenski commented 8 months ago

Bug report

Required Info:

Steps to reproduce issue

In ROS 2 Rolling, Iron, the Action Clients are failing in bizarre ways in unit tests.

(1) Nav2's nightly is failing non-determinstically with Fast-DDS on this test: https://github.com/ros-planning/navigation2/blob/main/nav2_util/test/test_actions.cpp#L324-L380 with a crash resulting in the job timeout. This is a pretty rudimentary test as part of Nav2's sentinel testing of regressions of rclcpp_action. This has started to become flaky in the last few weeks that I've noticed.

(2) Additionally, both Rolling and Iron are failing deterministically on a few unit tests in Open Navigation's new Nav2 Task Server for complete coverage planning. In these I see two distinct failures:

      [ RUN      ] ServerTest.testServerTransactions
      [INFO] [1699901335.997836958] [coverage_server]: 
        coverage_server lifecycle node launched. 
        Waiting on external lifecycle transitions to activate
        See https://design.ros2.org/articles/node_lifecycle.html for more information.
      [INFO] [1699901335.997923059] [coverage_server]: Creating coverage_server
      [INFO] [1699901335.997945459] [coverage_server]: Configuring coverage_server
      [WARN] [1699901335.998896570] [coverage_server]: default_custom_order was not set! If using Custom Route mode, the custom order must be set per-request!
      [INFO] [1699901336.002143805] [coverage_server]: Activating coverage_server
      [INFO] [1699901336.003161216] [coverage_server]: Creating bond (coverage_server) to lifecycle manager.
  Error: ROR] [1699901336.009784889] [my_node.rclcpp_action]: unknown goal response, ignoring...
      [INFO] [1699901336.037539293] [coverage_server]: Generating coverage path in EPSG:4258 frame for zone with 13 outer nodes and 0 inner polygons.

The error unknown goal response, ignoring... is relatively new, introduced in https://github.com/ros2/rclcpp/pull/2132. CC @fujitatomoya

Note at this bug does not seem to occur in Humble, but does in Rolling and Iron. That can be clearly shown looking at the CI for that repository for the main branch running Iron and humble on Humble https://github.com/open-navigation/opennav_coverage with extremely simplistic test:

TEST(ServerTest, testServerTransactions)
{
  // Create server
  auto node = std::make_shared<ServerShim>();
  rclcpp_lifecycle::State state;
  node->configure(state);
  node->activate(state);
  auto node_thread = std::make_unique<nav2_util::NodeThread>(node);

  // Send some requests
  auto client_node = std::make_shared<rclcpp::Node>("my_node");
  auto action_client =
    rclcpp_action::create_client<opennav_coverage_msgs::action::ComputeCoveragePath>(
    client_node, "compute_coverage_path");

  auto goal_msg = opennav_coverage_msgs::action::ComputeCoveragePath::Goal();
  goal_msg.use_gml_file = true;  // Use file
  goal_msg.gml_field =
    ament_index_cpp::get_package_share_directory("opennav_coverage") + "/test_field.xml";

  auto future_goal_handle = action_client->async_send_goal(goal_msg);
  EXPECT_EQ(
    rclcpp::spin_until_future_complete(
      client_node,
      future_goal_handle), rclcpp::FutureReturnCode::SUCCESS);

  auto goal_handle = future_goal_handle.get();

  // Wait for the result
  auto future_result = action_client->async_get_result(goal_handle);
  EXPECT_EQ(
    rclcpp::spin_until_future_complete(client_node, future_result),
    rclcpp::FutureReturnCode::SUCCESS);

  // The final result
  auto result = future_result.get();
  EXPECT_EQ(result.code, rclcpp_action::ResultCode::SUCCEEDED);
}

(3) The last error is

      [==========] Running 1 test from 1 test suite.
      [----------] Global test environment set-up.
      [----------] 1 test from ComputeCoveragePathActionTestFixture
      [ RUN      ] ComputeCoveragePathActionTestFixture.test_tick
      [       OK ] ComputeCoveragePathActionTestFixture.test_tick (8 ms)
      [----------] 1 test from ComputeCoveragePathActionTestFixture (8 ms total)

      [----------] Global test environment tear-down
      [==========] 1 test from 1 test suite ran. (20 ms total)
      [  PASSED  ] 1 test.
      terminate called after throwing an instance of 'rclcpp::exceptions::RCLError'
        what():  feedback publisher is invalid, at ./src/rcl_action/action_server.c:936
      -- run_test.py: return code -6

Whereas the exception is thrown while destructing the test, since it passed the test. Note that I do not have these tests setup for Humble to know if this is an issue, but I do use the same boilerplate that this uses in Nav2's main CI on other BT nodes so I'm confident its not an issue.


These point to one (or more?) regressions in the Action Client in recent month(s) which is concerning to me since Nav2's entire internal stack is extremely reliant on them for every manner of operation.

fujitatomoya commented 8 months ago

The error unknown goal response, ignoring... is relatively new, introduced in #2132. CC @fujitatomoya

https://github.com/ros2/rclcpp/pull/2132 is merged in before Iron release, so this has been a while up to now. besides, this unknown result response, ignoring... error message is already there before https://github.com/ros2/rclcpp/pull/2132

right now, i do not have bandwidth to check this issue, but i will take a look once i have time.