ros2 / rclcpp

rclcpp (ROS Client Library for C++)
Apache License 2.0
536 stars 417 forks source link

Executing action server but nothing is ready #2226

Open EvgeniNF opened 1 year ago

EvgeniNF commented 1 year ago

Bug report

Required Info:

Steps to reproduce issue

I am do stress test for action server and catch this error on random request. How i can fix that?

using Exec = devicelib_msgs::action::Exec;
using GoalHandle = rclcpp_action::ServerGoalHandle<Exec>;

class Server final : public rclcpp::Node {
   public: 
    Server() : rclcpp::Node("action_server") {
        using namespace std::placeholders;

        m_cb = create_callback_group(rclcpp::CallbackGroupType::Reentrant);
        m_server = rclcpp_action::create_server<Exec>(
            this,
            "/test/server",
            std::bind(&Server::goalHandle, this, _1, _2),
            std::bind(&Server::handleCancel, this, _1),
            std::bind(&Server::handleAccepted, this, _1),
            rcl_action_server_get_default_options(),
            m_cb
        );
    }

    Server(Server const &) = delete;
    Server(Server &&) noexcept = delete;
    Server& operator=(Server const &) = delete;
    Server& operator=(Server &&) noexcept = delete;
    ~Server() noexcept = default;

   private:
    rclcpp_action::GoalResponse goalHandle(rclcpp_action::GoalUUID const &uuid,
                                           Exec::Goal::ConstSharedPtr goal) {
        RCLCPP_INFO_STREAM(get_logger(), "Goal handle");
        return rclcpp_action::GoalResponse::ACCEPT_AND_EXECUTE;
    }

    rclcpp_action::CancelResponse handleCancel(std::shared_ptr<GoalHandle> const goalHandle) {
        RCLCPP_INFO_STREAM(get_logger(), "Handle cancel");
        return rclcpp_action::CancelResponse::ACCEPT;
    }

    void handleAccepted(std::shared_ptr<GoalHandle> const goalHandle) {
        RCLCPP_INFO_STREAM(get_logger(), "Handle accepted");
        std::thread([goal = goalHandle, this] () -> void {
            this->execute(std::move(goal));       
        }).detach();
    }

    void execute(std::shared_ptr<GoalHandle> const goalHandle) {
        RCLCPP_INFO_STREAM(get_logger(), "Start execute");
        auto const goal = goalHandle->get_goal();
        auto const feedback = std::make_shared<Exec::Feedback>();
        auto const result = std::make_shared<Exec::Result>();

        size_t counter = 0;
        for (auto const &_: goal->details) {
            if (goalHandle->is_canceling()) {
                result->message = "Canceled by peer";
                goalHandle->canceled(result);
                RCLCPP_INFO(get_logger(), "Goal canceled");
                return;
            }

            RCLCPP_INFO_STREAM(get_logger(), "Start detail: " << counter);
            std::this_thread::sleep_for(std::chrono::milliseconds(100));

            feedback->num_executed_details = counter;
            feedback->command_status.command_name = "command";
            counter++;
            feedback->command_status.seconds = goal->details.size() - counter;
            goalHandle->publish_feedback(feedback);
            RCLCPP_INFO_STREAM(get_logger(), "Publish feedback");
        } 

        if (rclcpp::ok()) {
            goalHandle->succeed(result);
            RCLCPP_INFO_STREAM(get_logger(), "Goal success");
        }
    }

   private:
    rclcpp_action::Server<Exec>::SharedPtr m_server;
    rclcpp::CallbackGroup::SharedPtr m_cb;
};

int main(int argc, char** argv) {
    rclcpp::init(argc, argv);

    auto const server = std::make_shared<Server>();

    rclcpp::executors::MultiThreadedExecutor executor;
    executor.add_node(server);
    executor.spin();

    rclcpp::shutdown();
}

Expected behavior

Normal client/server action behavior where each client sends a goal and receives a result from the server.

Actual behavior

Node Server throws an exception and crashes

terminate called after throwing an instance of 'std::runtime_error'
  what():  Executing action server but nothing is ready

Additional information


Feature request

Feature description

Implementation considerations

fujitatomoya commented 1 year ago

@EvgeniNF thanks for creating issue.

i tried to use your sample here https://github.com/fujitatomoya/ros2_test_prover/commit/1938d8d0e2b8c49531ec6e30036e5e992845910c, i did modify several places to build it with rolling. so far i do not have the problem described here.

root@tomoyafujita:~/ros2_ws/colcon_ws# source install/local_setup.bash 
root@tomoyafujita:~/ros2_ws/colcon_ws# ros2 run prover_rclcpp rclcpp_2226 
[INFO] [1687800854.360839751] [action_server]: Goal handle
[INFO] [1687800854.361871421] [action_server]: Handle accepted
[INFO] [1687800854.361973245] [action_server]: Executing goal
[INFO] [1687800854.362307637] [action_server]: Publish feedback
[INFO] [1687800855.362249651] [action_server]: Publish feedback
[INFO] [1687800856.362233943] [action_server]: Publish feedback
[INFO] [1687800857.362178916] [action_server]: Publish feedback
[INFO] [1687800858.362253362] [action_server]: Publish feedback
[INFO] [1687800859.362238926] [action_server]: Publish feedback
[INFO] [1687800860.362247690] [action_server]: Publish feedback
[INFO] [1687800861.362286352] [action_server]: Publish feedback
[INFO] [1687800862.362336140] [action_server]: Publish feedback
^C[INFO] [1687800972.893649435] [rclcpp]: signal_handler(signum=2)

Ubuntu 20.04

This is not supported platform Iron, https://docs.ros.org/en/rolling/Releases/Release-Iron-Irwini.html#supported-platforms. can you try with Ubuntu 22.04?

EvgeniNF commented 1 year ago

@fujitatomoya Ok, thanks for reply, i try run this code on ubuntu 22.04

EvgeniNF commented 1 year ago

@fujitatomoya I have the same problem on ubuntu 22.04.02 (64-bit). I noticed what on Fast DDS this error occurs less frequently.

[INFO] [1687871330.810723412] [action_server]: Goal success
[INFO] [1687871330.907057276] [action_server]: Publish feedback
[INFO] [1687871331.007894935] [action_server]: Publish feedback
[INFO] [1687871331.108666245] [action_server]: Publish feedback
[INFO] [1687871331.209445981] [action_server]: Publish feedback
[INFO] [1687871331.211895488] [action_server]: Goal success
[INFO] [1687871331.270255007] [action_server]: Goal handle
[INFO] [1687871331.271282290] [action_server]: Handle accepted
[INFO] [1687871331.271716174] [action_server]: Start execute
[INFO] [1687871331.273183100] [action_server]: Goal handle
[INFO] [1687871331.274387011] [action_server]: Handle accepted
[INFO] [1687871331.274578657] [action_server]: Goal handle
[INFO] [1687871331.274843450] [action_server]: Start execute
[INFO] [1687871331.275820549] [action_server]: Handle accepted
[INFO] [1687871331.276691501] [action_server]: Start execute
terminate called after throwing an instance of 'std::runtime_error'
  what():  Executing action server but nothing is ready
[ros2run]: Aborted
EvgeniNF commented 1 year ago

I think it is race condition, because i catch this error on random request, but i don't understand when called method take_data(). Could you explain me how it's work, please. I want help close this problem

EvgeniNF commented 1 year ago

I noticed one more thing. When i am using single thread executor for spin services node, i don't catch error. Now i am testing this thing on ubuntu 20.04

fujitatomoya commented 1 year ago

with https://github.com/fujitatomoya/ros2_test_prover/commit/1938d8d0e2b8c49531ec6e30036e5e992845910c, and the following stress test for hour, i cannot reproduce this issue in my local environment.

root@tomoyafujita:~/ros2_ws/colcon_ws# ros2 run prover_rclcpp rclcpp_2226 
[INFO] [1687988184.470012330] [action_server]: Goal handle
[INFO] [1687988184.470281262] [action_server]: Handle accepted
[INFO] [1687988184.470335165] [action_server]: Executing goal
[INFO] [1687988184.470407129] [action_server]: Publish feedback
[INFO] [1687988185.470686883] [action_server]: Publish feedback
[INFO] [1687988186.470699443] [action_server]: Publish feedback
[INFO] [1687988187.470627795] [action_server]: Publish feedback
[INFO] [1687988188.470713334] [action_server]: Publish feedback
...
root@tomoyafujita:~/ros2_ws/colcon_ws# while true; do ros2 action send_goal /test/server action_tutorials_interfaces/action/Fibonacci order:\ 10; done
Waiting for an action server to become available...
Sending goal:
     order: 10

Goal accepted with ID: d4cdf1b178ce4c06a55702b1e514d845
...
EvgeniNF commented 1 year ago

@fujitatomoya try to calling the action server with multiple clients in same time

EvgeniNF commented 1 year ago

@fujitatomoya and i noticed in test case you don't call goalHandle->succeed(result);

fujitatomoya commented 1 year ago

try to calling the action server with multiple clients in same time

root@tomoyafujita:~/ros2_ws/colcon_ws# ros2 run prover_rclcpp rclcpp_2226
...

make sure we have more process spaces that can be ON_PROC to the server.

root@tomoyafujita:~/ros2_ws/colcon_ws# cat /proc/cpuinfo | grep processor | wc -l
16

root@tomoyafujita:~/ros2_ws/colcon_ws# cat test.sh
#!/bin/bash

for i in {1..32}
do
    echo action request sending $i ---
    ros2 action send_goal /test/server action_tutorials_interfaces/action/Fibonacci order:\ 10 &
done

root@tomoyafujita:~/ros2_ws/colcon_ws# ./test.sh
...<snip>

all 32 goals have been completed and confirmed.

btw, i had the following different error on the server side.

terminate called after throwing an instance of 'rclcpp::exceptions::RCLError'
  what():  client will not receive response, at /root/ros2_ws/colcon_ws/src/ros2/rmw_fastrtps/rmw_fastrtps_shared_cpp/src/rmw_response.cpp:149, at /root/ros2_ws/colcon_ws/src/ros2/rcl/rcl/src/rcl/service.c:379

this is know issue, https://github.com/ros2/ros2/issues/1253

EvgeniNF commented 1 year ago

@fujitatomoya Ok, i will try your test, maybe my client code is bad...

evgeniy@evgeniy-GL753VD:~$ cat /proc/cpuinfo | grep processor | wc -l 8

My client code:

#include <functional>
#include <future>
#include <memory>
#include <string>
#include <rclcpp/rclcpp.hpp>
#include <example_interfaces/srv/add_two_ints.hpp>

using Srv = example_interfaces::srv::AddTwoInts;
using Request = Srv::Request::SharedPtr;
using Responce = Srv::Response::SharedPtr;

class Client {
   public: 
    Client(rclcpp::Node::SharedPtr node) : m_node(std::move(node)) {
          if (not m_node->has_parameter("path")) {
             m_node->declare_parameter("path", "/test/service");
          }

          m_client = m_node->create_client<Srv>(
          m_node->get_parameter("path").as_string()
       );
    }
    Client(Client const &) = delete;
    Client(Client &&) noexcept = delete;
    Client& operator=(Client const &) = delete;
    Client& operator=(Client &&) noexcept = delete;
    ~Client() noexcept;

   public:
    std::shared_future<Responce> sendRequest(Request goal) {
      if (not m_client->wait_for_service(std::chrono::seconds(2))) {
          throw std::runtime_error("Service is not ready");
      }

      auto responce = m_client->async_send_request(goal);
      return responce;
    }

   private:
    rclcpp::Node::SharedPtr m_node;
    rclcpp::Client<Srv>::SharedPtr m_client;
};

int main(int argc, char** argv) {
    rclcpp::init(argc, argv);
    auto node = rclcpp::Node::make_shared("client");
    auto client1 = std::make_shared<Client>(node);
    auto client2 = std::make_shared<Client>(node);
    auto client3 = std::make_shared<Client>(node);

    while (rclcpp::ok()) {
        try {
            std::vector<std::shared_future<Responce>> futures;
            auto const req = std::make_shared<Srv::Request>();

            req->a = 1;
            req->b = 1;
            futures.push_back(client1->sendRequest(req));

            req->a = 2;
            req->b = 2;
            futures.push_back(client2->sendRequest(req));

            req->a = 3;
            req->b = 3;
            futures.push_back(client3->sendRequest(req));

            for (auto &future: futures) {
                rclcpp::spin_until_future_complete(node, future);
                auto result = future.get();

                if (not result) {
                    throw std::runtime_error("result is empty");
                }
            }

            RCLCPP_INFO_STREAM(rclcpp::get_logger("client"), "Result is success");
        } catch (std::exception const &error) {
            RCLCPP_ERROR_STREAM(rclcpp::get_logger("client"), error.what());
            std::this_thread::sleep_for(std::chrono::seconds(1));
        }
    }
    rclcpp::shutdown();    
}
EvgeniNF commented 1 year ago

@fujitatomoya With SingleThreadExecuter i don't have any problems on server. I tested it all day yesterday

EvgeniNF commented 1 year ago

@fujitatomoya About error: I am using CycloneDDS and i saw in this issue https://github.com/ros2/ros2/issues/1253 it happens if you use FastDDS.

jmachowinski commented 1 year ago

This bug and https://github.com/ros2/rclcpp/issues/2242 are related. https://github.com/ros2/rclcpp/pull/2250 should also fix this. @EvgeniNF can you try this out, you need to compile your own ros though.

tonynajjar commented 1 year ago

I also have this issue with this setup:

OS: ubuntu 22 Architecture: ARM DDS: Cyclone Executor: EventsExecutor (https://github.com/irobot-ros/events-executor because the rclcpp version is not available in humble)

I was able to catch this backtrace if it's useful for you

[behavior_server-5] terminate called after throwing an instance of 'std::runtime_error'
[behavior_server-5]   what():  Taking data from action server but nothing is ready
[behavior_server-5] Stack trace (most recent call last):
[behavior_server-5] #15   Object "/usr/lib/aarch64-linux-gnu/ld-linux-aarch64.so.1", at 0xffffffffffffffff, in 
[behavior_server-5] #14   Object "/code/ros2_ws/build/nav2_behaviors/behavior_server", at 0x559047856f, in _start
[behavior_server-5] #13   Object "/usr/lib/aarch64-linux-gnu/libc.so.6", at 0x7f7ce574cb, in __libc_start_main
[behavior_server-5] #12   Object "/usr/lib/aarch64-linux-gnu/libc.so.6", at 0x7f7ce573fb, in 
[behavior_server-5] #11   Source "/code/ros2_ws/src/navigation2/nav2_behaviors/src/main.cpp", line 30, in main [0x5590478753]
[behavior_server-5]          27:   auto executor = std::make_shared<rclcpp::executors::EventsExecutor>();
[behavior_server-5]          28: 
[behavior_server-5]          29:   executor->add_node(recoveries_node->get_node_base_interface());
[behavior_server-5]       >  30:   executor->spin();
[behavior_server-5]          31:   rclcpp::shutdown();
[behavior_server-5]          32: 
[behavior_server-5]          33:   return 0;
[behavior_server-5] #10   Object "/code/ros2_ws/install/irobot_events_executor/lib/libirobot_events_executor.so", at 0x7f7d4f7723, in rclcpp::executors::EventsExecutor::spin()
[behavior_server-5] #9    Object "/code/ros2_ws/install/irobot_events_executor/lib/libirobot_events_executor.so", at 0x7f7d4f6e37, in rclcpp::executors::EventsExecutor::execute_event(rclcpp::executors::ExecutorEvent const&)
[behavior_server-5] #8    Object "/opt/ros/humble/lib/librclcpp_action.so", at 0x7f7c1d9dab, in rclcpp_action::ServerBase::take_data_by_entity_id(unsigned long)
[behavior_server-5] #7    Object "/opt/ros/humble/lib/librclcpp_action.so", at 0x7f7c1dcef3, in rclcpp_action::ServerBase::take_data()
[behavior_server-5] #6    Object "/usr/lib/aarch64-linux-gnu/libstdc++.so.6.0.30", at 0x7f7d0b2d23, in __cxa_throw
[behavior_server-5] #5    Object "/usr/lib/aarch64-linux-gnu/libstdc++.so.6.0.30", at 0x7f7d0b2a3f, in std::terminate()
[behavior_server-5] #4    Object "/usr/lib/aarch64-linux-gnu/libstdc++.so.6.0.30", at 0x7f7d0b29db, in 
[behavior_server-5] #3    Object "/usr/lib/aarch64-linux-gnu/libstdc++.so.6.0.30", at 0x7f7d0b51fb, in __gnu_cxx::__verbose_terminate_handler()
[behavior_server-5] #2    Object "/usr/lib/aarch64-linux-gnu/libc.so.6", at 0x7f7ce5712f, in abort
[behavior_server-5] #1    Object "/usr/lib/aarch64-linux-gnu/libc.so.6", at 0x7f7ce6a67b, in raise
[behavior_server-5] #0    Object "/usr/lib/aarch64-linux-gnu/libc.so.6", at 0x7f7ceaf200, in 
[behavior_server-5] Aborted (Signal sent by tkill() 116 0)

I will try to see if I can reproduce with the SingleThreadedExecutor.

EvgeniNF commented 1 year ago

@tonynajjar If you will use SingleThreadedExecutor or callback group for sync action service this error don't happen. i solved it with Mutually Exclusive Callback Group

tonynajjar commented 1 year ago

i solved it with Mutually Exclusive Callback Group

How did you solve it?

EvgeniNF commented 1 year ago

@tonynajjar O sorry, simply i use for action server Mutually Exclusive Callback Group.

m_cb = create_callback_group(rclcpp::CallbackGroupType::MutuallyExclusive);
        m_server = rclcpp_action::create_server<Exec>(
            this,
            "/test/server",
            std::bind(&Server::goalHandle, this, _1, _2),
            std::bind(&Server::handleCancel, this, _1),
            std::bind(&Server::handleAccepted, this, _1),
            rcl_action_server_get_default_options(),
            m_cb
        );
jmachowinski commented 3 months ago

This should be fixed by the merge of https://github.com/ros2/rclcpp/pull/2531. Can anyone affected by this bug please check and confirm ?