Open EvgeniNF opened 1 year ago
@EvgeniNF thanks for creating issue.
i tried to use your sample here https://github.com/fujitatomoya/ros2_test_prover/commit/1938d8d0e2b8c49531ec6e30036e5e992845910c, i did modify several places to build it with rolling. so far i do not have the problem described here.
root@tomoyafujita:~/ros2_ws/colcon_ws# source install/local_setup.bash
root@tomoyafujita:~/ros2_ws/colcon_ws# ros2 run prover_rclcpp rclcpp_2226
[INFO] [1687800854.360839751] [action_server]: Goal handle
[INFO] [1687800854.361871421] [action_server]: Handle accepted
[INFO] [1687800854.361973245] [action_server]: Executing goal
[INFO] [1687800854.362307637] [action_server]: Publish feedback
[INFO] [1687800855.362249651] [action_server]: Publish feedback
[INFO] [1687800856.362233943] [action_server]: Publish feedback
[INFO] [1687800857.362178916] [action_server]: Publish feedback
[INFO] [1687800858.362253362] [action_server]: Publish feedback
[INFO] [1687800859.362238926] [action_server]: Publish feedback
[INFO] [1687800860.362247690] [action_server]: Publish feedback
[INFO] [1687800861.362286352] [action_server]: Publish feedback
[INFO] [1687800862.362336140] [action_server]: Publish feedback
^C[INFO] [1687800972.893649435] [rclcpp]: signal_handler(signum=2)
Ubuntu 20.04
This is not supported platform Iron, https://docs.ros.org/en/rolling/Releases/Release-Iron-Irwini.html#supported-platforms. can you try with Ubuntu 22.04?
@fujitatomoya Ok, thanks for reply, i try run this code on ubuntu 22.04
@fujitatomoya I have the same problem on ubuntu 22.04.02 (64-bit). I noticed what on Fast DDS this error occurs less frequently.
[INFO] [1687871330.810723412] [action_server]: Goal success
[INFO] [1687871330.907057276] [action_server]: Publish feedback
[INFO] [1687871331.007894935] [action_server]: Publish feedback
[INFO] [1687871331.108666245] [action_server]: Publish feedback
[INFO] [1687871331.209445981] [action_server]: Publish feedback
[INFO] [1687871331.211895488] [action_server]: Goal success
[INFO] [1687871331.270255007] [action_server]: Goal handle
[INFO] [1687871331.271282290] [action_server]: Handle accepted
[INFO] [1687871331.271716174] [action_server]: Start execute
[INFO] [1687871331.273183100] [action_server]: Goal handle
[INFO] [1687871331.274387011] [action_server]: Handle accepted
[INFO] [1687871331.274578657] [action_server]: Goal handle
[INFO] [1687871331.274843450] [action_server]: Start execute
[INFO] [1687871331.275820549] [action_server]: Handle accepted
[INFO] [1687871331.276691501] [action_server]: Start execute
terminate called after throwing an instance of 'std::runtime_error'
what(): Executing action server but nothing is ready
[ros2run]: Aborted
I think it is race condition, because i catch this error on random request, but i don't understand when called method take_data(). Could you explain me how it's work, please. I want help close this problem
I noticed one more thing. When i am using single thread executor for spin services node, i don't catch error. Now i am testing this thing on ubuntu 20.04
with https://github.com/fujitatomoya/ros2_test_prover/commit/1938d8d0e2b8c49531ec6e30036e5e992845910c, and the following stress test for hour, i cannot reproduce this issue in my local environment.
root@tomoyafujita:~/ros2_ws/colcon_ws# ros2 run prover_rclcpp rclcpp_2226
[INFO] [1687988184.470012330] [action_server]: Goal handle
[INFO] [1687988184.470281262] [action_server]: Handle accepted
[INFO] [1687988184.470335165] [action_server]: Executing goal
[INFO] [1687988184.470407129] [action_server]: Publish feedback
[INFO] [1687988185.470686883] [action_server]: Publish feedback
[INFO] [1687988186.470699443] [action_server]: Publish feedback
[INFO] [1687988187.470627795] [action_server]: Publish feedback
[INFO] [1687988188.470713334] [action_server]: Publish feedback
...
root@tomoyafujita:~/ros2_ws/colcon_ws# while true; do ros2 action send_goal /test/server action_tutorials_interfaces/action/Fibonacci order:\ 10; done
Waiting for an action server to become available...
Sending goal:
order: 10
Goal accepted with ID: d4cdf1b178ce4c06a55702b1e514d845
...
@fujitatomoya try to calling the action server with multiple clients in same time
@fujitatomoya and i noticed in test case you don't call goalHandle->succeed(result);
try to calling the action server with multiple clients in same time
root@tomoyafujita:~/ros2_ws/colcon_ws# ros2 run prover_rclcpp rclcpp_2226
...
make sure we have more process spaces that can be ON_PROC to the server.
root@tomoyafujita:~/ros2_ws/colcon_ws# cat /proc/cpuinfo | grep processor | wc -l
16
root@tomoyafujita:~/ros2_ws/colcon_ws# cat test.sh
#!/bin/bash
for i in {1..32}
do
echo action request sending $i ---
ros2 action send_goal /test/server action_tutorials_interfaces/action/Fibonacci order:\ 10 &
done
root@tomoyafujita:~/ros2_ws/colcon_ws# ./test.sh
...<snip>
all 32 goals have been completed and confirmed.
btw, i had the following different error on the server side.
terminate called after throwing an instance of 'rclcpp::exceptions::RCLError'
what(): client will not receive response, at /root/ros2_ws/colcon_ws/src/ros2/rmw_fastrtps/rmw_fastrtps_shared_cpp/src/rmw_response.cpp:149, at /root/ros2_ws/colcon_ws/src/ros2/rcl/rcl/src/rcl/service.c:379
this is know issue, https://github.com/ros2/ros2/issues/1253
@fujitatomoya Ok, i will try your test, maybe my client code is bad...
evgeniy@evgeniy-GL753VD:~$ cat /proc/cpuinfo | grep processor | wc -l 8
My client code:
#include <functional>
#include <future>
#include <memory>
#include <string>
#include <rclcpp/rclcpp.hpp>
#include <example_interfaces/srv/add_two_ints.hpp>
using Srv = example_interfaces::srv::AddTwoInts;
using Request = Srv::Request::SharedPtr;
using Responce = Srv::Response::SharedPtr;
class Client {
public:
Client(rclcpp::Node::SharedPtr node) : m_node(std::move(node)) {
if (not m_node->has_parameter("path")) {
m_node->declare_parameter("path", "/test/service");
}
m_client = m_node->create_client<Srv>(
m_node->get_parameter("path").as_string()
);
}
Client(Client const &) = delete;
Client(Client &&) noexcept = delete;
Client& operator=(Client const &) = delete;
Client& operator=(Client &&) noexcept = delete;
~Client() noexcept;
public:
std::shared_future<Responce> sendRequest(Request goal) {
if (not m_client->wait_for_service(std::chrono::seconds(2))) {
throw std::runtime_error("Service is not ready");
}
auto responce = m_client->async_send_request(goal);
return responce;
}
private:
rclcpp::Node::SharedPtr m_node;
rclcpp::Client<Srv>::SharedPtr m_client;
};
int main(int argc, char** argv) {
rclcpp::init(argc, argv);
auto node = rclcpp::Node::make_shared("client");
auto client1 = std::make_shared<Client>(node);
auto client2 = std::make_shared<Client>(node);
auto client3 = std::make_shared<Client>(node);
while (rclcpp::ok()) {
try {
std::vector<std::shared_future<Responce>> futures;
auto const req = std::make_shared<Srv::Request>();
req->a = 1;
req->b = 1;
futures.push_back(client1->sendRequest(req));
req->a = 2;
req->b = 2;
futures.push_back(client2->sendRequest(req));
req->a = 3;
req->b = 3;
futures.push_back(client3->sendRequest(req));
for (auto &future: futures) {
rclcpp::spin_until_future_complete(node, future);
auto result = future.get();
if (not result) {
throw std::runtime_error("result is empty");
}
}
RCLCPP_INFO_STREAM(rclcpp::get_logger("client"), "Result is success");
} catch (std::exception const &error) {
RCLCPP_ERROR_STREAM(rclcpp::get_logger("client"), error.what());
std::this_thread::sleep_for(std::chrono::seconds(1));
}
}
rclcpp::shutdown();
}
@fujitatomoya With SingleThreadExecuter i don't have any problems on server. I tested it all day yesterday
@fujitatomoya About error: I am using CycloneDDS and i saw in this issue https://github.com/ros2/ros2/issues/1253 it happens if you use FastDDS.
This bug and https://github.com/ros2/rclcpp/issues/2242 are related. https://github.com/ros2/rclcpp/pull/2250 should also fix this. @EvgeniNF can you try this out, you need to compile your own ros though.
I also have this issue with this setup:
OS: ubuntu 22 Architecture: ARM DDS: Cyclone Executor: EventsExecutor (https://github.com/irobot-ros/events-executor because the rclcpp version is not available in humble)
I was able to catch this backtrace if it's useful for you
[behavior_server-5] terminate called after throwing an instance of 'std::runtime_error'
[behavior_server-5] what(): Taking data from action server but nothing is ready
[behavior_server-5] Stack trace (most recent call last):
[behavior_server-5] #15 Object "/usr/lib/aarch64-linux-gnu/ld-linux-aarch64.so.1", at 0xffffffffffffffff, in
[behavior_server-5] #14 Object "/code/ros2_ws/build/nav2_behaviors/behavior_server", at 0x559047856f, in _start
[behavior_server-5] #13 Object "/usr/lib/aarch64-linux-gnu/libc.so.6", at 0x7f7ce574cb, in __libc_start_main
[behavior_server-5] #12 Object "/usr/lib/aarch64-linux-gnu/libc.so.6", at 0x7f7ce573fb, in
[behavior_server-5] #11 Source "/code/ros2_ws/src/navigation2/nav2_behaviors/src/main.cpp", line 30, in main [0x5590478753]
[behavior_server-5] 27: auto executor = std::make_shared<rclcpp::executors::EventsExecutor>();
[behavior_server-5] 28:
[behavior_server-5] 29: executor->add_node(recoveries_node->get_node_base_interface());
[behavior_server-5] > 30: executor->spin();
[behavior_server-5] 31: rclcpp::shutdown();
[behavior_server-5] 32:
[behavior_server-5] 33: return 0;
[behavior_server-5] #10 Object "/code/ros2_ws/install/irobot_events_executor/lib/libirobot_events_executor.so", at 0x7f7d4f7723, in rclcpp::executors::EventsExecutor::spin()
[behavior_server-5] #9 Object "/code/ros2_ws/install/irobot_events_executor/lib/libirobot_events_executor.so", at 0x7f7d4f6e37, in rclcpp::executors::EventsExecutor::execute_event(rclcpp::executors::ExecutorEvent const&)
[behavior_server-5] #8 Object "/opt/ros/humble/lib/librclcpp_action.so", at 0x7f7c1d9dab, in rclcpp_action::ServerBase::take_data_by_entity_id(unsigned long)
[behavior_server-5] #7 Object "/opt/ros/humble/lib/librclcpp_action.so", at 0x7f7c1dcef3, in rclcpp_action::ServerBase::take_data()
[behavior_server-5] #6 Object "/usr/lib/aarch64-linux-gnu/libstdc++.so.6.0.30", at 0x7f7d0b2d23, in __cxa_throw
[behavior_server-5] #5 Object "/usr/lib/aarch64-linux-gnu/libstdc++.so.6.0.30", at 0x7f7d0b2a3f, in std::terminate()
[behavior_server-5] #4 Object "/usr/lib/aarch64-linux-gnu/libstdc++.so.6.0.30", at 0x7f7d0b29db, in
[behavior_server-5] #3 Object "/usr/lib/aarch64-linux-gnu/libstdc++.so.6.0.30", at 0x7f7d0b51fb, in __gnu_cxx::__verbose_terminate_handler()
[behavior_server-5] #2 Object "/usr/lib/aarch64-linux-gnu/libc.so.6", at 0x7f7ce5712f, in abort
[behavior_server-5] #1 Object "/usr/lib/aarch64-linux-gnu/libc.so.6", at 0x7f7ce6a67b, in raise
[behavior_server-5] #0 Object "/usr/lib/aarch64-linux-gnu/libc.so.6", at 0x7f7ceaf200, in
[behavior_server-5] Aborted (Signal sent by tkill() 116 0)
I will try to see if I can reproduce with the SingleThreadedExecutor.
@tonynajjar If you will use SingleThreadedExecutor or callback group for sync action service this error don't happen. i solved it with Mutually Exclusive Callback Group
i solved it with Mutually Exclusive Callback Group
How did you solve it?
@tonynajjar O sorry, simply i use for action server Mutually Exclusive Callback Group.
m_cb = create_callback_group(rclcpp::CallbackGroupType::MutuallyExclusive);
m_server = rclcpp_action::create_server<Exec>(
this,
"/test/server",
std::bind(&Server::goalHandle, this, _1, _2),
std::bind(&Server::handleCancel, this, _1),
std::bind(&Server::handleAccepted, this, _1),
rcl_action_server_get_default_options(),
m_cb
);
This should be fixed by the merge of https://github.com/ros2/rclcpp/pull/2531. Can anyone affected by this bug please check and confirm ?
Bug report
Required Info:
Steps to reproduce issue
I am do stress test for action server and catch this error on random request. How i can fix that?
Expected behavior
Normal client/server action behavior where each client sends a goal and receives a result from the server.
Actual behavior
Node Server throws an exception and crashes
Additional information
Feature request
Feature description
Implementation considerations