ros2 / rclcpp

rclcpp (ROS Client Library for C++)
Apache License 2.0
514 stars 410 forks source link

[wjwwood] feat(MultiThreadedExecutor): Added ability to handle exceptions from threads #2505

Open wjwwood opened 2 months ago

wjwwood commented 2 months ago

Modified version of https://github.com/ros2/rclcpp/pull/2382

mergify[bot] commented 2 months ago

⚠️ The sha of the head commit of this PR conflicts with #2382. Mergify cannot evaluate rules on this PR. ⚠️

wjwwood commented 2 months ago

CI:

wjwwood commented 2 months ago

@jmachowinski unfortunately I did not get this one in under the wire, nightly CI has the farm locked up now (we should consider putting the deadline just before nightly jobs start...), but ultimately this set of changes were not ideal I think, maybe you had a big reason for passing it as an argument for spin(), but if you didn't I think the change to allow it in the executor options was probably a better approach that was less disruptive. The trade-off, however, is that it's up to each executor to make sure they account for the exception handler in the options, whereas the new signature in this approach makes that hard to ignore.

However, there still may be a path to the ExecutorOptions change in jazzy, but it will require an ABI change before the release, but one I think we really should make, which is to add a pimpl pointer in the Executor base class as well as one in the ExecutorOptions. That would allow us to extend the options and executor APIs without breaking ABI in existing releases if needed. I'll have to talk to the other maintainers about it, but maybe we can go that route.

jmachowinski commented 2 months ago

@wjwwood Thanks for the effort you put into the PRs !

Your latest changes will break debugging. As soon as you catch an exception, and rethrow it, the only backtrack you will get will point to the rethrow. That was the reason for the code duplication that was going on.

wjwwood commented 2 months ago

I don't fully understand what you mean, but I can have a look tomorrow. Should be something we can fix after the API freeze.

jmachowinski commented 2 months ago

I don't fully understand what you mean, but I can have a look tomorrow. Should be something we can fix after the API freeze.

#include <stdexcept>

void origin()
{
    throw std::runtime_error("Error in Origin");
}

void rethrow_fun()
{
    try
    {
        origin();
    }
    catch(const std::exception &e)
    {
        throw;
    }
}

int main()
{
    rethrow_fun();

    return 0;
}
(gdb) r
Starting program: /home/jm/ros2/ros2_ws/example/a.out 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
terminate called after throwing an instance of 'std::runtime_error'
  what():  Error in Origin

Program received signal SIGABRT, Aborted.
__pthread_kill_implementation (no_tid=0, signo=6, threadid=140737352586176) at ./nptl/pthread_kill.c:44
44      ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737352586176) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140737352586176) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140737352586176, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007ffff7842476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007ffff78287f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007ffff7ca2b9e in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007ffff7cae20c in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x00007ffff7cae277 in std::terminate() () from /lib/x86_64-linux-gnu/libstdc++.so.6
#8  0x00007ffff7cae52b in __cxa_rethrow () from /lib/x86_64-linux-gnu/libstdc++.so.6
#9  0x00005555555552c0 in rethrow_fun() ()
#10 0x00005555555552ea in main ()
(gdb) 
MultiThreadedExecutor::spin()
{
  spin([](const std::exception & e) {throw e;});
}

With the construct above, you will only get stack traces, pointing to the lambda inside of spin, but no one pointing to the origin of the throw. This makes debugging almost impossible.

But yes, changing this should have no effect on the external API/ABI

wjwwood commented 2 months ago

We decided to not break the API freeze for these prs, but I did get tentative approval to add a pimpl pointer to the Executor and ExecutorOptions classes (ABI but not API breaking), which should let us address this problem between now and the release or just after the release, using the pimpl and new APIs (which are allowed).

alsora commented 1 week ago

@wjwwood is there any update on this? It looks that we could just undo the changes to the spin function done in your last commit, and this should be good to go (like this https://github.com/ros2/rclcpp/pull/2505/commits/05a8b4cf09685b95744bd808acf39c6d280ea3f6#diff-fd2db21239de2963e2248c8d3ef4042ebd4174ef3e4f9dbd252a375303398796R30)