ros-industrial / ros_canopen

CANopen driver framework for ROS (http://wiki.ros.org/ros_canopen)
GNU Lesser General Public License v3.0

occasional segfaults during shutdown #441

Open bochen87 opened 3 years ago

bochen87 commented 3 years ago

Hi @ipa-mdl, we're seeing occasional segfaults during shutdown, which cause our tests to fail intermittently. Is 1acd49381d114f4444122732ba53370af621761b meant to address that issue?

Our destructor contains only the following code, where can_driver_ is an object of type can::ThreadedSocketCANInterface:

try
{
  can_driver_.shutdown();
}
catch (const std::exception& e)
{
  ROS_ERROR_STREAM("An exception occurred while trying to shut down the CAN driver");
  ROS_ERROR_STREAM(e.what());
}
bochen87 commented 3 years ago

Here is a stack trace I was able to capture. We have since upgraded to the latest released Debian package version, but the segfault still happens occasionally:

#0  0x0000000000000000 in ?? ()
#1  0x0000555555601813 in boost::asio::detail::scheduler_operation::destroy (this=0x5555556d2260) at /usr/include/boost/asio/detail/scheduler_operation.hpp:45
#2  0x000055555561fd37 in boost::asio::detail::op_queue_access::destroy<boost::asio::detail::scheduler_operation> (o=0x5555556d2260) at /usr/include/boost/asio/detail/op_queue.hpp:47
#3  0x0000555555616184 in boost::asio::detail::op_queue<boost::asio::detail::scheduler_operation>::~op_queue (this=0x5555556d2288, __in_chrg=<optimized out>) at /usr/include/boost/asio/detail/op_queue.hpp:81
#4  0x000055555560963d in boost::asio::detail::scheduler::~scheduler (this=0x5555556d21b0, __in_chrg=<optimized out>) at /usr/include/boost/asio/detail/impl/scheduler.ipp:135
#5  0x000055555560970c in boost::asio::detail::scheduler::~scheduler (this=0x5555556d21b0, __in_chrg=<optimized out>) at /usr/include/boost/asio/detail/impl/scheduler.ipp:142
#6  0x0000555555602ec6 in boost::asio::detail::service_registry::destroy (service=0x5555556d21b0) at /usr/include/boost/asio/detail/impl/service_registry.ipp:110
#7  0x0000555555602d35 in boost::asio::detail::service_registry::destroy_services (this=0x5555556d3360) at /usr/include/boost/asio/detail/impl/service_registry.ipp:54
#8  0x00005555556036c1 in boost::asio::execution_context::destroy (this=0x7fffffffccd8) at /usr/include/boost/asio/impl/execution_context.ipp:46
#9  0x00005555556035bb in boost::asio::execution_context::~execution_context (this=0x7fffffffccd8, __in_chrg=<optimized out>) at /usr/include/boost/asio/impl/execution_context.ipp:35
#10 0x000055555560b67a in boost::asio::io_context::~io_context (this=0x7fffffffccd8, __in_chrg=<optimized out>) at /usr/include/boost/asio/impl/io_context.ipp:56
#11 0x0000555555618bc7 in can::AsioDriver<boost::asio::posix::basic_stream_descriptor<boost::asio::executor> >::~AsioDriver (this=0x7fffffffcbb0, __in_chrg=<optimized out>) at ros_canopen/socketcan_interface/include/socketcan_interface/asio_base.h:81
#12 0x000055555561a1cf in can::SocketCANInterface::~SocketCANInterface (this=0x7fffffffcbb0, __in_chrg=<optimized out>) at ros_canopen/socketcan_interface/include/socketcan_interface/socketcan.h:20
#13 0x000055555561a3e5 in can::ThreadedInterface<can::SocketCANInterface>::~ThreadedInterface (this=0x7fffffffcbb0, __in_chrg=<optimized out>) at ros_canopen/socketcan_interface/include/socketcan_interface/threading.h:68
#14 0x0000555555613876 in CANSender::~CANSender (this=0x7fffffffcbb0, __in_chrg=<optimized out>) at test_receiver.cpp:39
#15 0x00005555555f6a8c in OpModeReceiver_invalidMode_Test::TestBody (this=0x5555556d2120) at test_receiver.cpp:77
#16 0x00007ffff7c59a99 in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void> (object=0x5555556d2120, method=&virtual testing::Test::TestBody(), location=0x7ffff7c6edab "the test body") at /usr/src/googletest/googletest/src/gtest.cc:2433
#17 0x00007ffff7c521b1 in testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void> (object=0x5555556d2120, method=&virtual testing::Test::TestBody(), location=0x7ffff7c6edab "the test body") at /usr/src/googletest/googletest/src/gtest.cc:2469
#18 0x00007ffff7c2c556 in testing::Test::Run (this=0x5555556d2120) at /usr/src/googletest/googletest/src/gtest.cc:2508
#19 0x00007ffff7c2cf41 in testing::TestInfo::Run (this=0x5555556d1600) at /usr/src/googletest/googletest/src/gtest.cc:2684
#20 0x00007ffff7c2d699 in testing::TestSuite::Run (this=0x5555556d1450) at /usr/src/googletest/googletest/src/gtest.cc:2816
#21 0x00007ffff7c39843 in testing::internal::UnitTestImpl::RunAllTests (this=0x5555556d0a30) at /usr/src/googletest/googletest/src/gtest.cc:5338
#22 0x00007ffff7c5afc2 in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (object=0x5555556d0a30, method=(bool (testing::internal::UnitTestImpl::*)(class testing::internal::UnitTestImpl * const)) 0x7ffff7c39426 <testing::internal::UnitTestImpl::RunAllTests()>,
    location=0x7ffff7c6f7e8 "auxiliary test code (environments or event listeners)") at /usr/src/googletest/googletest/src/gtest.cc:2433
#23 0x00007ffff7c533ef in testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (object=0x5555556d0a30, method=(bool (testing::internal::UnitTestImpl::*)(class testing::internal::UnitTestImpl * const)) 0x7ffff7c39426 <testing::internal::UnitTestImpl::RunAllTests()>,
    location=0x7ffff7c6f7e8 "auxiliary test code (environments or event listeners)") at /usr/src/googletest/googletest/src/gtest.cc:2469
#24 0x00007ffff7c3804d in testing::UnitTest::Run (this=0x7ffff7c995e0 <testing::UnitTest::GetInstance()::instance>) at /usr/src/googletest/googletest/src/gtest.cc:4925
#25 0x00007ffff7f4d29a in RUN_ALL_TESTS () at /usr/src/googletest/googletest/include/gtest/gtest.h:2473
#26 0x00007ffff7f4d21c in main (argc=1, argv=0x7fffffffd508) at /usr/src/googletest/googletest/src/gtest_main.cc:45
#27 0x00007ffff76850b3 in __libc_start_main (main=0x7ffff7f4d1d9 <main(int, char**)>, argc=2, argv=0x7fffffffd508, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffd4f8) at ../csu/libc-start.c:308
#28 0x00005555555f571e in _start ()
mathias-luedtke commented 3 years ago

Thanks for reporting!

Are you calling shutdown explicitly in your destructor? (even without the try/catch block).

https://github.com/ros-industrial/ros_canopen/commit/1acd49381d114f4444122732ba53370af621761b should have improved the shutdown behavior, but it was mostly about not calling virtual member functions in the destructor.
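
For readers unfamiliar with the virtual-call point: by the time a base-class destructor runs, the derived part of the object is already gone, so a virtual call made there resolves to the base version. A minimal sketch of that C++ rule, using hypothetical `Base` and `Derived` types that are not taken from ros_canopen:

```cpp
#include <string>
#include <vector>

// Hypothetical types for illustration only; these are not ros_canopen classes.
struct Base {
  explicit Base(std::vector<std::string>& log) : log_(log) {}
  virtual ~Base() { cleanup(); }  // virtual call made from a destructor
  virtual void cleanup() { log_.push_back("Base::cleanup"); }
  std::vector<std::string>& log_;
};

struct Derived : Base {
  using Base::Base;
  ~Derived() override { log_.push_back("~Derived"); }
  void cleanup() override { log_.push_back("Derived::cleanup"); }
};

// Destroys a Derived and returns the calls recorded during destruction.
std::vector<std::string> destroy_and_trace() {
  std::vector<std::string> log;
  { Derived d(log); }  // ~Derived runs first, then ~Base
  return log;
}
```

Calling `destroy_and_trace()` records `~Derived` followed by `Base::cleanup`; `Derived::cleanup` never runs, which is why cleanup that depends on derived state usually has to be triggered explicitly before destruction.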

bochen87 commented 3 years ago

We are calling it in the destructor exactly as shown in the code snippet above, including the try/catch block, but it still segfaults every once in a while. My guess is that there is a race while the threads are being destroyed: once in a while gtest tears the object down before a worker thread has finished, and Boost.Asio then segfaults. Could that be the cause?
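
The kind of ordering that would rule out such a race can be sketched with the standard library alone. This is only an illustration under assumptions: `Worker` is a hypothetical stand-in for a driver that owns a background thread, not the ros_canopen or Boost.Asio implementation. The idea is that `shutdown()` signals the loop to stop and then joins the thread, so nothing the thread uses is destroyed while it is still running.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical stand-in for a driver with a worker thread; not ros_canopen code.
class Worker {
public:
  Worker() : running_(true), thread_([this] { run(); }) {}

  ~Worker() { shutdown(); }

  // Safe to call more than once: signals the loop to stop, then waits for the
  // thread to exit before returning.
  void shutdown() {
    running_ = false;
    if (thread_.joinable()) thread_.join();
  }

  // True once the worker thread has been joined.
  bool stopped() const { return !thread_.joinable(); }

private:
  void run() {
    while (running_) {
      ++iterations_;
      std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
  }

  std::atomic<bool> running_;
  std::atomic<unsigned long> iterations_{0};
  // Declared last so it is constructed after, and destroyed before, the state
  // that run() touches.
  std::thread thread_;
};
```

With this layout, an explicit `shutdown()` followed by the destructor is harmless, because the second call finds the thread already joined. Whether the segfault in this issue has the same cause is not established by the stack trace alone.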