robotpy / robotpy-cscore

Moved to https://github.com/robotpy/mostrobotpy

SIGSEGV on exit #52

Closed by virtuald 5 years ago

virtuald commented 5 years ago

Currently we release the GIL when calling CS_Shutdown, and sometimes this happens:

(gdb) bt
#0  0x00007ffff7d14232 in  () at /lib64/libpython3.7m.so.1.0
#1  0x00007ffff7da637e in  () at /lib64/libpython3.7m.so.1.0
#2  0x00007fffea629237 in std::_Function_base::_Base_manager<pybind11::detail::type_caster<std::function<void (cs::VideoEvent const&)>, void>::load(pybind11::handle, bool)::{lambda(cs::VideoEvent const&)#1}>::_M_manager(std::_Any_data&, std::_Function_base::_Base_manager<pybind11::detail::type_caster<std::function<void (cs::VideoEvent const&)>, void>::load(pybind11::handle, bool)::{lambda(cs::VideoEvent const&)#1}> const&, std::_Manager_operation) (__dest=..., __source=..., __op=4294967293)
    at /mnt/sdb1/virtuald_dot/virtualenvs/frc/include/site/python3.7/pybind11/pytypes.h:165
#3  0x00007fffea61cd0c in std::_Function_base::_Base_manager<cs::VideoListener::VideoListener(std::function<void (cs::VideoEvent const&)>, int, bool)::{lambda(cs::RawEvent const&)#1}>::_M_manager(std::_Any_data&, std::_Function_base::_Base_manager<cs::VideoListener::VideoListener(std::function<void (cs::VideoEvent const&)>, int, bool)::{lambda(cs::RawEvent const&)#1}> const&, std::_Manager_operation)
    (__dest=..., __source=..., __op=4294967293) at /usr/include/c++/8/bits/std_function.h:257
#4  0x00007fffea630287 in cs::Notifier::Thread::__dt_base() () at /usr/include/c++/8/bits/std_function.h:257
#5  0x00007fffea627957 in std::_Sp_counted_base::_M_release (this=0x7fffd8115f60)
    at /usr/include/c++/8/bits/shared_ptr_base.h:155
#6  0x00007fffea627957 in std::_Sp_counted_base::_M_release() (this=0x7fffd8115f60)
    at /usr/include/c++/8/bits/shared_ptr_base.h:148
#7  0x00007fffea641d6d in wpi::detail::SafeThreadOwnerBase::Stop() (this=0x555555a254f8)
    at /usr/include/c++/8/bits/shared_ptr_base.h:706
#8  0x00007fffea6b5bde in __lambda78::_FUN(void*) ()
    at cscore_src/cscore/src/main/native/cpp/Notifier.cpp:100
#9  0x00007ffff7cf5414 in  () at /lib64/libpython3.7m.so.1.0
#10 0x00007ffff7d13f1f in  () at /lib64/libpython3.7m.so.1.0
#11 0x00007ffff7da6770 in  () at /lib64/libpython3.7m.so.1.0
#12 0x00007ffff7d6cd67 in PyDict_SetItem () at /lib64/libpython3.7m.so.1.0
#13 0x00007ffff7daa50e in _PyModule_ClearDict () at /lib64/libpython3.7m.so.1.0
#14 0x00007ffff7df0d09 in PyImport_Cleanup () at /lib64/libpython3.7m.so.1.0
#15 0x00007ffff7e57f68 in Py_FinalizeEx () at /lib64/libpython3.7m.so.1.0
#16 0x00007ffff7e5a604 in  () at /lib64/libpython3.7m.so.1.0
#17 0x00007ffff7e5abdc in _Py_UnixMain () at /lib64/libpython3.7m.so.1.0
#18 0x00007ffff78c8413 in __libc_start_main () at /lib64/libc.so.6
#19 0x000055555555508e in _start ()

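The crash happens in a Py_XDECREF (pybind11/pytypes.h:165 in frame #2), so it is probably related to releasing the GIL?

The GIL is released during CS_Shutdown, and it seems that pybind11 has wrapped the Python callbacks inside the std::function objects that cscore stores, so cscore would need to clear them on exit. The same crash also shows up when the notifier thread itself exits: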

(gdb) bt
#0  0x00007ffff7d14232 in  () at /lib64/libpython3.7m.so.1.0
#1  0x00007ffff7da637e in  () at /lib64/libpython3.7m.so.1.0
#2  0x00007fffea629087 in std::_Function_base::_Base_manager<pybind11::detail::type_caster<std::function<void (cs::VideoEvent const&)>, void>::load(pybind11::handle, bool)::{lambda(cs::VideoEvent const&)#1}>::_M_manager(std::_Any_data&, std::_Function_base::_Base_manager<pybind11::detail::type_caster<std::function<void (cs::VideoEvent const&)>, void>::load(pybind11::handle, bool)::{lambda(cs::VideoEvent const&)#1}> const&, std::_Manager_operation) (__dest=..., __source=..., __op=4294967293)
    at /mnt/sdb1/virtuald_dot/virtualenvs/frc/include/site/python3.7/pybind11/pytypes.h:165
#3  0x00007fffea61ccbc in std::_Function_base::_Base_manager<cs::VideoListener::VideoListener(std::function<void (cs::VideoEvent const&)>, int, bool)::{lambda(cs::RawEvent const&)#1}>::_M_manager(std::_Any_data&, std::_Function_base::_Base_manager<cs::VideoListener::VideoListener(std::function<void (cs::VideoEvent const&)>, int, bool)::{lambda(cs::RawEvent const&)#1}> const&, std::_Manager_operation)
    (__dest=..., __source=..., __op=4294967293) at /usr/include/c++/8/bits/std_function.h:257
#4  0x00007fffea630277 in cs::Notifier::Thread::__dt_base() () at /usr/include/c++/8/bits/std_function.h:257
#5  0x00007fffea63dd88 in std::thread::_State_impl::__dt_base ()
    at /usr/include/c++/8/bits/shared_ptr_base.h:155
#6  0x00007fffea63dd88 in std::thread::_State_impl::__dt_del() () at /usr/include/c++/8/thread:188
#7  0x00007fffe60e694c in  () at /lib64/libstdc++.so.6
#8  0x00007ffff7c0158e in start_thread () at /lib64/libpthread.so.0
#9  0x00007ffff79a16a3 in clone () at /lib64/libc.so.6
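
To make the "clear them on exit" idea concrete, here is a minimal standalone sketch. It does not use pybind11 or cscore: a plain mutex plus a thread-local flag stand in for the GIL, FakePyRef stands in for a pybind11 object whose destructor decrements a Python reference count, and Registry, GilGuard, ShutdownWithoutClearing and ShutdownAfterClearing are hypothetical names made up for illustration. It only shows why the stored std::function objects have to be dropped while the lock is still held.

```cpp
#include <functional>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Stand-in for the GIL (assumption: real code would use pybind11's GIL helpers).
std::mutex g_gil;
thread_local bool t_holds_gil = false;
int g_unsafe_decrefs = 0;  // releases that happened without the lock

// RAII guard that marks the current thread as holding the stand-in lock.
struct GilGuard {
  GilGuard() { g_gil.lock(); t_holds_gil = true; }
  ~GilGuard() { t_holds_gil = false; g_gil.unlock(); }
  GilGuard(const GilGuard&) = delete;
  GilGuard& operator=(const GilGuard&) = delete;
};

// Stand-in for a Python reference: destroying it without the lock is the bug.
struct FakePyRef {
  ~FakePyRef() { if (!t_holds_gil) ++g_unsafe_decrefs; }
};

// Hypothetical listener store, loosely modelled on a notifier's callback list.
class Registry {
 public:
  void Add(std::function<void()> cb) { listeners_.push_back(std::move(cb)); }
  void Clear() { listeners_.clear(); }  // caller is expected to hold the lock
 private:
  std::vector<std::function<void()>> listeners_;
};

// Callbacks outlive the lock and are destroyed with it released.
int ShutdownWithoutClearing() {
  g_unsafe_decrefs = 0;
  {
    Registry registry;
    {
      GilGuard gil;
      registry.Add([ref = std::make_shared<FakePyRef>()] { (void)ref; });
    }  // lock released here, as it is around CS_Shutdown
  }    // registry (and the captured reference) destroyed without the lock
  return g_unsafe_decrefs;
}

// Callbacks are cleared first, while the lock is still held.
int ShutdownAfterClearing() {
  g_unsafe_decrefs = 0;
  Registry registry;
  { GilGuard gil; registry.Add([ref = std::make_shared<FakePyRef>()] { (void)ref; }); }
  { GilGuard gil; registry.Clear(); }
  return g_unsafe_decrefs;
}
```

In this model ShutdownWithoutClearing() counts one unguarded release and ShutdownAfterClearing() counts none, which is the behaviour a cleanup step in the shutdown path would be aiming for.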

virtuald commented 5 years ago

Fixed the Stop function to clean up after itself, but here's another weird stack trace (with optimizations disabled):

(gdb) bt
#0  0x00007ffff7d14232 in  () at /lib64/libpython3.7m.so.1.0
#1  0x00007ffff7da637e in  () at /lib64/libpython3.7m.so.1.0
#2  0x00007fffea39d1af in pybind11::handle::dec_ref() const & (this=<optimized out>)
    at /mnt/sdb1/virtuald_dot/virtualenvs/frc/include/site/python3.7/pybind11/pytypes.h:165
#3  0x00007fffea39d274 in pybind11::object::~object() (this=<optimized out>)
    at /mnt/sdb1/virtuald_dot/virtualenvs/frc/include/site/python3.7/pybind11/pytypes.h:208
#4  0x00007fffea3a6798 in pybind11::function::~function() (this=<optimized out>)
    at /mnt/sdb1/virtuald_dot/virtualenvs/frc/include/site/python3.7/pybind11/pytypes.h:1212
#5  0x00007fffea469f4e in pybind11::detail::type_caster<std::function<void (cs::VideoEvent const&)>, void>::load(pybind11::handle, bool)::{lambda(cs::VideoEvent const&)#1}::~handle() ()
    at /mnt/sdb1/virtuald_dot/virtualenvs/frc/include/site/python3.7/pybind11/functional.h:57
#6  0x00007fffea46f002 in std::_Function_base::_Base_manager<pybind11::detail::type_caster<std::function<void (cs::VideoEvent const&)>, void>::load(pybind11::handle, bool)::{lambda(cs::VideoEvent const&)#1}>::_M_destroy(std::_Any_data&, std::integral_constant<bool, false>) (__victim=...)
    at /usr/include/c++/8/bits/std_function.h:188
#7  0x00007fffea46e97f in std::_Function_base::_Base_manager<pybind11::detail::type_caster<std::function<void (cs::VideoEvent const&)>, void>::load(pybind11::handle, bool)::{lambda(cs::VideoEvent const&)#1}>::_M_manager(std::_Any_data&, std::_Function_base::_Base_manager<pybind11::detail::type_caster<std::function<void (cs::VideoEvent const&)>, void>::load(pybind11::handle, bool)::{lambda(cs::VideoEvent const&)#1}> const&, std::_Manager_operation) (__dest=..., __source=..., __op=<optimized out>) at /usr/include/c++/8/bits/std_function.h:212
#8  0x00007fffea3ab5fd in std::_Function_base::~_Function_base() (this=<optimized out>)
    at /usr/include/c++/8/bits/std_function.h:257
#9  0x00007fffea3adfca in std::function<void (cs::VideoEvent const&)>::~function() (this=<optimized out>)
    at /usr/include/c++/8/bits/std_function.h:370
#10 0x00007fffea3adfe4 in cs::VideoListener::VideoListener(std::function<void (cs::VideoEvent const&)>, int, bool)::{lambda(cs::RawEvent const&)#1}::~RawEvent() ()
    at cscore_src/cscore/src/main/native/include/cscore_oo.inl:617
#11 0x00007fffea3ea6a1 in std::_Function_base::_Base_manager<cs::VideoListener::VideoListener(std::function<void (cs::VideoEvent const&)>, int, bool)::{lambda(cs::RawEvent const&)#1}>::_M_destroy(std::_Any_data&, std::integral_constant<bool, false>) (__victim=...) at /usr/include/c++/8/bits/std_function.h:188
#12 0x00007fffea3d064d in std::_Function_base::_Base_manager<cs::VideoListener::VideoListener(std::function<void (cs::VideoEvent const&)>, int, bool)::{lambda(cs::RawEvent const&)#1}>::_M_manager(std::_Any_data&, std::_Function_base::_Base_manager<cs::VideoListener::VideoListener(std::function<void (cs::VideoEvent const&)>, int, bool)::{lambda(cs::RawEvent const&)#1}> const&, std::_Manager_operation)
    (__dest=..., __source=..., __op=<optimized out>) at /usr/include/c++/8/bits/std_function.h:212
#13 0x00007fffea3ab5fd in std::_Function_base::~_Function_base() (this=<optimized out>)
    at /usr/include/c++/8/bits/std_function.h:257
#14 0x00007fffea3ae024 in std::function<void (cs::RawEvent const&)>::~function() (this=<optimized out>)
    at /usr/include/c++/8/bits/std_function.h:370
#15 0x00007fffea482314 in cs::Notifier::Thread::Main() (this=<optimized out>)
    at cscore_src/cscore/src/main/native/cpp/Notifier.cpp:143
#16 0x00007fffea4ed848 in wpi::detail::SafeThreadOwnerBase::Start(std::shared_ptr<wpi::SafeThread>)::{lambda()#1}::operator()() const () at cscore_src/wpiutil/src/main/native/cpp/SafeThread.cpp:34
#17 0x00007fffea4ee74a in std::__invoke_impl<void, wpi::detail::SafeThreadOwnerBase::Start(std::shared_ptr<wpi::SafeThread>)::<lambda()> >(void) (__f=...) at /usr/include/c++/8/bits/invoke.h:60
#18 0x00007fffea4ee0b8 in std::__invoke<wpi::detail::SafeThreadOwnerBase::Start(std::shared_ptr<wpi::SafeThread>)::<lambda()> >(void) (__fn=...) at /usr/include/c++/8/bits/invoke.h:95
#19 0x00007fffea4eeef6 in std::_M_invoke<0>() (this=<optimized out>) at /usr/include/c++/8/thread:244
#20 0x00007fffea4eeeb7 in std::operator()() (this=<optimized out>) at /usr/include/c++/8/thread:253
#21 0x00007fffea4eee8e in std::_M_run() (this=<optimized out>) at /usr/include/c++/8/thread:196
#22 0x00007fffe5e6f943 in  () at /lib64/libstdc++.so.6
#23 0x00007ffff7c0158e in start_thread () at /lib64/libpthread.so.0
#24 0x00007ffff79a16a3 in clone () at /lib64/libc.so.6

virtuald commented 5 years ago

Hm, this seems vaguely related as well: https://github.com/pybind/pybind11/pull/1595

virtuald commented 5 years ago

OK, so the problem is almost certainly a mismatch between how wpi::SafeThread destroys its state (on threads that do not hold the GIL, as in the traces above) and how pybind11 expects its objects to be destroyed (with the GIL held).
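
A rough standalone model of that mismatch, for illustration only (this is not wpi or pybind11 code). A mutex and a thread-local flag stand in for the GIL, FakePyRef stands in for a Python reference, and LockedRef, GilGuard, MakeCallback and DestroyOnWorker are hypothetical names. The sketch hands a callback to a worker thread that destroys it, once with a bare reference captured and once with a holder that takes the lock in its own destructor. With real pybind11 the analogous tool for the holder would be pybind11::gil_scoped_acquire.

```cpp
#include <atomic>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>

std::mutex g_gil;                      // stand-in for the GIL (assumption)
thread_local bool t_holds_gil = false; // does this thread hold the stand-in lock?
std::atomic<int> g_unsafe_decrefs{0};  // releases that happened without the lock

// Re-entrant RAII guard: only locks if this thread does not already hold it.
class GilGuard {
 public:
  GilGuard() : acquired_(!t_holds_gil) {
    if (acquired_) { g_gil.lock(); t_holds_gil = true; }
  }
  ~GilGuard() {
    if (acquired_) { t_holds_gil = false; g_gil.unlock(); }
  }
  GilGuard(const GilGuard&) = delete;
  GilGuard& operator=(const GilGuard&) = delete;
 private:
  bool acquired_;
};

// Stand-in for a Python reference: destroying it without the lock is the bug.
struct FakePyRef {
  ~FakePyRef() { if (!t_holds_gil) ++g_unsafe_decrefs; }
};

// Holder that takes the lock itself before dropping the reference.
class LockedRef {
 public:
  explicit LockedRef(std::shared_ptr<FakePyRef> ref) : ref_(std::move(ref)) {}
  LockedRef(const LockedRef&) = default;
  LockedRef(LockedRef&&) = default;
  ~LockedRef() {
    GilGuard gil;
    ref_.reset();
  }
 private:
  std::shared_ptr<FakePyRef> ref_;
};

// Build the callback with the lock held, the way a binding layer would.
std::unique_ptr<std::function<void()>> MakeCallback(bool use_locked_holder) {
  GilGuard gil;
  auto ref = std::make_shared<FakePyRef>();
  if (use_locked_holder) {
    return std::make_unique<std::function<void()>>(
        [holder = LockedRef(std::move(ref))] { (void)holder; });
  }
  return std::make_unique<std::function<void()>>(
      [ref = std::move(ref)] { (void)ref; });
}

// The worker thread is the last owner and destroys the callback lock-free.
int DestroyOnWorker(bool use_locked_holder) {
  g_unsafe_decrefs = 0;
  auto callback = MakeCallback(use_locked_holder);
  std::thread worker([cb = std::move(callback)]() mutable {
    cb.reset();  // destroys the std::function on this thread
  });
  worker.join();
  return g_unsafe_decrefs.load();
}
```

Here DestroyOnWorker(false) records one unguarded release, while DestroyOnWorker(true) records none, because the holder re-takes the lock no matter which thread ends up running the destructor.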