wpilibsuite / allwpilib

Official Repository of WPILibJ and WPILibC
https://wpilib.org/

Occasional hang due to deadlock when HALSimWS connection is present while `SimDevice` is being created. #6842

Closed brettle closed 1 month ago

brettle commented 1 month ago

Describe the bug

If the HALSimWS server extension is enabled and a client connection is present while the robot is creating a SimDevice, a deadlock can occur.

To Reproduce

Due to the race condition involved, it might not be possible to reproduce this reliably, but see below for stack traces showing the deadlock.

Steps to reproduce the behavior:

  1. Enable the HALSimWS server extension.
  2. In the robot code, after a HALSimWS connection is present, create a SimDevice. (To increase the chances of triggering the issue, it might help to create many SimDevices and to create them after some delay to ensure that a HALSimWS client has connected.)
  3. While a SimDevice is being created, connect a HALSimWS client.

Expected behavior

The code should continue executing normally (and the device should be created and be visible to the HALSimWS client).

Additional context

After a deadlock occurred, I attached GDB to the Java process. Here is the stack trace for the thread attempting to create the SimDevice. It appears to be blocked waiting for the AsyncFunction mutex while running HALSimWS's DeviceCreatedCallback. Note that at this point it is holding a lock on the SimDeviceData mutex that it acquired in SimDeviceData::CreateDevice.

libc.so.6!futex_wait(unsigned int * futex_word, unsigned int expected, int private) (/usr/src/debug/glibc-2.39-17.fc40.x86_64/sysdeps/nptl/futex-internal.h:146)
libc.so.6!__GI___lll_lock_wait(int * futex, int * futex@entry, int private) (/usr/src/debug/glibc-2.39-17.fc40.x86_64/nptl/lowlevellock.c:49)
libc.so.6!lll_mutex_lock_optimized(pthread_mutex_t * mutex) (/usr/src/debug/glibc-2.39-17.fc40.x86_64/nptl/pthread_mutex_lock.c:48)
libc.so.6!___pthread_mutex_lock(pthread_mutex_t * mutex) (/usr/src/debug/glibc-2.39-17.fc40.x86_64/nptl/pthread_mutex_lock.c:93)
libhalsim_ws_server.so!__gthread_mutex_lock(__gthread_mutex_t * __mutex) (/usr/include/x86_64-linux-gnu/c++/11/bits/gthr-default.h:749)
libhalsim_ws_server.so!std::mutex::lock(class std::mutex * const this) (/usr/include/c++/11/bits/std_mutex.h:100)
libhalsim_ws_server.so!std::scoped_lock<std::mutex>::scoped_lock(std::scoped_lock<std::mutex>::mutex_type & __m, class std::scoped_lock<std::mutex> * const this) (/usr/include/c++/11/mutex:655)
libhalsim_ws_server.so!wpi::uv::AsyncFunction<void(std::function<void()>)>::Call<wpilibws::HALSimWSProviderSimDevices::DeviceCreatedCallback(char const*, HAL_SimDeviceHandle)::<lambda()> >(class wpi::uv::AsyncFunction<void(std::function<void()>)> * const this) (/work/wpinet/src/main/native/include/wpinet/uv/AsyncFunction.h:145)
libhalsim_ws_server.so!wpilibws::HALSimWSProviderSimDevices::DeviceCreatedCallback(class wpilibws::HALSimWSProviderSimDevices * const this, const char * name, HAL_SimDeviceHandle handle) (/work/simulation/halsim_ws_core/src/main/native/cpp/WSProvider_SimDevice.cpp:277)
libwpiHal.so!hal::impl::SimPrefixCallbackRegistry<void (*)(char const*, void*, int)>::Invoke<int&>(const char * name, const class hal::impl::SimPrefixCallbackRegistry<void (*)(char const*, void*, int)> * const this) (/work/hal/src/main/native/sim/mockdata/SimDeviceDataInternal.h:123)
libwpiHal.so!hal::impl::SimPrefixCallbackRegistry<void (*)(char const*, void*, int)>::operator()<char const*&, int&>(const class hal::impl::SimPrefixCallbackRegistry<void (*)(char const*, void*, int)> * const this) (/work/hal/src/main/native/sim/mockdata/SimDeviceDataInternal.h:131)
libwpiHal.so!hal::SimDeviceData::CreateDevice(class hal::SimDeviceData * const this, const char * name) (/work/hal/src/main/native/sim/mockdata/SimDeviceData.cpp:114)
libwpiHaljni.so!Java_edu_wpi_first_hal_SimDeviceJNI_createSimDevice(JNIEnv * env, jstring name) (/work/wpiutil/src/main/native/thirdparty/llvm/include/wpi/SmallVector.h:273)
[Unknown/Just-In-Time compiled code] (Unknown Source:0)

The AsyncFunction mutex that the above thread is waiting for is locked by the HALSimWS server thread, which is running the DeviceCreated callback it registered during its initialization. It appears to be blocked waiting for the SimDeviceData mutex held by the earlier thread. Here is its stack trace:

libwpiHal.so!wpi::recursive_spinlock1::try_lock(wpi::recursive_spinlock1 * const this) (/work/wpiutil/src/main/native/include/wpi/spinlock.h:56)
libwpiHal.so!wpi::recursive_spinlock1::lock(wpi::recursive_spinlock1 * const this) (/work/wpiutil/src/main/native/include/wpi/spinlock.h:71)
libwpiHal.so!std::scoped_lock<wpi::recursive_spinlock1>::scoped_lock(std::scoped_lock<wpi::recursive_spinlock1>::mutex_type & __m, std::scoped_lock<wpi::recursive_spinlock1> * const this) (/usr/include/c++/11/mutex:655)
libwpiHal.so!hal::SimDeviceData::RegisterValueCreatedCallback(hal::SimDeviceData * const this, HAL_SimDeviceHandle device, void * param, HALSIM_SimValueCallback callback, bool initialNotify) (/work/hal/src/main/native/sim/mockdata/SimDeviceData.cpp:346)
libhalsim_ws_server.so!wpilibws::HALSimWSProviderSimDevice::OnNetworkConnected(wpilibws::HALSimWSProviderSimDevice * const this, std::shared_ptr<wpilibws::HALSimBaseWebSocketConnection> ws) (/work/simulation/halsim_ws_core/src/main/native/cpp/WSProvider_SimDevice.cpp:37)
libhalsim_ws_server.so!operator()(const struct {...} * const __closure) (/work/simulation/halsim_ws_core/src/main/native/cpp/WSProvider_SimDevice.cpp:277)
libhalsim_ws_server.so!std::__invoke_impl<void, wpilibws::HALSimWSProviderSimDevices::DeviceCreatedCallback(char const*, HAL_SimDeviceHandle)::<lambda()>&>(struct {...} & __f) (/usr/include/c++/11/bits/invoke.h:61)
libhalsim_ws_server.so!std::__invoke_r<void, wpilibws::HALSimWSProviderSimDevices::DeviceCreatedCallback(char const*, HAL_SimDeviceHandle)::<lambda()>&>(struct {...} & __fn) (/usr/include/c++/11/bits/invoke.h:111)
libhalsim_ws_server.so!std::_Function_handler<void(), wpilibws::HALSimWSProviderSimDevices::DeviceCreatedCallback(char const*, HAL_SimDeviceHandle)::<lambda()> >::_M_invoke(const std::_Any_data &)(const std::_Any_data & __functor) (/usr/include/c++/11/bits/std_function.h:290)
libhalsim_ws_server.so!std::function<void ()>::operator()() const(const std::function<void()> * const this) (/usr/include/c++/11/bits/std_function.h:590)
libhalsim_ws_server.so!operator()<wpi::promise<void> >(wpilibws::HALSimWSProviderSimDevices::LoopFn func, wpi::promise<void> out) (/work/simulation/halsim_ws_core/src/main/native/cpp/WSProvider_SimDevice.cpp:293)
libhalsim_ws_server.so!std::__invoke_impl<void, wpilibws::HALSimWSProviderSimDevices::Initialize(wpi::uv::Loop&)::<lambda(auto:31, wpilibws::HALSimWSProviderSimDevices::LoopFn)>&, wpi::promise<void>, std::function<void()> >(struct {...} & __f) (/usr/include/c++/11/bits/invoke.h:61)
libhalsim_ws_server.so!std::__invoke_r<void, wpilibws::HALSimWSProviderSimDevices::Initialize(wpi::uv::Loop&)::<lambda(auto:31, wpilibws::HALSimWSProviderSimDevices::LoopFn)>&, wpi::promise<void>, std::function<void()> >(struct {...} & __fn) (/usr/include/c++/11/bits/invoke.h:111)
libhalsim_ws_server.so!std::_Function_handler<void(wpi::promise<void>, std::function<void()>), wpilibws::HALSimWSProviderSimDevices::Initialize(wpi::uv::Loop&)::<lambda(auto:31, wpilibws::HALSimWSProviderSimDevices::LoopFn)> >::_M_invoke(const std::_Any_data &, wpi::promise<void> &&, std::function<void()> &&)(const std::_Any_data & __functor, wpi::promise<void> && __args#0, std::function<void()> && __args#1) (/usr/include/c++/11/bits/std_function.h:290)
libhalsim_ws_server.so!std::function<void (wpi::promise<void>, std::function<void ()>)>::operator()(wpi::promise<void>, std::function<void ()>) const(std::function<void()> __args#1, wpi::promise<void> __args#0, const std::function<void(wpi::promise<void>, std::function<void()>)> * const this) (/usr/include/c++/11/bits/std_function.h:590)
libhalsim_ws_server.so!std::__invoke_impl<void, std::function<void (wpi::promise<void>, std::function<void ()>)>&, wpi::promise<void>, std::function<void ()> >(std::__invoke_other, std::function<void (wpi::promise<void>, std::function<void ()>)>&, wpi::promise<void>&&, std::function<void ()>&&)(std::function<void(wpi::promise<void>, std::function<void()>)> & __f) (/usr/include/c++/11/bits/invoke.h:61)
libhalsim_ws_server.so!std::__invoke<std::function<void (wpi::promise<void>, std::function<void ()>)>&, wpi::promise<void>, std::function<void ()> >(std::function<void (wpi::promise<void>, std::function<void ()>)>&, wpi::promise<void>&&, std::function<void ()>&&)(std::function<void(wpi::promise<void>, std::function<void()>)> & __fn) (/usr/include/c++/11/bits/invoke.h:96)
libhalsim_ws_server.so!std::__apply_impl<std::function<void (wpi::promise<void>, std::function<void ()>)>&, std::tuple<wpi::promise<void>, std::function<void ()> >, 0ul, 1ul>(std::function<void (wpi::promise<void>, std::function<void ()>)>&, std::tuple<wpi::promise<void>, std::function<void ()> >&&, std::integer_sequence<unsigned long, 0ul, 1ul>)(std::tuple<wpi::promise<void>, std::function<void()> > && __t, std::function<void(wpi::promise<void>, std::function<void()>)> & __f) (/usr/include/c++/11/tuple:1854)
libhalsim_ws_server.so!std::apply<std::function<void (wpi::promise<void>, std::function<void ()>)>&, std::tuple<wpi::promise<void>, std::function<void ()> > >(std::function<void (wpi::promise<void>, std::function<void ()>)>&, std::tuple<wpi::promise<void>, std::function<void ()> >&&)(std::tuple<wpi::promise<void>, std::function<void()> > && __t, std::function<void(wpi::promise<void>, std::function<void()>)> & __f) (/usr/include/c++/11/tuple:1865)
libhalsim_ws_server.so!wpi::uv::AsyncFunction<void (std::function<void ()>)>::Create(std::shared_ptr<wpi::uv::Loop> const&, std::function<void (wpi::promise<void>, std::function<void ()>)>)::{lambda(uv_async_s*)#1}::operator()(uv_async_s*) const(uv_async_t * handle) (/work/wpinet/src/main/native/include/wpinet/uv/AsyncFunction.h:95)
libhalsim_ws_server.so!wpi::uv::AsyncFunction<void (std::function<void ()>)>::Create(std::shared_ptr<wpi::uv::Loop> const&, std::function<void (wpi::promise<void>, std::function<void ()>)>)::{lambda(uv_async_s*)#1}::_FUN(uv_async_s*)() (/work/wpinet/src/main/native/include/wpinet/uv/AsyncFunction.h:84)
libwpinet.so!uv__async_io(uv_loop_t * loop, uv__io_t * w, unsigned int events) (/work/wpinet/src/main/native/thirdparty/libuv/src/unix/async.cpp:177)
libwpinet.so!uv__io_poll(uv_loop_t * loop, uv_loop_t * loop@entry, int timeout) (/work/wpinet/src/main/native/thirdparty/libuv/src/unix/linux.cpp:1527)
libwpinet.so!uv_run(uv_loop_t * loop, uv_run_mode mode, uv_run_mode mode@entry) (/work/wpinet/src/main/native/thirdparty/libuv/src/unix/core.cpp:448)
libwpinet.so!wpi::uv::Loop::Run(wpi::uv::Loop::Mode mode, wpi::uv::Loop * const this) (/work/wpinet/src/main/native/include/wpinet/uv/Loop.h:113)
libwpinet.so!wpi::EventLoopRunner::Thread::Main(wpi::EventLoopRunner::Thread * const this) (/work/wpinet/src/main/native/cpp/EventLoopRunner.cpp:36)
libwpiutil.so!operator()(const struct {...} * const __closure) (/work/wpiutil/src/main/native/cpp/SafeThread.cpp:79)
libwpiutil.so!std::__invoke_impl<void, wpi::detail::SafeThreadOwnerBase::Start(std::shared_ptr<wpi::SafeThreadBase>)::<lambda()> >(struct {...} && __f) (/usr/include/c++/11/bits/invoke.h:61)
libwpiutil.so!std::__invoke<wpi::detail::SafeThreadOwnerBase::Start(std::shared_ptr<wpi::SafeThreadBase>)::<lambda()> >(struct {...} && __fn) (/usr/include/c++/11/bits/invoke.h:96)
libwpiutil.so!std::thread::_Invoker<std::tuple<wpi::detail::SafeThreadOwnerBase::Start(std::shared_ptr<wpi::SafeThreadBase>)::<lambda()> > >::_M_invoke<0>(std::thread::_Invoker<std::tuple<wpi::detail::SafeThreadOwnerBase::Start(std::shared_ptr<wpi::SafeThreadBase>)::<lambda()> > > * const this) (/usr/include/c++/11/bits/std_thread.h:259)
libwpiutil.so!std::thread::_Invoker<std::tuple<wpi::detail::SafeThreadOwnerBase::Start(std::shared_ptr<wpi::SafeThreadBase>)::<lambda()> > >::operator()(std::thread::_Invoker<std::tuple<wpi::detail::SafeThreadOwnerBase::Start(std::shared_ptr<wpi::SafeThreadBase>)::<lambda()> > > * const this) (/usr/include/c++/11/bits/std_thread.h:266)
libwpiutil.so!std::thread::_State_impl<std::thread::_Invoker<std::tuple<wpi::detail::SafeThreadOwnerBase::Start(std::shared_ptr<wpi::SafeThreadBase>)::<lambda()> > > >::_M_run(void)(std::thread::_State_impl<std::thread::_Invoker<std::tuple<wpi::detail::SafeThreadOwnerBase::Start(std::shared_ptr<wpi::SafeThreadBase>)::<lambda()> > > > * const this) (/usr/include/c++/11/bits/std_thread.h:211)
libstdc++.so.6!execute_native_thread_routine (Unknown Source:0)
libc.so.6!start_thread(void * arg) (/usr/src/debug/glibc-2.39-17.fc40.x86_64/nptl/pthread_create.c:447)
libc.so.6!clone3() (/usr/src/debug/glibc-2.39-17.fc40.x86_64/sysdeps/unix/sysv/linux/x86_64/clone3.S:78)

In short, this appears to be an issue of lock ordering inversion.
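To make the pattern concrete, here is a minimal sketch of a two-thread lock ordering inversion. It is not WPILib code: `Locks`, `LockInOrder`, and `CountSecondLockSuccesses` are hypothetical names, and the two `std::timed_mutex` members only stand in for the SimDeviceData and AsyncFunction mutexes. A timed `try_lock_for` is used in place of a blocking `lock()` so the sketch reports the inversion instead of hanging.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>

// Hypothetical stand-ins for the two locks seen in the stack traces.
struct Locks {
  std::timed_mutex simDeviceData;  // stands in for the SimDeviceData mutex
  std::timed_mutex asyncFunction;  // stands in for the AsyncFunction mutex
};

// Locks `first`, waits until the peer thread also holds its first lock, then
// tries to take `second` with a timeout. Returns true if `second` was taken.
inline bool LockInOrder(std::timed_mutex& first, std::timed_mutex& second,
                        std::atomic<int>& holders, std::atomic<int>& done) {
  assert(&first != &second);
  std::lock_guard<std::timed_mutex> holdFirst(first);
  holders.fetch_add(1);
  while (holders.load() < 2) std::this_thread::yield();

  // With a plain lock() this is where each thread would block forever.
  bool got = second.try_lock_for(std::chrono::milliseconds(50));
  if (got) second.unlock();

  // Keep holding `first` until the peer has finished its own attempt.
  done.fetch_add(1);
  while (done.load() < 2) std::this_thread::yield();
  return got;
}

// Runs two threads that take the same two locks in opposite orders and
// returns how many of them managed to acquire their second lock.
inline int CountSecondLockSuccesses() {
  Locks locks;
  std::atomic<int> holders{0};
  std::atomic<int> done{0};
  bool gotA = false;
  bool gotB = false;
  std::thread a([&] {
    gotA = LockInOrder(locks.simDeviceData, locks.asyncFunction, holders, done);
  });
  std::thread b([&] {
    gotB = LockInOrder(locks.asyncFunction, locks.simDeviceData, holders, done);
  });
  a.join();
  b.join();
  return (gotA ? 1 : 0) + (gotB ? 1 : 0);
}
```

In this sketch neither thread can obtain its second lock, which is the same shape as the two traces: each thread holds the lock the other one wants.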

As a side note: I'm probably missing something, but it doesn't seem like AsyncFunction should need its own mutex if all of the calls are made from the same thread/loop. If they aren't, what is the reasoning behind that?

PeterJohnson commented 1 month ago

AsyncFunction needs a mutex because its entire purpose is to signal the loop thread from some other thread, and several member variables are modified on both caller thread(s) and the loop thread.

It does look like it should have a recursive_mutex to handle this case (or not hold the lock during the std::apply call, but that's trickier to get right).

brettle commented 1 month ago

A recursive_mutex would only help if there was only one thread involved in the deadlock, right? Don't the stack traces I posted above indicate that there are 2 threads involved?
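A small sketch of that distinction, using only the standard library (the function names are hypothetical and nothing here is taken from AsyncFunction): a `std::recursive_mutex` lets the owning thread lock it again, but a second thread still cannot acquire it while it is held.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Locks the same recursive_mutex twice on one thread and returns the nesting
// depth reached. A plain std::mutex would not allow the inner lock.
inline int RelockDepthOnSameThread() {
  std::recursive_mutex m;
  int depth = 0;
  {
    std::lock_guard<std::recursive_mutex> outer(m);
    ++depth;
    {
      std::lock_guard<std::recursive_mutex> inner(m);
      ++depth;
    }
  }
  return depth;
}

// Returns true if a second thread can acquire a recursive_mutex that the
// calling thread currently holds.
inline bool OtherThreadCanLock() {
  std::recursive_mutex m;
  std::lock_guard<std::recursive_mutex> held(m);
  bool acquired = false;
  std::thread other([&] {
    acquired = m.try_lock();
    if (acquired) m.unlock();
  });
  other.join();
  return acquired;
}

// Self-check tying the two observations together.
inline void CheckRecursiveMutexBehavior() {
  assert(RelockDepthOnSameThread() == 2);  // same-thread re-entry works
  assert(!OtherThreadCanLock());           // cross-thread contention remains
}
```

So a recursive mutex addresses re-entrancy on a single thread, not an inversion between two threads.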

PeterJohnson commented 1 month ago

Good point, yeah, it must be a lock inversion issue. In which case we need to release the lock when running the callback to prevent it (either in AsyncFunction or in SimDevice). It's a little unclear if the same AsyncFunction is being used in both threads, but that's the only thing that makes sense.

From the stack trace, it looks like thread A holds SimDeviceData.m_mutex and is trying to lock AsyncFunction.m_mutex.

Thread B holds HALSimWSProviderSimDevice.m_ws and AsyncFunction.m_mutex and is trying to lock SimDeviceData.m_mutex.

brettle commented 1 month ago

Here's a different pair of stack traces for a different lock inversion deadlock created in the same way:

Thread calling SimDevice.create() is waiting on ProviderContainer mutex while holding SimDeviceData mutex.

libc.so.6!__futex_abstimed_wait_common(unsigned int * futex_word, unsigned int * futex_word@entry, unsigned int expected, unsigned int expected@entry, clockid_t clockid, clockid_t clockid@entry, const struct timespec * abstime, const struct timespec * abstime@entry, int private, int private@entry, _Bool cancel, _Bool cancel@entry) (/usr/src/debug/glibc-2.39-17.fc40.x86_64/nptl/futex-internal.c:103)
libc.so.6!__GI___futex_abstimed_wait64(unsigned int * futex_word, unsigned int * futex_word@entry, unsigned int expected, unsigned int expected@entry, clockid_t clockid, clockid_t clockid@entry, const struct timespec * abstime, const struct timespec * abstime@entry, int private, int private@entry) (/usr/src/debug/glibc-2.39-17.fc40.x86_64/nptl/futex-internal.c:128)
libc.so.6!__pthread_rwlock_wrlock_full64(pthread_rwlock_t * rwlock, clockid_t clockid, const struct timespec * abstime) (/usr/src/debug/glibc-2.39-17.fc40.x86_64/nptl/pthread_rwlock_common.c:829)
libc.so.6!___pthread_rwlock_wrlock(pthread_rwlock_t * rwlock) (/usr/src/debug/glibc-2.39-17.fc40.x86_64/nptl/pthread_rwlock_wrlock.c:26)
libhalsim_ws_server.so!std::__glibcxx_rwlock_wrlock(pthread_rwlock_t * __rwlock) (/usr/include/c++/11/shared_mutex:80)
libhalsim_ws_server.so!std::__shared_mutex_pthread::lock(class std::__shared_mutex_pthread * const this) (/usr/include/c++/11/shared_mutex:193)
libhalsim_ws_server.so!std::shared_mutex::lock(class std::shared_mutex * const this) (/usr/include/c++/11/shared_mutex:420)
libhalsim_ws_server.so!std::unique_lock<std::shared_mutex>::lock(class std::unique_lock<std::shared_mutex> * const this) (/usr/include/c++/11/bits/unique_lock.h:139)
libhalsim_ws_server.so!std::unique_lock<std::shared_mutex>::unique_lock(std::unique_lock<std::shared_mutex>::mutex_type & __m, class std::unique_lock<std::shared_mutex> * const this) (/usr/include/c++/11/bits/unique_lock.h:69)
libhalsim_ws_server.so!wpilibws::ProviderContainer::Add(class wpilibws::ProviderContainer * const this, std::string_view key, class std::shared_ptr<wpilibws::HALSimWSBaseProvider> provider) (/home/brettle/git/allwpilib/simulation/halsim_ws_core/src/main/native/include/WSProviderContainer.h:31)
libhalsim_ws_server.so!wpilibws::HALSimWSProviderSimDevices::DeviceCreatedCallback(class wpilibws::HALSimWSProviderSimDevices * const this, const char * name, HAL_SimDeviceHandle handle) (/usr/include/c++/11/string_view:137)
libwpiHal.so!hal::impl::SimPrefixCallbackRegistry<void (*)(char const*, void*, int)>::Invoke<int&>(const char * name, const class hal::impl::SimPrefixCallbackRegistry<void (*)(char const*, void*, int)> * const this) (/home/brettle/git/allwpilib/hal/src/main/native/sim/mockdata/SimDeviceDataInternal.h:123)
libwpiHal.so!hal::impl::SimPrefixCallbackRegistry<void (*)(char const*, void*, int)>::operator()<char const*&, int&>(const class hal::impl::SimPrefixCallbackRegistry<void (*)(char const*, void*, int)> * const this) (/home/brettle/git/allwpilib/hal/src/main/native/sim/mockdata/SimDeviceDataInternal.h:131)
libwpiHal.so!hal::SimDeviceData::CreateDevice(class hal::SimDeviceData * const this, const char * name) (/home/brettle/git/allwpilib/hal/src/main/native/sim/mockdata/SimDeviceData.cpp:114)
libwpiHaljni.so!Java_edu_wpi_first_hal_SimDeviceJNI_createSimDevice(JNIEnv * env, jstring name) (/home/brettle/git/allwpilib/wpiutil/src/main/native/thirdparty/llvm/include/wpi/SmallVector.h:273)
[Unknown/Just-In-Time compiled code] (Unknown Source:0)

Thread processing new HALSimWS client connection is waiting on SimDeviceData mutex while holding ProviderContainer mutex:

libwpiHal.so!wpi::recursive_spinlock1::try_lock(wpi::recursive_spinlock1 * const this) (/home/brettle/git/allwpilib/wpiutil/src/main/native/include/wpi/spinlock.h:56)
libwpiHal.so!wpi::recursive_spinlock1::lock(wpi::recursive_spinlock1 * const this) (/home/brettle/git/allwpilib/wpiutil/src/main/native/include/wpi/spinlock.h:71)
libwpiHal.so!std::scoped_lock<wpi::recursive_spinlock1>::scoped_lock(std::scoped_lock<wpi::recursive_spinlock1>::mutex_type & __m, std::scoped_lock<wpi::recursive_spinlock1> * const this) (/usr/include/c++/11/mutex:655)
libwpiHal.so!hal::SimDeviceData::RegisterValueCreatedCallback(hal::SimDeviceData * const this, HAL_SimDeviceHandle device, void * param, HALSIM_SimValueCallback callback, bool initialNotify) (/home/brettle/git/allwpilib/hal/src/main/native/sim/mockdata/SimDeviceData.cpp:346)
libhalsim_ws_server.so!wpilibws::HALSimWSProviderSimDevice::OnNetworkConnected(wpilibws::HALSimWSProviderSimDevice * const this, std::shared_ptr<wpilibws::HALSimBaseWebSocketConnection> ws) (/home/brettle/git/allwpilib/simulation/halsim_ws_core/src/main/native/cpp/WSProvider_SimDevice.cpp:37)
libhalsim_ws_server.so!operator()(const struct {...} * const __closure) (/home/brettle/git/allwpilib/simulation/halsim_ws_server/src/main/native/cpp/HALSimWeb.cpp:143)
libhalsim_ws_server.so!std::__invoke_impl<void, wpilibws::HALSimWeb::RegisterWebsocket(std::shared_ptr<wpilibws::HALSimBaseWebSocketConnection>)::<lambda(std::shared_ptr<wpilibws::HALSimWSBaseProvider>)>&, std::shared_ptr<wpilibws::HALSimWSBaseProvider> >(struct {...} & __f) (/usr/include/c++/11/bits/invoke.h:61)
libhalsim_ws_server.so!std::__invoke_r<void, wpilibws::HALSimWeb::RegisterWebsocket(std::shared_ptr<wpilibws::HALSimBaseWebSocketConnection>)::<lambda(std::shared_ptr<wpilibws::HALSimWSBaseProvider>)>&, std::shared_ptr<wpilibws::HALSimWSBaseProvider> >(struct {...} & __fn) (/usr/include/c++/11/bits/invoke.h:111)
libhalsim_ws_server.so!std::_Function_handler<void(std::shared_ptr<wpilibws::HALSimWSBaseProvider>), wpilibws::HALSimWeb::RegisterWebsocket(std::shared_ptr<wpilibws::HALSimBaseWebSocketConnection>)::<lambda(std::shared_ptr<wpilibws::HALSimWSBaseProvider>)> >::_M_invoke(const std::_Any_data &, std::shared_ptr<wpilibws::HALSimWSBaseProvider> &&)(const std::_Any_data & __functor, std::shared_ptr<wpilibws::HALSimWSBaseProvider> && __args#0) (/usr/include/c++/11/bits/std_function.h:290)
libhalsim_ws_server.so!std::function<void (std::shared_ptr<wpilibws::HALSimWSBaseProvider>)>::operator()(std::shared_ptr<wpilibws::HALSimWSBaseProvider>) const(std::shared_ptr<wpilibws::HALSimWSBaseProvider> __args#0, const std::function<void(std::shared_ptr<wpilibws::HALSimWSBaseProvider>)> * const this) (/usr/include/c++/11/bits/std_function.h:590)
libhalsim_ws_server.so!wpilibws::ProviderContainer::ForEach(std::function<void (std::shared_ptr<wpilibws::HALSimWSBaseProvider>)>)(wpilibws::ProviderContainer * const this, wpilibws::ProviderContainer::IterFn fn) (/home/brettle/git/allwpilib/simulation/halsim_ws_core/src/main/native/include/WSProviderContainer.h:43)
libhalsim_ws_server.so!wpilibws::HALSimWeb::RegisterWebsocket(wpilibws::HALSimWeb * const this, std::shared_ptr<wpilibws::HALSimBaseWebSocketConnection> hws) (/home/brettle/git/allwpilib/simulation/halsim_ws_server/src/main/native/cpp/HALSimWeb.cpp:142)
libhalsim_ws_server.so!operator()<wpi::sig::Connection, std::basic_string_view<char> >(const struct {...} * const __closure) (/usr/include/c++/11/bits/shared_ptr_base.h:731)
libhalsim_ws_server.so!wpi::sig::detail::Slot<wpilibws::HALSimHttpConnection::ProcessWsUpgrade()::<lambda(auto:33, auto:34)>, wpi::sig::trait::typelist<wpi::sig::Connection&, std::basic_string_view<char, std::char_traits<char> > > >::call_slot(std::basic_string_view<char, std::char_traits<char> >)(wpi::sig::detail::Slot<wpilibws::HALSimHttpConnection::ProcessWsUpgrade()::<lambda(auto:33, auto:34)>, wpi::sig::trait::typelist<wpi::sig::Connection&, std::basic_string_view<char, std::char_traits<char> > > > * const this, std::basic_string_view<char, std::char_traits<char> > args#0) (/home/brettle/git/allwpilib/wpiutil/src/main/native/thirdparty/sigslot/include/wpi/Signal.h:349)
libhalsim_ws_server.so!wpi::sig::detail::SlotBase<std::basic_string_view<char, std::char_traits<char> > >::operator()<std::basic_string_view<char, std::char_traits<char> >&>(wpi::sig::detail::SlotBase<std::basic_string_view<char, std::char_traits<char> > > * const this) (/home/brettle/git/allwpilib/wpiutil/src/main/native/thirdparty/sigslot/include/wpi/Signal.h:311)
libhalsim_ws_server.so!wpi::sig::detail::SlotBase<std::basic_string_view<char, std::char_traits<char> > >::operator()<std::basic_string_view<char, std::char_traits<char> >&>(wpi::sig::detail::SlotBase<std::basic_string_view<char, std::char_traits<char> > > * const this) (/home/brettle/git/allwpilib/wpiutil/src/main/native/thirdparty/sigslot/include/wpi/Signal.h:311)
libhalsim_ws_server.so!wpi::sig::SignalBase<wpi::sig::detail::NullMutex, std::basic_string_view<char, std::char_traits<char> > >::CallSlots::operator()<std::basic_string_view<char, std::char_traits<char> > >(wpi::sig::SignalBase<wpi::sig::detail::NullMutex, std::basic_string_view<char, std::char_traits<char> > >::CallSlots * const this) (/home/brettle/git/allwpilib/wpiutil/src/main/native/thirdparty/sigslot/include/wpi/Signal.h:514)
libhalsim_ws_server.so!std::__invoke_impl<void, wpi::sig::SignalBase<wpi::sig::detail::NullMutex, std::basic_string_view<char, std::char_traits<char> > >::CallSlots&, std::basic_string_view<char, std::char_traits<char> > >(wpi::sig::SignalBase<wpi::sig::detail::NullMutex, std::basic_string_view<char, std::char_traits<char> > >::CallSlots & __f) (/usr/include/c++/11/bits/invoke.h:61)
libhalsim_ws_server.so!std::__invoke_r<void, wpi::sig::SignalBase<wpi::sig::detail::NullMutex, std::basic_string_view<char, std::char_traits<char> > >::CallSlots&, std::basic_string_view<char, std::char_traits<char> > >(wpi::sig::SignalBase<wpi::sig::detail::NullMutex, std::basic_string_view<char, std::char_traits<char> > >::CallSlots & __fn) (/usr/include/c++/11/bits/invoke.h:111)
libhalsim_ws_server.so!std::_Function_handler<void (std::basic_string_view<char, std::char_traits<char> >), wpi::sig::SignalBase<wpi::sig::detail::NullMutex, std::basic_string_view<char, std::char_traits<char> > >::CallSlots>::_M_invoke(std::_Any_data const&, std::basic_string_view<char, std::char_traits<char> >&&)(const std::_Any_data & __functor, std::basic_string_view<char, std::char_traits<char> > && __args#0) (/usr/include/c++/11/bits/std_function.h:290)
libwpinet.so!std::function<void (std::basic_string_view<char, std::char_traits<char> >)>::operator()(std::basic_string_view<char, std::char_traits<char> >) const(std::basic_string_view<char, std::char_traits<char> > __args#0, const std::function<void(std::basic_string_view<char, std::char_traits<char> >)> * const this) (/usr/include/c++/11/bits/std_function.h:586)
libwpinet.so!wpi::sig::SignalBase<wpi::sig::detail::NullMutex, std::basic_string_view<char, std::char_traits<char> > >::operator()<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&>(const wpi::sig::SignalBase<wpi::sig::detail::NullMutex, std::basic_string_view<char, std::char_traits<char> > > * const this) (/home/brettle/git/allwpilib/wpiutil/src/main/native/thirdparty/sigslot/include/wpi/Signal.h:573)
libwpinet.so!operator()<std::span<wpi::uv::Buffer> >(std::span<wpi::uv::Buffer, 18446744073709551615> bufs, const struct {...} * const __closure) (/home/brettle/git/allwpilib/wpinet/src/main/native/cpp/WebSocket.cpp:382)
libwpinet.so!std::__invoke_impl<void, wpi::WebSocket::StartServer(std::string_view, std::string_view, std::string_view)::<lambda(auto:24, wpi::uv::Error)>&, std::span<wpi::uv::Buffer, 18446744073709551615>, wpi::uv::Error>(struct {...} & __f) (/usr/include/c++/11/bits/invoke.h:61)
libwpinet.so!std::__invoke_r<void, wpi::WebSocket::StartServer(std::string_view, std::string_view, std::string_view)::<lambda(auto:24, wpi::uv::Error)>&, std::span<wpi::uv::Buffer, 18446744073709551615>, wpi::uv::Error>(struct {...} & __fn) (/usr/include/c++/11/bits/invoke.h:111)
libwpinet.so!std::_Function_handler<void(std::span<wpi::uv::Buffer, 18446744073709551615>, wpi::uv::Error), wpi::WebSocket::StartServer(std::string_view, std::string_view, std::string_view)::<lambda(auto:24, wpi::uv::Error)> >::_M_invoke(const std::_Any_data &, std::span<wpi::uv::Buffer, 18446744073709551615> &&, wpi::uv::Error &&)(const std::_Any_data & __functor, std::span<wpi::uv::Buffer, 18446744073709551615> && __args#0, wpi::uv::Error && __args#1) (/usr/include/c++/11/bits/std_function.h:290)
libwpinet.so!std::function<void (std::span<wpi::uv::Buffer, 18446744073709551615ul>, wpi::uv::Error)>::operator()(std::span<wpi::uv::Buffer, 18446744073709551615ul>, wpi::uv::Error) const(wpi::uv::Error __args#1, std::span<wpi::uv::Buffer, 18446744073709551615> __args#0, const std::function<void(std::span<wpi::uv::Buffer, 18446744073709551615>, wpi::uv::Error)> * const this) (/usr/include/c++/11/bits/std_function.h:590)
libwpinet.so!operator()(wpi::uv::Error err, const struct {...} * const __closure) (/home/brettle/git/allwpilib/wpinet/src/main/native/cpp/uv/Stream.cpp:21)
libwpinet.so!std::__invoke_impl<void, (anonymous namespace)::CallbackWriteReq::CallbackWriteReq(std::span<const wpi::uv::Buffer>, std::function<void(std::span<wpi::uv::Buffer>, wpi::uv::Error)>)::<lambda(wpi::uv::Error)>&, wpi::uv::Error>(struct {...} & __f) (/usr/include/c++/11/bits/invoke.h:61)
libwpinet.so!std::__invoke_r<void, (anonymous namespace)::CallbackWriteReq::CallbackWriteReq(std::span<const wpi::uv::Buffer>, std::function<void(std::span<wpi::uv::Buffer>, wpi::uv::Error)>)::<lambda(wpi::uv::Error)>&, wpi::uv::Error>(struct {...} & __fn) (/usr/include/c++/11/bits/invoke.h:111)
libwpinet.so!std::_Function_handler<void(wpi::uv::Error), (anonymous namespace)::CallbackWriteReq::CallbackWriteReq(std::span<const wpi::uv::Buffer>, std::function<void(std::span<wpi::uv::Buffer>, wpi::uv::Error)>)::<lambda(wpi::uv::Error)> >::_M_invoke(const std::_Any_data &, wpi::uv::Error &&)(const std::_Any_data & __functor, wpi::uv::Error && __args#0) (/usr/include/c++/11/bits/std_function.h:290)
libwpinet.so!std::function<void (wpi::uv::Error)>::operator()(wpi::uv::Error) const(wpi::uv::Error __args#0, const std::function<void(wpi::uv::Error)> * const this) (/usr/include/c++/11/bits/std_function.h:586)
libwpinet.so!wpi::sig::SignalBase<wpi::sig::detail::NullMutex, wpi::uv::Error>::operator()<wpi::uv::Error>(const wpi::sig::SignalBase<wpi::sig::detail::NullMutex, wpi::uv::Error> * const this) (/home/brettle/git/allwpilib/wpiutil/src/main/native/thirdparty/sigslot/include/wpi/Signal.h:573)
libwpinet.so!operator()(uv_write_t * r, const struct {...} * const __closure, int status) (/home/brettle/git/allwpilib/wpinet/src/main/native/cpp/uv/Stream.cpp:130)
libwpinet.so!_FUN() (/home/brettle/git/allwpilib/wpinet/src/main/native/cpp/uv/Stream.cpp:131)
libwpinet.so!uv__write_callbacks(uv_stream_t * stream, uv_stream_t * stream@entry) (/home/brettle/git/allwpilib/wpinet/src/main/native/thirdparty/libuv/src/unix/stream.cpp:926)
libwpinet.so!uv__stream_io(uv_loop_t * loop, uv__io_t * w, unsigned int events) (/home/brettle/git/allwpilib/wpinet/src/main/native/thirdparty/libuv/src/unix/stream.cpp:1228)
libwpinet.so!uv__run_pending(uv_loop_t * loop, uv_loop_t * loop@entry) (/home/brettle/git/allwpilib/wpinet/src/main/native/thirdparty/libuv/src/unix/core.cpp:850)
libwpinet.so!uv_run(uv_loop_t * loop, uv_run_mode mode, uv_run_mode mode@entry) (/home/brettle/git/allwpilib/wpinet/src/main/native/thirdparty/libuv/src/unix/core.cpp:453)
libwpinet.so!wpi::uv::Loop::Run(wpi::uv::Loop::Mode mode, wpi::uv::Loop * const this) (/home/brettle/git/allwpilib/wpinet/src/main/native/include/wpinet/uv/Loop.h:113)
libwpinet.so!wpi::EventLoopRunner::Thread::Main(wpi::EventLoopRunner::Thread * const this) (/home/brettle/git/allwpilib/wpinet/src/main/native/cpp/EventLoopRunner.cpp:36)
libwpiutil.so!operator()(const struct {...} * const __closure) (/home/brettle/git/allwpilib/wpiutil/src/main/native/cpp/SafeThread.cpp:79)
libwpiutil.so!std::__invoke_impl<void, wpi::detail::SafeThreadOwnerBase::Start(std::shared_ptr<wpi::SafeThreadBase>)::<lambda()> >(struct {...} && __f) (/usr/include/c++/11/bits/invoke.h:61)
libwpiutil.so!std::__invoke<wpi::detail::SafeThreadOwnerBase::Start(std::shared_ptr<wpi::SafeThreadBase>)::<lambda()> >(struct {...} && __fn) (/usr/include/c++/11/bits/invoke.h:96)
libwpiutil.so!std::thread::_Invoker<std::tuple<wpi::detail::SafeThreadOwnerBase::Start(std::shared_ptr<wpi::SafeThreadBase>)::<lambda()> > >::_M_invoke<0>(std::thread::_Invoker<std::tuple<wpi::detail::SafeThreadOwnerBase::Start(std::shared_ptr<wpi::SafeThreadBase>)::<lambda()> > > * const this) (/usr/include/c++/11/bits/std_thread.h:259)
libwpiutil.so!std::thread::_Invoker<std::tuple<wpi::detail::SafeThreadOwnerBase::Start(std::shared_ptr<wpi::SafeThreadBase>)::<lambda()> > >::operator()(std::thread::_Invoker<std::tuple<wpi::detail::SafeThreadOwnerBase::Start(std::shared_ptr<wpi::SafeThreadBase>)::<lambda()> > > * const this) (/usr/include/c++/11/bits/std_thread.h:266)
libwpiutil.so!std::thread::_State_impl<std::thread::_Invoker<std::tuple<wpi::detail::SafeThreadOwnerBase::Start(std::shared_ptr<wpi::SafeThreadBase>)::<lambda()> > > >::_M_run(void)(std::thread::_State_impl<std::thread::_Invoker<std::tuple<wpi::detail::SafeThreadOwnerBase::Start(std::shared_ptr<wpi::SafeThreadBase>)::<lambda()> > > > * const this) (/usr/include/c++/11/bits/std_thread.h:211)
libstdc++.so.6!execute_native_thread_routine (Unknown Source:0)
libc.so.6!start_thread(void * arg) (/usr/src/debug/glibc-2.39-17.fc40.x86_64/nptl/pthread_create.c:447)
libc.so.6!clone3() (/usr/src/debug/glibc-2.39-17.fc40.x86_64/sysdeps/unix/sysv/linux/x86_64/clone3.S:78)

PeterJohnson commented 1 month ago

My sense is that the best fix will be to fix SimDeviceData to not hold its mutex during callbacks.
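As a rough illustration of that idea (a sketch only, not the actual SimDeviceData code or the eventual fix), a registry can copy its callback list while holding the mutex and invoke the copies after releasing it. `DeviceRegistry` and its members are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical registry used for illustration; not the real SimDeviceData API.
class DeviceRegistry {
 public:
  using Callback = std::function<void(const std::string&)>;

  void RegisterDeviceCreatedCallback(Callback cb) {
    std::lock_guard<std::mutex> lock(m_mutex);
    m_callbacks.push_back(std::move(cb));
  }

  void CreateDevice(const std::string& name) {
    assert(!name.empty());
    std::vector<Callback> toCall;
    {
      std::lock_guard<std::mutex> lock(m_mutex);
      m_devices.push_back(name);
      toCall = m_callbacks;  // snapshot the callbacks under the lock
    }
    // The mutex is released here, so a callback may take other locks or
    // call back into the registry without creating a lock-order cycle.
    for (auto& cb : toCall) {
      cb(name);
    }
  }

  std::size_t DeviceCount() {
    std::lock_guard<std::mutex> lock(m_mutex);
    return m_devices.size();
  }

 private:
  std::mutex m_mutex;
  std::vector<std::string> m_devices;
  std::vector<Callback> m_callbacks;
};
```

One trade-off of this approach is that a callback can still be invoked shortly after it has been removed from the list, because the snapshot was taken earlier; a real implementation would need to decide how to handle that.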

brettle commented 1 month ago

Further investigation indicates that the first deadlock above appears to be occurring sometime after the HALSimWS connection has been established, so I've updated the title and comment to reflect that. This also makes the issue harder to work around because it means that one can't just delay creating SimDevices until after any HALSimWS connection has been made.

Side note: Is there any chance of this getting fixed in a future 2024.x release or will there not be any more 2024.x releases because the season is over? (Also, presumably this should be marked as a bug.)

PeterJohnson commented 1 month ago

No, we will not be making any more 2024.x releases. Our next release will be for 2025 beta.