wlav / cppyy


Crash at exit from background thread #58

Closed kunitoki closed 2 years ago

kunitoki commented 2 years ago

I'm getting a crash at exit on macOS with this example: https://github.com/kunitoki/popsicle/blob/master/examples/audio_player_python.py. To reproduce, just run the app and close the window.

 *** Break *** illegal instruction
[/usr/local/lib/python3.9/site-packages/cppyy_backend/lib/libcppyy_backend.so] (anonymous namespace)::TExceptionHandlerImp::HandleException(int) (no debug info)
[/usr/local/lib/python3.9/site-packages/cppyy_backend/lib/libCoreLegacy.so] CppyyLegacy::TUnixSystem::DispatchSignals(CppyyLegacy::ESignals) (no debug info)
[/usr/lib/system/libsystem_platform.dylib] _sigtramp (no debug info)
[/usr/lib/system/libsystem_c.dylib] __global_locale (no debug info)
[/usr/lib/system/libdispatch.dylib] os_workgroup_attr_set_interval_type (no debug info)
[/usr/lib/system/libsystem_pthread.dylib] _pthread_tsd_cleanup (no debug info)
[/usr/lib/system/libsystem_pthread.dylib] _pthread_exit (no debug info)
[/usr/lib/system/libsystem_pthread.dylib] pthread_exit (no debug info)
[/usr/local/Cellar/python@3.9/3.9.12/Frameworks/Python.framework/Versions/3.9/Python] PyThread_exit_thread (no debug info)
[/usr/local/Cellar/python@3.9/3.9.12/Frameworks/Python.framework/Versions/3.9/Python] take_gil (no debug info)
[/usr/local/Cellar/python@3.9/3.9.12/Frameworks/Python.framework/Versions/3.9/Python] PyGILState_Ensure (no debug info)
[<unknown binary>] (no debug info)
[/Users/kunitoki/popsicle/build/temp.macosx-11-x86_64-3.9/popsicle_artefacts/Release/libpopsicle.dylib] juce::AudioSourcePlayer::audioDeviceIOCallback(float const**, int, float**, int, int) (no debug info)
[/Users/kunitoki/popsicle/build/temp.macosx-11-x86_64-3.9/popsicle_artefacts/Release/libpopsicle.dylib] juce::AudioDeviceManager::audioDeviceIOCallbackInt(float const**, int, float**, int, int) (no debug info)
[/Users/kunitoki/popsicle/build/temp.macosx-11-x86_64-3.9/popsicle_artefacts/Release/libpopsicle.dylib] juce::CoreAudioClasses::CoreAudioInternal::audioCallback(AudioBufferList const*, AudioBufferList*) (no debug info)
[/Users/kunitoki/popsicle/build/temp.macosx-11-x86_64-3.9/popsicle_artefacts/Release/libpopsicle.dylib] juce::CoreAudioClasses::CoreAudioInternal::audioIOProc(unsigned int, AudioTimeStamp const*, AudioBufferList const*, AudioTimeStamp const*, AudioBufferList*, AudioTimeStamp const*, void*) (no debug info)
[/System/Library/Frameworks/CoreAudio.framework/Versions/A/CoreAudio] invocation function for block in HALC_ProxyIOContext::HALC_ProxyIOContext(unsigned int, unsigned int) (no debug info)
[/System/Library/Frameworks/CoreAudio.framework/Versions/A/CoreAudio] HALB_IOThread::Entry(void*) (no debug info)
[/usr/lib/system/libsystem_pthread.dylib] _pthread_start (no debug info)
[/usr/lib/system/libsystem_pthread.dylib] thread_start (no debug info)

It happens on the background audio thread, where juce::AudioSourcePlayer::audioDeviceIOCallback is invoked. I've tried setting __release_gil__ on every call in the stack, without success. Any idea how to handle this failure?

Platform: osx
popsicle version: 0.0.9
cppyy version: 2.3.1

wlav commented 2 years ago

I suspect that the message about the GIL is a bit of a red herring: __release_gil__ only affects the actual C++ call, but once control goes back into Python (as happens in an overridden method), it has to grab the GIL again, so PyGILState_Ensure will always be called. If __release_gil__ was not set, that call is a no-op; otherwise it acquires the GIL.

"illegal instruction" most probably means that the code that function is trying to call has already been offloaded (this being part of the shutdown). CPython will offload Python modules before it does C++ shared libraries. My thinking is thus that Python is gone, but C++ still calls the callback as part of its clean-up.

In other words, are you certain that that __del__ method, which is erasing the callback, is being called?

The point is that per the Python docs, there is no guarantee that __del__ is called on interpreter exit. This is (IIRC) b/c python has 2 shutdown stages. The first will offload modules using the normal reference counting mechanism (i.e. the interpreter will delete its reference, and modules go away as refcounts go to zero). The second is a forced offload, which doesn't do regular cleanup, b/c at that point it's clear that refcounts are off. You can see where your module falls by providing the -v argument to the Python interpreter: it will list all offloads labeled either [1] or [2].
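To see the shutdown order for a given script, the verbose output can be captured and filtered. The following is only a rough sketch: the helper name `shutdown_trace` is made up for illustration, and the `# cleanup` prefix is an assumption about how CPython formats these lines; the exact labels may differ between Python versions.

```python
import subprocess
import sys


def shutdown_trace(code="import json"):
    """Run `code` in a child interpreter with -v and return its cleanup lines.

    Hypothetical helper: it assumes the verbose shutdown messages are
    written to stderr and start with "# cleanup".
    """
    result = subprocess.run(
        [sys.executable, "-v", "-c", code],
        capture_output=True,
        text=True,
    )
    return [
        line
        for line in result.stderr.splitlines()
        if line.startswith("# cleanup")
    ]


if __name__ == "__main__":
    # Print the module teardown messages in the order they were emitted.
    for line in shutdown_trace():
        print(line)
```

Replacing the `code` argument with an import of the module in question should show at which stage that module is torn down.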

One way out of this is to do the cleanup in an atexit handler instead of relying on __del__.
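A minimal sketch of that approach, using only the standard `atexit` module. The `CallbackRegistry` class is a hypothetical stand-in for whatever C++ object holds the Python callback; it is not part of cppyy or popsicle.

```python
import atexit


class CallbackRegistry:
    """Hypothetical stand-in for a C++ object that stores a Python callback."""

    def __init__(self):
        self.callback = None

    def set_callback(self, fn):
        self.callback = fn

    def clear_callback(self):
        self.callback = None


registry = CallbackRegistry()
registry.set_callback(lambda: None)


def cleanup():
    # Detach the callback so that nothing on the C++ side can call back
    # into Python after this point.
    registry.clear_callback()


# atexit handlers run at normal interpreter termination, before the
# interpreter is torn down.
atexit.register(cleanup)
```

In the real application, `cleanup` would also need to stop the audio device (or otherwise make sure the background thread is done) before returning.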

kunitoki commented 2 years ago

Yes, it seems that the crash happens at exit, where the calls to my components' __del__ are happening, so cleanup happens too late and most of the framework is already shut down by then. I need to find a way to hook a cleanup function so that it runs deterministically and I can invoke it at controlled places.

wlav commented 2 years ago

AFAIK, the only good place is an atexit handler. That is guaranteed to be executed when the interpreter is still alive.

Anything C/C++ (incl. global/static objects that Cling maintains) will go down later.

Isn't it possible to simply register MainContentComponent.__del__ with atexit (and flag internally that it has been called)? Or are there more components that need taking care of?
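That suggestion could look roughly like the sketch below. The class is a simplified, hypothetical stand-in for the real MainContentComponent, and `_cleaned_up` and `cleanup_count` are illustrative names only.

```python
import atexit


class MainContentComponent:
    """Simplified, hypothetical stand-in for the component in the example."""

    def __init__(self):
        self._cleaned_up = False
        self.cleanup_count = 0
        # Registering the bound method keeps this object alive until exit,
        # so __del__ will not run earlier through reference counting.
        atexit.register(self.__del__)

    def __del__(self):
        # Guard flag: the method may run once from atexit and again when
        # the object is finally destroyed.
        if self._cleaned_up:
            return
        self._cleaned_up = True
        self.cleanup_count += 1
        # The real callback removal / device shutdown would go here.
```

With the flag in place, a second call is harmless, so it does not matter whether atexit or the garbage collector gets there first.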

kunitoki commented 2 years ago

Yeah, it was my chain of __del__ calls that was interrupted for some reason, so the cleanup of the MainContentComponent was never called before the framework was shut down. I've experimented a bit with atexit but it was not really working for me. In the end I was able to work around it by moving the code from __del__ into a normal function that I call explicitly, and leaving __del__ empty. For now it's OK and I can make progress!

Thanks for the ideas!
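For anyone landing here later, the workaround has roughly this shape. This is a sketch, not the actual popsicle code: `shutdown`, `run_app`, and the `callbacks` list are hypothetical names standing in for the real cleanup logic.

```python
class MainContentComponent:
    """Simplified, hypothetical stand-in for the real component."""

    def __init__(self):
        # Placeholder for whatever is registered with the audio framework.
        self.callbacks = ["audio"]

    def shutdown(self):
        # Explicit cleanup, called at a controlled point by the application.
        self.callbacks.clear()

    def __del__(self):
        # Intentionally empty: cleanup no longer depends on finalization.
        pass


def run_app():
    component = MainContentComponent()
    try:
        pass  # the event loop would run here
    finally:
        # Runs while the interpreter and the framework are still alive.
        component.shutdown()
    return component
```

The `try`/`finally` makes sure the cleanup also runs when the event loop exits with an exception.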