mxmlnkn / rapidgzip

Gzip Decompression and Random Access for Modern Multi-Core Machines
Apache License 2.0
345 stars 7 forks source link

Crash: Fatal Python error: PyGILState_Release - when reading from Python file-like object #26

Closed pdjstone closed 6 months ago

pdjstone commented 6 months ago

I'm using rapidgzip with a Python file-like object (rather than a native file). When I raise a KeyboardInterrupt (Ctrl-C) during decompression, I sometimes get the following crash:

KeyboardInterrupt
Fatal Python error: PyGILState_Release: thread state 0x78e7a4020d20 must be current when releasing
Python runtime state: finalizing (tstate=0x000063b818ec9f70)

Thread 0x000078e7d662a000 (most recent call first):
  <no Python frame>

Extension modules: rapidgzip (total: 1)
Aborted (core dumped)

In my case, the file-like object I'm using is sometimes waiting on fairly slow IO which might make this more likely to occur.

The following code reproduces the issue for me:

import sys
import rapidgzip

class myfile:
    def __init__(self, fd):
        self.fd = fd
        self.r = open('/dev/urandom', 'rb')

    def seek(self, offset, whence=0) -> int:
        return self.fd.seek(offset, whence)

    def read(self, amt=-1) -> bytes:
        self.r.read(50000) # not strictly required to reproduce, but seems to make bug more likely
        print('.', end='', flush=True)
        return self.fd.read(amt)

    def tell(self) -> int:
        return self.fd.tell()

    def seekable(self):
        return self.fd.seekable()

    def readable(self):
        return self.fd.readable()

    def writable(self):
        return self.fd.writable()

    def write(self, data):
        raise NotImplemented()

    def close(self):
        self.fd.close()

if __name__ == '__main__':
    fd = open(sys.argv[1], 'rb')
    fd = myfile(fd)
    gz = rapidgzip.open(fd)
    while gz.read(1024*1024):
        pass

If you run the above code with a large .gz file, and press Ctrl-C to kill the process you sometimes get the crash. You may have to run it a few times to observe the behaviour.

This may be related to issue #24

mxmlnkn commented 6 months ago

Thank you for reporting. Too bad you encountered the bug, it worked just fine before I tried to fix the debug mode error. The good news is, I can reproduce it with every try with your submitted script. Starting it with gdb --ex r python3 reproducer.py, then pressing Ctrl+C, then submitting signal 2, yields:

(gdb)  info threads
  Id   Target Id                                               Frame 
  1    Thread 0x7ffff7c3f1c0 (LWP 1363882) "python3"           0x00007ffff7c8275b in kill () at ../sysdeps/unix/syscall-template.S:120
  2    Thread 0x7ffff65fe640 (LWP 1363885) "python3"           futex_wait (private=0, expected=2, futex_word=0x555555bc9bd0) at ../sysdeps/nptl/futex-internal.h:146
  3    Thread 0x7fffeddfd640 (LWP 1363886) "python3"           __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x555555c21bd0) at ./nptl/futex-internal.c:57
  4    Thread 0x7ffff5dfd640 (LWP 1363887) "python3"           __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x555555c21bd0) at ./nptl/futex-internal.c:57
  5    Thread 0x7ffff55fc640 (LWP 1363888) "python3"           __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x555555c21bd0) at ./nptl/futex-internal.c:57
  6    Thread 0x7ffff4dfb640 (LWP 1363889) "python3"           __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x555555c21bd0) at ./nptl/futex-internal.c:57
  7    Thread 0x7fffeffff640 (LWP 1363890) "python3"           __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x555555c21bd4) at ./nptl/futex-internal.c:57
  8    Thread 0x7fffef7fe640 (LWP 1363891) "python3"           __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x555555c21bd4) at ./nptl/futex-internal.c:57
  9    Thread 0x7fffeeffd640 (LWP 1363892) "python3"           __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x555555c21bd4) at ./nptl/futex-internal.c:57
  10   Thread 0x7fffee7fc640 (LWP 1363893) "python3"           __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x555555c21bd4) at ./nptl/futex-internal.c:57
  11   Thread 0x7fffed5fc640 (LWP 1363894) "python3"           __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x555555c21bd4) at ./nptl/futex-internal.c:57
  12   Thread 0x7fffecdfb640 (LWP 1363895) "python3"           __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x555555c21bd4) at ./nptl/futex-internal.c:57
  13   Thread 0x7fffbffff640 (LWP 1363896) "python3"           futex_wait (private=0, expected=2, futex_word=0x555555bc9bd0) at ../sysdeps/nptl/futex-internal.h:146
  14   Thread 0x7fffc7fff640 (LWP 1363897) "python3"           __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x555555c21bd4) at ./nptl/futex-internal.c:57
  15   Thread 0x7fffc77fe640 (LWP 1363898) "python3"           futex_wait (private=0, expected=2, futex_word=0x555555bc9bd0) at ../sysdeps/nptl/futex-internal.h:146
  16   Thread 0x7fffc6ffd640 (LWP 1363899) "python3"           __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x555555c21bd4) at ./nptl/futex-internal.c:57
  17   Thread 0x7fffc67fc640 (LWP 1363900) "python3"           futex_wait (private=0, expected=2, futex_word=0x555555bc9bd0) at ../sysdeps/nptl/futex-internal.h:146
  18   Thread 0x7fffc5ffb640 (LWP 1363901) "python3"           __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x555555c21bd4) at ./nptl/futex-internal.c:57
  19   Thread 0x7fffc57fa640 (LWP 1363902) "python3"           __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x555555c21bd4) at ./nptl/futex-internal.c:57
* 20   Thread 0x7fffc4ff9640 (LWP 1363903) "python3" (Exiting) __pthread_kill_implementation (no_tid=0, signo=6, threadid=140736498472512) at ./nptl/pthread_kill.c:44
  21   Thread 0x7fffbf7fe640 (LWP 1363904) "python3"           futex_wait (private=0, expected=2, futex_word=0x555555bc9bd0) at ../sysdeps/nptl/futex-internal.h:146
  22   Thread 0x7fffbeffd640 (LWP 1363905) "python3"           __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x555555c21bd4) at ./nptl/futex-internal.c:57
  23   Thread 0x7fffbe7fc640 (LWP 1363906) "python3"           __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x555555c21bd4) at ./nptl/futex-internal.c:57
  24   Thread 0x7fffbdffb640 (LWP 1363907) "python3"           __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x555555c21bd4) at ./nptl/futex-internal.c:57
  25   Thread 0x7fffbd7fa640 (LWP 1363908) "python3"           __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x555555c21bd4) at ./nptl/futex-internal.c:57

(gdb) thread apply all bt
Thread 20 (Thread 0x7fffc4ff9640 (LWP 1363903)):
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140736498472512) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140736498472512) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140736498472512, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007ffff7c82476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007ffff7c687f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00005555555c5a4f in ?? ()
#6  0x000055555566b7b3 in _Py_FatalErrorFormat ()
#7  0x00005555555e8819 in ?? ()

#8  0x00007ffff6c0477b in ScopedGIL::GILMutex::lock (doLock=false, this=<optimized out>) at core/filereader/Python.hpp:64
#9  ScopedGIL::~ScopedGIL (this=0x7fffc4f8d3cf, __in_chrg=<optimized out>) at core/filereader/Python.hpp:117
#10 ScopedGIL::~ScopedGIL (this=this@entry=0x7fffc4f8d3cf, __in_chrg=<optimized out>) at core/filereader/Python.hpp:103
#11 0x00007ffff6c058cf in ScopedGILLock::~ScopedGILLock (this=0x7fffc4f8d3cf, __in_chrg=<optimized out>) at core/filereader/Python.hpp:133

#12 PythonFileReader::read (this=<optimized out>, buffer=<optimized out>, nMaxBytesToRead=<optimized out>) at core/filereader/Python.hpp:404
#13 0x00007ffff6b85ac2 in SharedFileReader::read (this=0x7fff98041350, 
    buffer=0x7fffc4fa3050 "Qǃ^\026\214\262\n\240/\253\240\362s\351ve\254\240\211\362\261\352\333\f\342H\242\313\373\r\333`d\366\342\301Y\f)\370\061\242\205\061\250\v\341gK\022\035\fP\356\067[\307\373\367\255ӗ\372c\315}\006\032\212\376\211\317\327\002\273\016\363\365EDl\220gߖ\267\266?\204\025\225E\365\200\271N\250\253\062\267\337f\267hy\212\243\b\310\355A\306%\306\004\371\064\021\377U\235\365\250\343\202j\362D\361\322L\271\341\224\333\351\071\217.\345v+\346\320\036\322\rS\364\325!,\215\213\200\227I\265\265ө\221\367O\002\262X\352؀\227\343\b\342<C\365\242\227\226\200\001tK\367\257\304-\272|\312\004\375\221\024O\344\367\004"..., nMaxBytesToRead=131072) at core/filereader/Shared.hpp:315
#14 0x00007ffff6bbc22e in BitReader<false, unsigned long>::read (this=0x7fffc4f8da90, outputBuffer=<optimized out>, nBytesToRead=131072)
    at core/BitReader.hpp:326
#15 0x00007ffff6bbce53 in rapidgzip::IsalInflateWrapper::refillBuffer (this=0x7fffc4f8da90) at rapidgzip/isal.hpp:198
#16 rapidgzip::IsalInflateWrapper::readStream (this=this@entry=0x7fffc4f8da90, output=output@entry=0x7fff7eb70080 "", outputSize=outputSize@entry=131072)
    at rapidgzip/isal.hpp:228
#17 0x00007ffff6c39191 in rapidgzip::GzipChunkFetcher<FetchingStrategy::FetchMultiStream, rapidgzip::ChunkData, false>::finishDecodeBlockWithIsal (
    bitReader=bitReader@entry=0x7fffc4ff85e0, untilOffset=untilOffset@entry=11844714496, initialWindow=..., 
    maxDecompressedChunkSize=maxDecompressedChunkSize@entry=83886080, result=...) at rapidgzip/GzipChunkFetcher.hpp:1001
#18 0x00007ffff6c3b9f0 in rapidgzip::GzipChunkFetcher<FetchingStrategy::FetchMultiStream, rapidgzip::ChunkData, false>::decodeBlockWithRapidgzip (
    bitReader=bitReader@entry=0x7fffc4ff85e0, untilOffset=untilOffset@entry=11844714496, 
    initialWindow=<error reading variable: Cannot access memory at address 0x18cf8>, crc32Enabled=<optimized out>, maxDecompressedChunkSize=83886080)
    at core/VectorView.hpp:53
#19 0x00007ffff6c3c949 in rapidgzip::GzipChunkFetcher<FetchingStrategy::FetchMultiStream, rapidgzip::ChunkData, false>::decodeBlock(BitReader<false, unsigned long> const&, unsigned long, unsigned long, std::optional<VectorView<unsigned char> >, std::optional<unsigned long>, std::atomic<bool> const&, bool, unsigned long, bool)::{lambda(std::pair<unsigned long, unsigned long> const&)#1}::operator()(std::pair<unsigned long, unsigned long> const&) const (
    offset=<synthetic pointer>{...}, __closure=<synthetic pointer>) at rapidgzip/GzipChunkFetcher.hpp:690
#20 rapidgzip::GzipChunkFetcher<FetchingStrategy::FetchMultiStream, rapidgzip::ChunkData, false>::decodeBlock (originalBitReader=..., 
    blockOffset=blockOffset@entry=11811160064, untilOffset=untilOffset@entry=11844714496, initialWindow=..., decodedSize=..., cancelThreads=..., 
    crc32Enabled=<optimized out>, maxDecompressedChunkSize=<optimized out>, untilOffsetIsExact=<optimized out>) at rapidgzip/GzipChunkFetcher.hpp:807
#21 0x00007ffff6c4584c in rapidgzip::GzipChunkFetcher<FetchingStrategy::FetchMultiStream, rapidgzip::ChunkData, false>::decodeBlock (
    nextBlockOffset=<optimized out>, blockOffset=11811160064, this=0x555555c215a0) at rapidgzip/GzipChunkFetcher.hpp:619
#22 BlockFetcher<rapidgzip::GzipBlockFinder, rapidgzip::ChunkData, FetchingStrategy::FetchMultiStream, false>::decodeAndMeasureBlock (
    nextBlockOffset=<optimized out>, blockOffset=11811160064, this=0x555555c215a0) at core/BlockFetcher.hpp:618
#23 BlockFetcher<rapidgzip::GzipBlockFinder, rapidgzip::ChunkData, FetchingStrategy::FetchMultiStream, false>::prefetchNewBlocks(std::function<unsigned long (unsigned long)> const&, std::function<bool ()> const&)::{lambda()#2}::operator()() const (__closure=<optimized out>) at core/BlockFetcher.hpp:537

#24 std::__invoke_impl<rapidgzip::ChunkData, BlockFetcher<rapidgzip::GzipBlockFinder, rapidgzip::ChunkData, FetchingStrategy::FetchMultiStream, false>::prefetc--Type <RET> for more, q to quit, c to continue without paging--
hNewBlocks(std::function<unsigned long (unsigned long)> const&, std::function<bool ()> const&)::{lambda()#2}&>(std::__invoke_other, BlockFetcher<rapidgzip::GzipBlockFinder, rapidgzip::ChunkData, FetchingStrategy::FetchMultiStream, false>::prefetchNewBlocks(std::function<unsigned long (unsigned long)> const&, std::function<bool ()> const&)::{lambda()#2}&) (__f=...) at /usr/include/c++/11/bits/invoke.h:61
#25 std::__invoke_r<rapidgzip::ChunkData, BlockFetcher<rapidgzip::GzipBlockFinder, rapidgzip::ChunkData, FetchingStrategy::FetchMultiStream, false>::prefetchNewBlocks(std::function<unsigned long (unsigned long)> const&, std::function<bool ()> const&)::{lambda()#2}&>(BlockFetcher<rapidgzip::GzipBlockFinder, rapidgzip::ChunkData, FetchingStrategy::FetchMultiStream, false>::prefetchNewBlocks(std::function<unsigned long (unsigned long)> const&, std::function<bool ()> const&)::{lambda()#2}&) (__fn=...) at /usr/include/c++/11/bits/invoke.h:116
#26 std::__future_base::_Task_state<BlockFetcher<rapidgzip::GzipBlockFinder, rapidgzip::ChunkData, FetchingStrategy::FetchMultiStream, false>::prefetchNewBlocks(std::function<unsigned long (unsigned long)> const&, std::function<bool ()> const&)::{lambda()#2}, std::allocator<int>, rapidgzip::ChunkData ()>::_M_run()::{lambda()#1}::operator()() const (__closure=<optimized out>) at /usr/include/c++/11/future:1469
#27 std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<rapidgzip::ChunkData>, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_state<BlockFetcher<rapidgzip::GzipBlockFinder, rapidgzip::ChunkData, FetchingStrategy::FetchMultiStream, false>::prefetchNewBlocks(std::function<unsigned long (unsigned long)> const&, std::function<bool ()> const&)::{lambda()#2}, std::allocator<int>, rapidgzip::ChunkData ()>::_M_run()::{lambda()#1}, rapidgzip::ChunkData>::operator()() const (this=0x7fffc4ff8d90) at /usr/include/c++/11/future:1386
#28 0x00007ffff6c45b95 in std::__invoke_impl<std::unique_ptr<std::__future_base::_Result<rapidgzip::ChunkData>, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<rapidgzip::ChunkData>, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_state<BlockFetcher<rapidgzip::GzipBlockFinder, rapidgzip::ChunkData, FetchingStrategy::FetchMultiStream, false>::prefetchNewBlocks(std::function<unsigned long (unsigned long)> const&, std::function<bool ()> const&)::{lambda()#2}, std::allocator<int>, rapidgzip::ChunkData ()>::_M_run()::{lambda()#1}, rapidgzip::ChunkData>&>(std::__invoke_other, std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<rapidgzip::ChunkData>, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_state<BlockFetcher<rapidgzip::GzipBlockFinder, rapidgzip::ChunkData, FetchingStrategy::FetchMultiStream, false>::prefetchNewBlocks(std::function<unsigned long (unsigned long)> const&, std::function<bool ()> const&)::{lambda()#2}, std::allocator<int>, rapidgzip::ChunkData ()>::_M_run()::{lambda()#1}, rapidgzip::ChunkData>&) (__f=...) at /usr/include/c++/11/bits/invoke.h:61
#29 std::__invoke_r<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<rapidgzip::ChunkData>, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_state<BlockFetcher<rapidgzip::GzipBlockFinder, rapidgzip::ChunkData, FetchingStrategy::FetchMultiStream, false>::prefetchNewBlocks(std::function<unsigned long (unsigned long)> const&, std::function<bool ()> const&)::{lambda()#2}, std::allocator<int>, rapidgzip::ChunkData ()>::_M_run()::{lambda()#1}, rapidgzip::ChunkData>&>(std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<rapidgzip::ChunkData>, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_state<BlockFetcher<rapidgzip::GzipBlockFinder, rapidgzip::ChunkData, FetchingStrategy::FetchMultiStream, false>::prefetchNewBlocks(std::function<unsigned long (unsigned long)> const&, std::function<bool ()> const&)::{lambda()#2}, std::allocator<int>, rapidgzip::ChunkData ()>::_M_run()::{lambda()#1}, rapidgzip::ChunkData>&) (
    __fn=...) at /usr/include/c++/11/bits/invoke.h:116
#30 std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<rapidgzip::ChunkData>, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_state<BlockFetcher<rapidgzip::GzipBlockFinder, rapidgzip::ChunkData, FetchingStrategy::FetchMultiStream, false>::prefetchNewBlocks(std::function<unsigned long (unsigned long)> const&, std::function<bool ()> const&)::{lambda()#2}, std::allocator<int>, rapidgzip::ChunkData ()>::_M_run()::{lambda()#1}, rapidgzip::ChunkData> >::_M_invoke(std::_Any_data const&) (__functor=...) at /usr/include/c++/11/bits/std_function.h:291
#31 0x00007ffff6b658dd in std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>::operator()() const
    (this=<optimized out>) at /usr/include/c++/11/bits/std_function.h:590
#32 std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*) (this=0x555555b84fd0, __f=<optimized out>, __did_set=0x7fffc4ff8d3f) at /usr/include/c++/11/future:571
#33 0x00007ffff7cd9f68 in __pthread_once_slow (once_control=0x555555b84fe8, init_routine=0x7ffff68ead50 <__once_proxy>) at ./nptl/pthread_once.c:116
#34 0x00007ffff6b7349e in __gthread_once (__func=<optimized out>, __once=0x555555b84fe8) at /usr/include/x86_64-linux-gnu/c++/11/bits/gthr-default.h:700
#35 std::call_once<void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>(std::once_flag&, void (std::__future_base::_State_baseV2::*&&)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*&&, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*&&, bool*&&) (
    __f=@0x7fffc4ff8d60: (void (std::__future_base::_State_baseV2::*)(std::__future_base::_State_baseV2 * const, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter>()> *, bool *)) 0x7ffff6b658b0 <std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*)>, __once=...) at /usr/include/c++/11/mutex:783
#36 std::__future_base::_State_baseV2::_M_set_result(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>, bool) (__ignore_failure=false, __res=..., this=0x555555b84fd0) at /usr/include/c++/11/future:411
#37 std::__future_base::_Task_state<BlockFetcher<rapidgzip::GzipBlockFinder, rapidgzip::ChunkData, FetchingStrategy::FetchMultiStream, false>::prefetchNewBlocks(std::function<unsigned long (unsigned long)> const&, std::function<bool ()> const&)::{lambda()#2}, std::allocator<int>, rapidgzip::ChunkData ()>::_M_run() (
    this=0x555555b84fd0) at /usr/include/c++/11/future:1471

#38 0x00007ffff6b843c5 in ThreadPool::PackagedTaskWrapper::operator() (this=<synthetic pointer>) at /usr/include/c++/11/bits/unique_ptr.h:407
#39 ThreadPool::workerMain (threadIndex=<optimized out>, this=0x555555c21b00) at core/ThreadPool.hpp:215
#40 ThreadPool::spawnThread()::{lambda()#1}::operator()() const (__closure=<optimized out>) at core/ThreadPool.hpp:223
#41 JoiningThread::JoiningThread<ThreadPool::spawnThread()::{lambda()#1}>(ThreadPool::spawnThread()::{lambda()#1}&&)::{lambda()#1}::operator()() const (
    __closure=<optimized out>) at core/JoiningThread.hpp:41
#42 std::__invoke_impl<void, JoiningThread::JoiningThread<ThreadPool::spawnThread()::{lambda()#1}>(ThreadPool::spawnThread()::{lambda()#1}&&)::{lambda()#1}>(std::__invoke_other, JoiningThread::JoiningThread<ThreadPool::spawnThread()::{lambda()#1}>(ThreadPool::spawnThread()::{lambda()#1}&&)::{lambda()#1}&&) (__f=...)
    at /usr/include/c++/11/bits/invoke.h:61
#43 std::__invoke<JoiningThread::JoiningThread<ThreadPool::spawnThread()::{lambda()#1}>(ThreadPool::spawnThread()::{lambda()#1}&&)::{lambda()#1}>(ThreadPool::spawnThread()::{lambda()#1}&&) (__fn=...) at /usr/include/c++/11/bits/invoke.h:96
#44 std::thread::_Invoker<std::tuple<JoiningThread::JoiningThread<ThreadPool::spawnThread()::{lambda()#1}>(ThreadPool::spawnThread()::{lambda()#1}&&)::{lambda()#1}> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) (this=<optimized out>) at /usr/include/c++/11/bits/std_thread.h:259
#45 std::thread::_Invoker<std::tuple<JoiningThread::JoiningThread<ThreadPool::spawnThread()::{lambda()#1}>(ThreadPool::spawnThread()::{lambda()#1}&&)::{lambda()#1}> >::operator()() (this=<optimized out>) at /usr/include/c++/11/bits/std_thread.h:266
#46 std::thread::_State_impl<std::thread::_Invoker<std::tuple<JoiningThread::JoiningThread<ThreadPool::spawnThread()::{lambda()#1}>(ThreadPool::spawnThread()::{lambda()#1}&&)::{lambda()#1}> > >::_M_run() (this=<optimized out>) at /usr/include/c++/11/bits/std_thread.h:211
#47 0x00007ffff68ec253 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#48 0x00007ffff7cd4b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#49 0x00007ffff7d66a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

I'm not sure how to fix it though. I guess if the error is detected on the Python-side, then I can also check the release state somehow and simply not release the lock in that case a second time. This is rather ad-hoc but probably should work just fine. It would still be nice to know who releases the lock in the first place. Probably some signal handler magic code.

This is a nightmare. After adding PyGILState_Check, I now get another shutdown problem 1 out of 5 times maybe:

(gdb) signal 2
Continuing with signal SIGINT.
.............................................................Traceback (most recent call last):
  File "indexed_bzip2/build/trigger-pygilstate-release-error.py", line 39, in <module>
.    while gz.read(1024*1024):
.KeyboardInterrupt
[Thread 0x7fffadffb640 (LWP 1402052) exited]
[Thread 0x7fffaeffd640 (LWP 1402050) exited]
[Thread 0x7fffbdffb640 (LWP 1402045) exited]
[Thread 0x7fffae7fc640 (LWP 1402051) exited]
[Thread 0x7fffbf7fe640 (LWP 1402042) exited]
[Thread 0x7fffeffff640 (LWP 1402035) exited]
[Thread 0x7ffff5e23640 (LWP 1402032) exited]
[Thread 0x7ffff6624640 (LWP 1402030) exited]
[Thread 0x7fffbffff640 (LWP 1402041) exited]
[Thread 0x7fffed622640 (LWP 1402039) exited]
[Thread 0x7ffff4e21640 (LWP 1402034) exited]
[Thread 0x7fffaffff640 (LWP 1402048) exited]
[Thread 0x7fffee7fc640 (LWP 1402038) exited]
[Thread 0x7ffff5622640 (LWP 1402033) exited]
[Thread 0x7fffad7fa640 (LWP 1402053) exited]
[Thread 0x7fffbd7fa640 (LWP 1402046) exited]
[Thread 0x7fffaf7fe640 (LWP 1402049) exited]
[Thread 0x7fffbcff9640 (LWP 1402047) exited]
[Thread 0x7fffbe7fc640 (LWP 1402044) exited]
[Thread 0x7fffece21640 (LWP 1402040) exited]
[Thread 0x7fffede23640 (LWP 1402031) exited]
[Thread 0x7fffbeffd640 (LWP 1402043) exited]
[Thread 0x7fffef7fe640 (LWP 1402036) exited]
Fatal Python error: _enter_buffered_busy: could not acquire lock for <_io.BufferedWriter name='<stdout>'> at interpreter shutdown, possibly due to daemon threads
Python runtime state: finalizing (tstate=0x0000555555b5bfb0)

Thread 0x00007fffbeffd640 (most recent call first):
  <no Python frame>

Thread 0x00007fffef7fe640 (most recent call first):
  <no Python frame>

Thread 0x00007fffede23640 (most recent call first):
  <no Python frame>

Thread 0x00007fffbcff9640 (most recent call first):
  <no Python frame>

Thread 0x00007fffbe7fc640 (most recent call first):
  <no Python frame>

Thread 0x00007fffece21640 (most recent call first):
  <no Python frame>

Thread 0x00007fffbd7fa640 (most recent call first):
  <no Python frame>

Thread 0x00007fffaf7fe640 (most recent call first):
  <no Python frame>

Thread 0x00007fffad7fa640 (most recent call first):
  <no Python frame>

Thread 0x00007fffee7fc640 (most recent call first):
  <no Python frame>

Thread 0x00007fffaffff640 (most recent call first):
  <no Python frame>

Thread 0x00007ffff4e21640 (most recent call first):
  <no Python frame>

Thread 0x00007ffff5622640 (most recent call first):
  <no Python frame>

Thread 0x00007fffed622640 (most recent call first):
  <no Python frame>

Thread 0x00007fffbffff640 (most recent call first):
  <no Python frame>

Thread 0x00007ffff6624640 (most recent call first):
  <no Python frame>

Thread 0x00007fffeffff640 (most recent call first):
  <no Python frame>

Thread 0x00007fffae7fc640 (most recent call first):
  <no Python frame>

Thread 0x00007ffff5e23640 (most recent call first):
  <no Python frame>

Thread 0x00007fffbf7fe640 (most recent call first):
  <no Python frame>

Thread 0x00007fffaeffd640 (most recent call first):
  <no Python frame>

Thread 0x00007fffbdffb640 (most recent call first):
  <no Python frame>

Current thread 0x00007ffff7c3f1c0 (most recent call first):
  <no Python frame>

Extension modules: rapidgzip (total: 1)

Thread 1 "python3" received signal SIGABRT, Aborted.
__pthread_kill_implementation (no_tid=0, signo=6, threadid=140737350201792) at ./nptl/pthread_kill.c:44
44  ./nptl/pthread_kill.c: No such file or directory.

(gdb) info thread
  Id   Target Id                                     Frame 
* 1    Thread 0x7ffff7c3f1c0 (LWP 1402020) "python3" __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737350201792) at ./nptl/pthread_kill.c:44
  9    Thread 0x7fffeeffd640 (LWP 1402037) "python3" __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, 
    futex_word=0x555555c21bd4) at ./nptl/futex-internal.c:57

(gdb) thread apply all bt

Thread 9 (Thread 0x7fffeeffd640 (LWP 1402037) "python3"):
#0  __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x555555c21bd4) at ./nptl/futex-internal.c:57
#1  __futex_abstimed_wait_common (cancel=true, private=0, abstime=0x0, clockid=0, expected=0, futex_word=0x555555c21bd4) at ./nptl/futex-internal.c:87
#2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x555555c21bd4, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
#3  0x00007ffff7cd3ac1 in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x555555c21b80, cond=0x555555c21ba8) at ./nptl/pthread_cond_wait.c:503
#4  ___pthread_cond_wait (cond=0x555555c21ba8, mutex=0x555555c21b80) at ./nptl/pthread_cond_wait.c:627
#5  0x00007ffff6ba0367 in std::condition_variable::wait<ThreadPool::workerMain(unsigned long)::{lambda()#1}>(std::unique_lock<std::mutex>&, ThreadPool::workerMain(unsigned long)::{lambda()#1}) (__p=..., __lock=..., this=0x555555c21ba8) at /usr/include/c++/11/condition_variable:103
#6  ThreadPool::workerMain (threadIndex=<optimized out>, this=0x555555c21b00) at core/ThreadPool.hpp:203
#7  ThreadPool::spawnThread()::{lambda()#1}::operator()() const (__closure=<optimized out>) at core/ThreadPool.hpp:224
#8  JoiningThread::JoiningThread<ThreadPool::spawnThread()::{lambda()#1}>(ThreadPool::spawnThread()::{lambda()#1}&&)::{lambda()#1}::operator()() const (__closure=<optimized out>) at core/JoiningThread.hpp:41
#9  std::__invoke_impl<void, JoiningThread::JoiningThread<ThreadPool::spawnThread()::{lambda()#1}>(ThreadPool::spawnThread()::{lambda()#1}&&)::{lambda()#1}>(std::__invoke_other, JoiningThread::JoiningThread<ThreadPool::spawnThread()::{lambda()#1}>(ThreadPool::spawnThread()::{lambda()#1}&&)::{lambda()#1}&&) (__f=...) at /usr/include/c++/11/bits/invoke.h:61
#10 std::__invoke<JoiningThread::JoiningThread<ThreadPool::spawnThread()::{lambda()#1}>(ThreadPool::spawnThread()::{lambda()#1}&&)::{lambda()#1}>(ThreadPool::spawnThread()::{lambda()#1}&&) (__fn=...) at /usr/include/c++/11/bits/invoke.h:96
#11 std::thread::_Invoker<std::tuple<JoiningThread::JoiningThread<ThreadPool::spawnThread()::{lambda()#1}>(ThreadPool::spawnThread()::{lambda()#1}&&)::{lambda()#1}> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) (this=<optimized out>) at /usr/include/c++/11/bits/std_thread.h:259
#12 std::thread::_Invoker<std::tuple<JoiningThread::JoiningThread<ThreadPool::spawnThread()::{lambda()#1}>(ThreadPool::spawnThread()::{lambda()#1}&&)::{lambda()#1}> >::operator()() (this=<optimized out>) at /usr/include/c++/11/bits/std_thread.h:266
#13 std::thread::_State_impl<std::thread::_Invoker<std::tuple<JoiningThread::JoiningThread<ThreadPool::spawnThread()::{lambda()#1}>(ThreadPool::spawnThread()::{lambda()#1}&&)::{lambda()#1}> > >::_M_run() (this=<optimized out>) at /usr/include/c++/11/bits/std_thread.h:211
#14 0x00007ffff6912253 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#15 0x00007ffff7cd4b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#16 0x00007ffff7d66a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Thread 1 (Thread 0x7ffff7c3f1c0 (LWP 1402020) "python3"):
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737350201792) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140737350201792) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140737350201792, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007ffff7c82476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007ffff7c687f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x000055555566b6f3 in _Py_FatalErrorFormat ()
#6  0x000055555564a9f4 in ?? ()
#7  0x0000555555717741 in ?? ()
#8  0x00005555556b83f3 in ?? ()
#9  0x000055555573b68f in ?? ()
#10 0x00005555556b83f3 in ?? ()
#11 0x00005555556977f8 in ?? ()
#12 0x00005555557a6fe7 in ?? ()
#13 0x00005555555c5d8b in ?? ()
#14 0x000055555566b7b3 in _Py_FatalErrorFormat ()
#15 0x000055555564a9f4 in ?? ()
#16 0x0000555555717741 in ?? ()
#17 0x00005555556b83f3 in ?? ()
#18 0x000055555573b68f in ?? ()
#19 0x00005555556b83f3 in ?? ()
#20 0x00005555556977f8 in ?? ()
#21 0x00005555557a6fe7 in ?? ()
#22 0x00005555557b8ada in Py_FinalizeEx ()
#23 0x00005555557aa2d3 in Py_RunMain ()
#24 0x0000555555780cad in Py_BytesMain ()
#25 0x00007ffff7c69d90 in __libc_start_call_main (main=main@entry=0x555555780c70, argc=argc@entry=3, argv=argv@entry=0x7fffffffd788) at ../sysdeps/nptl/libc_start_call_main.h:58
#26 0x00007ffff7c69e40 in __libc_start_main_impl (main=0x555555780c70, argc=3, argv=0x7fffffffd788, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffd778) at ../csu/libc-start.c:392
#27 0x0000555555780ba5 in _start ()
mxmlnkn commented 6 months ago

After some longer debugging sessions, I think, I've got a feeling as to what is going on

if __name__ == '__main__':
    fd = open(sys.argv[1], 'rb')
    fd = myfile(fd)
    with rapidgzip.open(fd) as gz:
        while gz.read(1024*1024):
            os.kill(os.getpid(), signal.SIGINT)

This code without the with-statement works very nice as a reproducer without having to manually press Ctrl+C. For debugging, I call it like this: gdb -ex 'b Python.hpp:90' -ex 'handle SIGINT pass' -ex r -ex bt --args python3 trigger-pygilstate-release-error.py 4GiB-base64.gz

mxmlnkn commented 6 months ago

I have "fixed" this in https://github.com/mxmlnkn/indexed_bzip2/commit/3cf304524a51d055b29ddb74512de8fb840eca8b but it effectively only prints a more meaningful error message. To reiterate, the real fix is to take care that all RapidgzipFile file objects are properly closed even in the case of SIGINT, e.g., by using with.