mxmlnkn / rapidgzip

Gzip Decompression and Random Access for Modern Multi-Core Machines
Apache License 2.0

Segfault when reading from Python thread #29

Closed: pdjstone closed this issue 6 months ago

pdjstone commented 6 months ago

The rapidgzip library crashes with a segfault if the first read or seek happens on a Python thread other than the one that called rapidgzip.open. If the first read or seek is done on the thread that opened the file, later reads and seeks from other threads do not crash.

from gzip import GzipFile
import os
from random import randbytes
from tempfile import NamedTemporaryFile
from threading import Thread

import rapidgzip

def create_temp_gzip_file(size_mb=2):
    with NamedTemporaryFile('wb', suffix='.gz', delete=False) as tf:
        with GzipFile(mode='wb', fileobj=tf, compresslevel=2) as gz:
            for _ in range(size_mb):
                gz.write(randbytes(1024*1024))
    return tf.name

def readfile(fd):
    # Runs on the worker thread; this first read is the one that segfaults.
    data = fd.read(8)
    print(data)

if __name__ == '__main__':
    filename = create_temp_gzip_file(2)
    with open(filename, 'rb') as fd:
        with rapidgzip.open(fd) as gzip_fd:
            #readfile(gzip_fd) # no segfault if we read from main thread first
            t = Thread(target=readfile, args=[gzip_fd])
            t.start()
            t.join()
    print('done')
    os.unlink(filename)
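Going by the observation above, one stopgap on affected versions is to do the first read on the opening thread, restore the position, and only then hand the object to worker threads. The following is a minimal sketch that assumes a seekable binary file object with `read`, `seek`, and `tell`. `prime_on_current_thread` and `read_in_thread` are hypothetical helper names, `io.BytesIO` stands in for the object returned by `rapidgzip.open(fd)`, and the approach has not been verified against affected rapidgzip builds.

```python
import io
import threading


def prime_on_current_thread(fobj):
    """Read one byte on the calling thread, then seek back (hypothetical helper).

    Per the report, a first read/seek on the opening thread avoided the crash.
    """
    position = fobj.tell()
    fobj.read(1)
    fobj.seek(position)
    return fobj


def read_in_thread(fobj, size):
    """Read `size` bytes from `fobj` on a separate worker thread."""
    result = {}

    def worker():
        result["data"] = fobj.read(size)

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return result["data"]


if __name__ == "__main__":
    # io.BytesIO stands in for the object returned by rapidgzip.open(fd).
    f = prime_on_current_thread(io.BytesIO(b"0123456789"))
    print(read_in_thread(f, 8))
```

Because the helper seeks back to the saved offset, worker threads still see the stream from its original position.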
mxmlnkn commented 6 months ago

Backtrace:

Thread 2 "python3" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff63ff6c0 (LWP 1379959)]
_rpmalloc_heap_extract_new_span (heap=0x0, span_count=1, class_idx=<optimized out>, heap_size_class=<optimized out>) at external/rpmalloc/rpmalloc/rpmalloc.c:1834
1834    external/rpmalloc/rpmalloc/rpmalloc.c: No such file or directory.
#0  _rpmalloc_heap_extract_new_span (heap=0x0, span_count=1, class_idx=<optimized out>, heap_size_class=<optimized out>) at external/rpmalloc/rpmalloc/rpmalloc.c:1834
#1  0x00007ffff69fbd48 in _rpmalloc_heap_extract_new_span (class_idx=126, span_count=1, heap_size_class=0x0, heap=0x0) at external/rpmalloc/rpmalloc/rpmalloc.c:1820
#2  _rpmalloc_allocate_large (size=32896, heap=0x0) at external/rpmalloc/rpmalloc/rpmalloc.c:2213
#3  _rpmalloc_allocate (size=32768, heap=0x0) at external/rpmalloc/rpmalloc/rpmalloc.c:2268
#4  _rpmalloc_allocate (size=32768, heap=0x0) at external/rpmalloc/rpmalloc/rpmalloc.c:2261
#5  rpmalloc (size=32768) at external/rpmalloc/rpmalloc/rpmalloc.c:3052
#6  0x00007ffff6927f7b in RpmallocAllocator<unsigned char>::allocate (nElementsToAllocate=32768, this=0x7ffff63fe000) at core/FasterVector.hpp:50
#7  std::allocator_traits<RpmallocAllocator<unsigned char> >::allocate (__n=32768, __a=...) at /usr/include/c++/13/bits/alloc_traits.h:333
#8  std::_Vector_base<unsigned char, RpmallocAllocator<unsigned char> >::_M_allocate (__n=32768, this=0x7ffff63fe000) at /usr/include/c++/13/bits/stl_vector.h:378
#9  std::_Vector_base<unsigned char, RpmallocAllocator<unsigned char> >::_M_create_storage (__n=32768, this=0x7ffff63fe000) at /usr/include/c++/13/bits/stl_vector.h:395
#10 std::_Vector_base<unsigned char, RpmallocAllocator<unsigned char> >::_Vector_base (__a=..., __n=32768, this=0x7ffff63fe000) at /usr/include/c++/13/bits/stl_vector.h:332
#11 std::vector<unsigned char, RpmallocAllocator<unsigned char> >::vector (__n=32768, __a=..., this=0x7ffff63fe000) at /usr/include/c++/13/bits/stl_vector.h:554
#12 rapidgzip::deflate::DecodedData::getWindowAt (this=this@entry=0x7ffff0002c20, previousWindow=..., skipBytes=skipBytes@entry=2097152) at rapidgzip/DecodedData.hpp:413
#13 0x00007ffff69b91e1 in rapidgzip::GzipChunkFetcher<FetchingStrategy::FetchMultiStream, rapidgzip::ChunkData, false>::get (this=0x7ffff00010e0, offset=0) at /usr/include/c++/13/optional:306
#14 0x00007ffff69d3e41 in rapidgzip::ParallelGzipReader<rapidgzip::ChunkData, false>::read(std::function<void (std::shared_ptr<rapidgzip::ChunkData> const&, unsigned long, unsigned long)> const&, unsigned long) (this=this@entry=0xba1340, writeFunctor=..., nBytesToRead=nBytesToRead@entry=8) at rapidgzip/ParallelGzipReader.hpp:438
#15 0x00007ffff69d5044 in rapidgzip::ParallelGzipReader<rapidgzip::ChunkData, false>::read (this=0xba1340, outputFileDescriptor=-1, outputBuffer=<optimized out>, nBytesToRead=8)
    at rapidgzip/ParallelGzipReader.hpp:407
#16 0x00007ffff6904f12 in __pyx_pf_9rapidgzip_14_RapidgzipFile_14readinto (__pyx_v_bytes_like=<optimized out>, __pyx_v_self=0x7ffff747f110) at rapidgzip.cpp:14710
#17 __pyx_pw_9rapidgzip_14_RapidgzipFile_15readinto (__pyx_v_self=0x7ffff747f110, __pyx_args=<optimized out>, __pyx_nargs=1, __pyx_kwds=<optimized out>) at rapidgzip.cpp:14588
#18 0x000000000054e768 in ?? ()
#19 0x000000000054db12 in ?? ()
#20 0x0000000000511030 in ?? ()
#21 0x000000000054ce3f in PyObject_CallMethodObjArgs ()
#22 0x00000000006745ec in ?? ()
#23 0x000000000053308f in ?? ()
#24 0x0000000000505850 in PyObject_Vectorcall ()
#25 0x00000000004f627a in _PyEval_EvalFrameDefault ()
#26 0x0000000000525ed5 in _PyFunction_Vectorcall ()
#27 0x00000000004fa184 in _PyEval_EvalFrameDefault ()
#28 0x000000000054e402 in ?? ()
#29 0x000000000054db48 in ?? ()
#30 0x000000000063f714 in ?? ()
#31 0x000000000060d628 in ?? ()
#32 0x00007ffff7c97ada in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:444
#33 0x00007ffff7d2847c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
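Frames #0 to #5 show rpmalloc being called with `heap=0x0` on the new thread, meaning the worker thread had no allocator heap of its own. rpmalloc keeps one heap per thread, and its documentation asks each thread that uses the allocator to call `rpmalloc_thread_initialize` first. The sketch below is only a Python analogy of that failure mode and of a common remedy, lazy per-thread initialization on first use. `PerThreadHeapAllocator` is hypothetical, is not rapidgzip's or rpmalloc's code, and may differ from what the actual fix does. It also does not attempt to explain why reading on the main thread first avoided the crash.

```python
import threading


class PerThreadHeapAllocator:
    """Toy allocator keeping one 'heap' per thread (analogy only)."""

    def __init__(self, lazy_init):
        self._local = threading.local()
        self._lazy_init = lazy_init
        # Mimic explicit initialization on the thread that creates the allocator.
        self.thread_initialize()

    def thread_initialize(self):
        if getattr(self._local, "heap", None) is None:
            self._local.heap = []

    def allocate(self, size):
        heap = getattr(self._local, "heap", None)
        if heap is None:
            if not self._lazy_init:
                # Python raises here; the C allocator dereferenced heap=0x0 instead.
                raise RuntimeError("thread heap not initialized (heap=0x0)")
            self.thread_initialize()  # lazy per-thread setup on first use
            heap = self._local.heap
        block = bytearray(size)
        heap.append(block)
        return block


def allocate_in_new_thread(allocator, size):
    """Allocate on a fresh thread; return the block size or the error text."""
    outcome = {}

    def worker():
        try:
            outcome["result"] = len(allocator.allocate(size))
        except RuntimeError as exc:
            outcome["result"] = str(exc)

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return outcome["result"]


if __name__ == "__main__":
    print(allocate_in_new_thread(PerThreadHeapAllocator(lazy_init=False), 32768))
    print(allocate_in_new_thread(PerThreadHeapAllocator(lazy_init=True), 32768))
```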

Please try reinstalling without rpmalloc support, like this:

export RAPIDGZIP_BUILD_RPMALLOC=disable; python3 -m pip install --force-reinstall 'git+https://github.com/mxmlnkn/indexed_bzip2.git@zlib-support#egginfo=rapidgzip&subdirectory=python/rapidgzip'

It works for me.

pdjstone commented 6 months ago

Confirmed - no crash for me when built without rpmalloc.
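A segfault kills the whole interpreter, so checking for this crash is easier from a separate process. The following is a minimal sketch that assumes a POSIX system, where `subprocess` reports death by signal N as return code -N, and that the reproducer above has been saved to a script file. `crashed_with_sigsegv` is a hypothetical helper name.

```python
import signal
import subprocess
import sys


def crashed_with_sigsegv(script_path, timeout=60):
    """Run a Python script in a child process; True if it died from SIGSEGV.

    On POSIX, subprocess reports termination by signal N as returncode -N.
    """
    proc = subprocess.run(
        [sys.executable, script_path],
        capture_output=True,
        timeout=timeout,
    )
    return proc.returncode == -signal.SIGSEGV
```

For example, running the saved reproducer through `crashed_with_sigsegv` would be expected to return `True` on an affected rpmalloc-enabled build and `False` on a build without rpmalloc.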

mxmlnkn commented 6 months ago

Should be fixed in 0.11.2.
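To check whether the installed package is at least that release, here is a minimal sketch that uses the standard library's `importlib.metadata` (Python 3.8+). It assumes the distribution is installed under the name `rapidgzip`, and the helper names are hypothetical. Pre-release and dev suffixes are ignored, so `0.11.2rc1` would be treated as `0.11.2`.

```python
import re
from importlib.metadata import PackageNotFoundError, version

FIXED_IN = "0.11.2"  # release named in the thread as containing the fix


def release_tuple(ver):
    """'0.11.2' -> (0, 11, 2); suffixes such as 'rc1' or '.dev0' are dropped."""
    match = re.match(r"\d+(?:\.\d+)*", ver)
    return tuple(int(p) for p in match.group(0).split(".")) if match else ()


def includes_fix(ver, fixed_in=FIXED_IN):
    """True if `ver` is at or after the release that should contain the fix."""
    return release_tuple(ver) >= release_tuple(fixed_in)


def installed_rapidgzip_version():
    """Installed rapidgzip version string, or None if it is not installed."""
    try:
        return version("rapidgzip")
    except PackageNotFoundError:
        return None
```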