pymupdf / PyMuPDF

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
https://pymupdf.readthedocs.io
GNU Affero General Public License v3.0

Segmentation fault with method “ll_fz_lookup_metadata2” #4057

Closed: thattemperature closed this issue 1 week ago

thattemperature commented 1 week ago

Description of the bug

I use eaf-pdf-viewer in Emacs, and it hits a segmentation fault every time I open a PDF file. This is the traceback:

Thread 1 "python3" received signal SIGSEGV, Segmentation fault.
Downloading source file /build/gcc-14-ig5ci0/gcc-14-14.2.0/src/libstdc++-v3/../libgcc/unwind-pe.h...
read_encoded_value_with_base (encoding=encoding@entry=160 '\240', base=<optimized out>, p=0x7fffffffc929 "e\222\254\377\177", p@entry=0x7fffffffc921 "\001\240\001", val=val@entry=0x7fffffffc028) at /build/gcc-14-ig5ci0/gcc-14-14.2.0/src/libstdc++-v3/../libgcc/unwind-pe.h:286
warning: 286    /build/gcc-14-ig5ci0/gcc-14-14.2.0/src/libstdc++-v3/../libgcc/unwind-pe.h: No such file or directory
#0  read_encoded_value_with_base (encoding=encoding@entry=160 '\240', base=<optimized out>, p=0x7fffffffc929 "e\222\254\377\177", p@entry=0x7fffffffc921 "\001\240\001", val=val@entry=0x7fffffffc028) at /build/gcc-14-ig5ci0/gcc-14-14.2.0/src/libstdc++-v3/../libgcc/unwind-pe.h:286
#1  0x00007fffeaaba591 in read_encoded_value (context=0x7fffffffc4f0, encoding=160 '\240', p=0x7fffffffc921 "\001\240\001", val=0x7fffffffc028) at /build/gcc-14-ig5ci0/gcc-14-14.2.0/src/libstdc++-v3/../libgcc/unwind-pe.h:306
#2  parse_lsda_header (context=context@entry=0x7fffffffc4f0, p=0x7fffffffc921 "\001\240\001", p@entry=0x7fffffffc920 "\240\001\240\001", info=info@entry=0x7fffffffc020) at ../../../../src/libstdc++-v3/libsupc++/eh_personality.cc:60
#3  0x00007fffeaaba71c in __cxxabiv1::__gxx_personality_v0 (version=<optimized out>, actions=2, exception_class=5138137972254386944, ue_header=0x1a001a0, context=0x7fffffffc4f0) at ../../../../src/libstdc++-v3/libsupc++/eh_personality.cc:454
#4  0x00007fffdc5351bd in _Unwind_Phase2 (context=0x7fffffffc4f0, exception_object=0x1a001a0) at unwind/unwind-internal.h:118
#5  _Unwind_Resume (exception_object=0x1a001a0) at unwind/Resume.c:37
#6  0x00007fffac9265bd in mupdf::ll_fz_lookup_metadata2(fz_document*, char const*) [clone .cold] () at /home/thattemperature/.local/lib/python3.12/site-packages/pymupdf/libmupdfcpp.so.24.10
#7  0x00007fffac94bf0d in mupdf::fz_lookup_metadata2(mupdf::FzDocument const&, char const*) () at /home/thattemperature/.local/lib/python3.12/site-packages/pymupdf/libmupdfcpp.so.24.10
#8  0x00007fffa4b9cf06 in _wrap_fz_lookup_metadata2 () at /home/thattemperature/.local/lib/python3.12/site-packages/pymupdf/_mupdf.so
#9  0x0000000000549d7c in cfunction_call (func=0x7fffac68a200, args=0x7fffac4e40c0, kwargs=0x0) at /usr/local/src/conda/python-3.12.7/Objects/methodobject.c:548
#10 0x000000000051af9b in _PyObject_MakeTpCall (tstate=0x9bfb70 <_PyRuntime+458992>, callable=0x7fffac68a200, args=<optimized out>, nargs=<optimized out>, keywords=0x0) at /usr/local/src/conda/python-3.12.7/Objects/call.c:240
#11 0x0000000000525903 in _PyEval_EvalFrameDefault (tstate=tstate@entry=0x9bfb70 <_PyRuntime+458992>, frame=0x7ffff7fb2730, frame@entry=0x7ffff7fb2490, throwflag=throwflag@entry=0) at Python/bytecodes.c:2715
#12 0x000000000051db07 in _PyEval_EvalFrame (throwflag=0, frame=0x7ffff7fb2490, tstate=0x9bfb70 <_PyRuntime+458992>) at /usr/local/src/conda/python-3.12.7/Include/internal/pycore_ceval.h:89
#13 _PyEval_Vector (kwnames=0x0, argcount=<optimized out>, args=0x7fffffffcce0, locals=0x0, func=0x7fffac034360, tstate=0x9bfb70 <_PyRuntime+458992>) at /usr/local/src/conda/python-3.12.7/Python/ceval.c:1683
#14 _PyFunction_Vectorcall (kwnames=0x0, nargsf=<optimized out>, stack=0x7fffffffcce0, func=0x7fffac034360) at /usr/local/src/conda/python-3.12.7/Objects/call.c:419
#15 _PyObject_FastCallDictTstate (tstate=<optimized out>, callable=0x7fffac034360, args=0x7fffffffcce0, nargsf=<optimized out>, kwargs=<optimized out>) at /usr/local/src/conda/python-3.12.7/Objects/call.c:133
#16 0x0000000000557944 in _PyObject_Call_Prepend (kwargs=0x0, args=0x7fffa452c250, obj=0x7fffa4548230, callable=0x7fffac034360, tstate=0x9bfb70 <_PyRuntime+458992>) at /usr/local/src/conda/python-3.12.7/Objects/call.c:508
#17 slot_tp_init (self=0x7fffa4548230, args=0x7fffa452c250, kwds=0x0) at /usr/local/src/conda/python-3.12.7/Objects/typeobject.c:9026
#18 0x000000000051af6b in type_call (kwds=0x0, args=0x7fffa452c250, type=<optimized out>) at /usr/local/src/conda/python-3.12.7/Objects/typeobject.c:1679
#19 _PyObject_MakeTpCall (tstate=0x9bfb70 <_PyRuntime+458992>, callable=0x179fa10, args=<optimized out>, nargs=<optimized out>, keywords=0x0) at /usr/local/src/conda/python-3.12.7/Objects/call.c:240
#20 0x0000000000525903 in _PyEval_EvalFrameDefault (tstate=tstate@entry=0x9bfb70 <_PyRuntime+458992>, frame=0x7ffff7fb2410, frame@entry=0x7ffff7fb2338, throwflag=throwflag@entry=0) at Python/bytecodes.c:2715
#21 0x000000000051db07 in _PyEval_EvalFrame (throwflag=0, frame=0x7ffff7fb2338, tstate=0x9bfb70 <_PyRuntime+458992>) at /usr/local/src/conda/python-3.12.7/Include/internal/pycore_ceval.h:89
#22 _PyEval_Vector (kwnames=0x0, argcount=<optimized out>, args=0x7fffa4527870, locals=0x0, func=0x7fffa45149a0, tstate=0x9bfb70 <_PyRuntime+458992>) at /usr/local/src/conda/python-3.12.7/Python/ceval.c:1683
#23 _PyFunction_Vectorcall (kwnames=0x0, nargsf=<optimized out>, stack=0x7fffa4527870, func=0x7fffa45149a0) at /usr/local/src/conda/python-3.12.7/Objects/call.c:419
#24 _PyObject_FastCallDictTstate (tstate=<optimized out>, callable=0x7fffa45149a0, args=0x7fffa4527870, nargsf=<optimized out>, kwargs=<optimized out>) at /usr/local/src/conda/python-3.12.7/Objects/call.c:133
#25 0x00000000005579fc in _PyObject_Call_Prepend (kwargs=0x7fffffffd000, args=0x7fffc8208f90, obj=0x7fffa4523390, callable=0x7fffa45149a0, tstate=0x9bfb70 <_PyRuntime+458992>) at /usr/local/src/conda/python-3.12.7/Objects/call.c:508
#26 slot_tp_init (self=0x7fffa4523390, args=0x7fffc8208f90, kwds=0x7fffffffd000) at /usr/local/src/conda/python-3.12.7/Objects/typeobject.c:9026
#27 0x000000000051af6b in type_call (kwds=0x0, args=0x7fffc8208f90, type=<optimized out>) at /usr/local/src/conda/python-3.12.7/Objects/typeobject.c:1679
#28 _PyObject_MakeTpCall (tstate=0x9bfb70 <_PyRuntime+458992>, callable=0x19eeb50, args=<optimized out>, nargs=<optimized out>, keywords=0x0) at /usr/local/src/conda/python-3.12.7/Objects/call.c:240
#29 0x0000000000525903 in _PyEval_EvalFrameDefault (tstate=tstate@entry=0x9bfb70 <_PyRuntime+458992>, frame=frame@entry=0x7ffff7fb2280, throwflag=throwflag@entry=0) at Python/bytecodes.c:2715
#30 0x000000000051db07 in _PyEval_EvalFrame (throwflag=0, frame=0x7ffff7fb2280, tstate=0x9bfb70 <_PyRuntime+458992>) at /usr/local/src/conda/python-3.12.7/Include/internal/pycore_ceval.h:89
#31 _PyEval_Vector (kwnames=0x0, argcount=<optimized out>, args=0x7fffffffd320, locals=0x0, func=0x7fffa4520900, tstate=0x9bfb70 <_PyRuntime+458992>) at /usr/local/src/conda/python-3.12.7/Python/ceval.c:1683
#32 _PyFunction_Vectorcall (kwnames=0x0, nargsf=<optimized out>, stack=0x7fffffffd320, func=0x7fffa4520900) at /usr/local/src/conda/python-3.12.7/Objects/call.c:419
#33 _PyObject_FastCallDictTstate (tstate=<optimized out>, callable=0x7fffa4520900, args=0x7fffffffd320, nargsf=<optimized out>, kwargs=<optimized out>) at /usr/local/src/conda/python-3.12.7/Objects/call.c:133
#34 0x0000000000557944 in _PyObject_Call_Prepend (kwargs=0x0, args=0x7fffc83a59c0, obj=0x7fffa45232f0, callable=0x7fffa4520900, tstate=0x9bfb70 <_PyRuntime+458992>) at /usr/local/src/conda/python-3.12.7/Objects/call.c:508
#35 slot_tp_init (self=0x7fffa45232f0, args=0x7fffc83a59c0, kwds=0x0) at /usr/local/src/conda/python-3.12.7/Objects/typeobject.c:9026
#36 0x000000000051af6b in type_call (kwds=0x0, args=0x7fffc83a59c0, type=<optimized out>) at /usr/local/src/conda/python-3.12.7/Objects/typeobject.c:1679
#37 _PyObject_MakeTpCall (tstate=0x9bfb70 <_PyRuntime+458992>, callable=0x16343d0, args=<optimized out>, nargs=<optimized out>, keywords=0x0) at /usr/local/src/conda/python-3.12.7/Objects/call.c:240
#38 0x0000000000525903 in _PyEval_EvalFrameDefault (tstate=tstate@entry=0x9bfb70 <_PyRuntime+458992>, frame=0x7ffff7fb21c8, frame@entry=0x7ffff7fb2090, throwflag=throwflag@entry=0) at Python/bytecodes.c:2715
#39 0x0000000000575717 in _PyEval_EvalFrame (throwflag=0, frame=0x7ffff7fb2090, tstate=0x9bfb70 <_PyRuntime+458992>) at /usr/local/src/conda/python-3.12.7/Include/internal/pycore_ceval.h:89
#40 _PyEval_Vector (kwnames=<optimized out>, argcount=<optimized out>, args=<optimized out>, locals=0x0, func=0x7fffe5731300, tstate=<optimized out>) at /usr/local/src/conda/python-3.12.7/Python/ceval.c:1683
#41 _PyFunction_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>, stack=<optimized out>, func=0x7fffe5731300) at /usr/local/src/conda/python-3.12.7/Objects/call.c:419
#42 _PyObject_VectorcallTstate (kwnames=<optimized out>, nargsf=<optimized out>, args=<optimized out>, callable=0x7fffe5731300, tstate=0x9bfb70 <_PyRuntime+458992>) at /usr/local/src/conda/python-3.12.7/Include/internal/pycore_call.h:92
#43 method_vectorcall (method=<optimized out>, args=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at /usr/local/src/conda/python-3.12.7/Objects/classobject.c:91
#44 0x00007fffe642c6d0 in PyQtSlot::call(_object*, _object*) const () at /home/thattemperature/.local/lib/python3.12/site-packages/PyQt6/QtCore.abi3.so
#45 0x00007fffe642cb60 in PyQtSlot::invoke(void**, _object*, void*, bool) const () at /home/thattemperature/.local/lib/python3.12/site-packages/PyQt6/QtCore.abi3.so
#46 0x00007fffe642cdde in PyQtSlotProxy::unislot(void**) () at /home/thattemperature/.local/lib/python3.12/site-packages/PyQt6/QtCore.abi3.so
#47 0x00007fffe642ef37 in PyQtSlotProxy::qt_metacall(QMetaObject::Call, int, void**) () at /home/thattemperature/.local/lib/python3.12/site-packages/PyQt6/QtCore.abi3.so
#48 0x00007fffeaf8caec in QObject::event(QEvent*) () at /home/thattemperature/.local/lib/python3.12/site-packages/PyQt6/Qt6/lib/libQt6Core.so.6
#49 0x00007ffff637f152 in QApplicationPrivate::notify_helper(QObject*, QEvent*) () at /home/thattemperature/.local/lib/python3.12/site-packages/PyQt6/Qt6/lib/libQt6Widgets.so.6
#50 0x00007fffe5bbc3d6 in sipQApplication::notify(QObject*, QEvent*) () at /home/thattemperature/.local/lib/python3.12/site-packages/PyQt6/QtWidgets.abi3.so
#51 0x00007fffeaf3d1ca in QCoreApplication::notifyInternal2(QObject*, QEvent*) () at /home/thattemperature/.local/lib/python3.12/site-packages/PyQt6/Qt6/lib/libQt6Core.so.6
#52 0x00007fffeaf402bd in QCoreApplicationPrivate::sendPostedEvents(QObject*, int, QThreadData*) () at /home/thattemperature/.local/lib/python3.12/site-packages/PyQt6/Qt6/lib/libQt6Core.so.6
#53 0x00007fffeb1e6df3 in ??? () at /home/thattemperature/.local/lib/python3.12/site-packages/PyQt6/Qt6/lib/libQt6Core.so.6
#54 0x00007fffe96305b5 in g_main_dispatch (context=0x7fffe0000fb0) at ../../../glib/gmain.c:3344
#55 0x00007fffe968f717 in g_main_context_dispatch_unlocked (context=0x7fffe0000fb0) at ../../../glib/gmain.c:4152
#56 g_main_context_iterate_unlocked.isra.0 (context=context@entry=0x7fffe0000fb0, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at ../../../glib/gmain.c:4217
#57 0x00007fffe962fa53 in g_main_context_iteration (context=0x7fffe0000fb0, may_block=1) at ../../../glib/gmain.c:4282
#58 0x00007fffeb1e677a in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () at /home/thattemperature/.local/lib/python3.12/site-packages/PyQt6/Qt6/lib/libQt6Core.so.6
#59 0x00007fffeaf4899b in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () at /home/thattemperature/.local/lib/python3.12/site-packages/PyQt6/Qt6/lib/libQt6Core.so.6
#60 0x00007fffeaf451ce in QCoreApplication::exec() () at /home/thattemperature/.local/lib/python3.12/site-packages/PyQt6/Qt6/lib/libQt6Core.so.6
#61 0x00007fffe5a0911f in meth_QApplication_exec () at /home/thattemperature/.local/lib/python3.12/site-packages/PyQt6/QtWidgets.abi3.so
#62 0x0000000000549d7c in cfunction_call (func=0x7fffe5354e50, args=0x9621d0 <_PyRuntime+75600>, kwargs=0x0) at /usr/local/src/conda/python-3.12.7/Objects/methodobject.c:548
#63 0x000000000051af9b in _PyObject_MakeTpCall (tstate=0x9bfb70 <_PyRuntime+458992>, callable=0x7fffe5354e50, args=<optimized out>, nargs=<optimized out>, keywords=0x0) at /usr/local/src/conda/python-3.12.7/Objects/call.c:240
#64 0x0000000000525903 in _PyEval_EvalFrameDefault (tstate=<optimized out>, frame=0x7ffff7fb2020, throwflag=<optimized out>) at Python/bytecodes.c:2715
#65 0x00000000005e3c6e in PyEval_EvalCode (co=co@entry=0xa87f10, globals=globals@entry=0x7ffff6df5600, locals=locals@entry=0x7ffff6df5600) at /usr/local/src/conda/python-3.12.7/Python/ceval.c:578
#66 0x000000000060a0b7 in run_eval_code_obj (tstate=tstate@entry=0x9bfb70 <_PyRuntime+458992>, co=co@entry=0xa87f10, globals=globals@entry=0x7ffff6df5600, locals=locals@entry=0x7ffff6df5600) at /usr/local/src/conda/python-3.12.7/Python/pythonrun.c:1722
#67 0x00000000006056d7 in run_mod (mod=mod@entry=0xb6b720, filename=filename@entry=0x7ffff6da2230, globals=globals@entry=0x7ffff6df5600, locals=locals@entry=0x7ffff6df5600, flags=flags@entry=0x7fffffffe0d0, arena=arena@entry=0x7ffff6d1bcb0) at /usr/local/src/conda/python-3.12.7/Python/pythonrun.c:1743
#68 0x000000000061d602 in pyrun_file (fp=fp@entry=0x9fe430, filename=filename@entry=0x7ffff6da2230, start=start@entry=257, globals=globals@entry=0x7ffff6df5600, locals=locals@entry=0x7ffff6df5600, closeit=closeit@entry=1, flags=0x7fffffffe0d0) at /usr/local/src/conda/python-3.12.7/Python/pythonrun.c:1643
#69 0x000000000061cf40 in _PyRun_SimpleFileObject (fp=0x9fe430, filename=0x7ffff6da2230, closeit=1, flags=0x7fffffffe0d0) at /usr/local/src/conda/python-3.12.7/Python/pythonrun.c:433
#70 0x000000000061cd33 in _PyRun_AnyFileObject (fp=0x9fe430, filename=filename@entry=0x7ffff6da2230, closeit=closeit@entry=1, flags=flags@entry=0x7fffffffe0d0) at /usr/local/src/conda/python-3.12.7/Python/pythonrun.c:78
#71 0x0000000000615dc3 in pymain_run_file_obj (skip_source_first_line=0, filename=0x7ffff6da2230, program_name=0x7ffff6dfcd00) at /usr/local/src/conda/python-3.12.7/Modules/main.c:360
#72 pymain_run_file (config=0x962750 <_PyRuntime+77008>) at /usr/local/src/conda/python-3.12.7/Modules/main.c:379
#73 pymain_run_python (exitcode=0x7fffffffe0a4) at /usr/local/src/conda/python-3.12.7/Modules/main.c:633
#74 Py_RunMain () at /usr/local/src/conda/python-3.12.7/Modules/main.c:713
#75 0x00000000005cc5b9 in Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at /usr/local/src/conda/python-3.12.7/Modules/main.c:767
#76 0x00007ffff7c2a1ca in __libc_start_call_main (main=main@entry=0x5cc4f0 <main>, argc=argc@entry=5, argv=argv@entry=0x7fffffffe338) at ../sysdeps/nptl/libc_start_call_main.h:58
#77 0x00007ffff7c2a28b in __libc_start_main_impl (main=0x5cc4f0 <main>, argc=5, argv=0x7fffffffe338, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe328) at ../csu/libc-start.c:360
#78 0x00000000005cc3e9 in _start ()

Process *eaf* finished

Maybe this problem is related to the MuPDF library?

How to reproduce the bug

My OS is Ubuntu 24.04. I am using Anaconda with Python 3.12.7, and I installed the PyMuPDF package with pip install --user PyMuPDF, which fetches the newest version, 1.24.13.

I tested different versions of PyMuPDF with pip install --user PyMuPDF==<version> and found that 1.23.8 works well, but 1.23.9 and higher versions hit the same problem. Since the file structure differs between versions, I cannot pin down the critical change between 1.23.8 and 1.23.9.
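The version search described above can be automated as a bisection. This is a hypothetical sketch, not something from the PyMuPDF project: the helper names first_bad and segfaults_with, the script name and the REPRO snippet are assumptions. It assumes the candidate versions are passed oldest-first and that every release after the first bad one is also bad, which matches the pattern reported here. Because the maintainers report that opening the file and reading its metadata works on its own, REPRO should be replaced with whatever actually reproduces the crash.

```python
import signal
import subprocess
import sys
from typing import Callable, Optional, Sequence

# Hypothetical check run in a child interpreter: open the PDF given as the
# first argument and read its metadata. Replace with the real reproduction.
REPRO = "import sys, pymupdf; print(pymupdf.open(sys.argv[1]).metadata)"


def first_bad(versions: Sequence[str], is_bad: Callable[[str], bool]) -> Optional[str]:
    """Return the oldest version for which is_bad() is True, or None.

    Assumes `versions` is sorted oldest-first and that every version after
    the first bad one is also bad.
    """
    lo, hi = 0, len(versions)
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid  # first bad version is mid or earlier
        else:
            lo = mid + 1  # first bad version is after mid
    return versions[lo] if lo < len(versions) else None


def segfaults_with(version: str, pdf: str) -> bool:
    """Install PyMuPDF==version, then run REPRO in a fresh interpreter."""
    # --user mirrors the install command used in the report.
    subprocess.run(
        [sys.executable, "-m", "pip", "install", "--user", f"PyMuPDF=={version}"],
        check=True,
    )
    proc = subprocess.run([sys.executable, "-c", REPRO, pdf])
    # On POSIX, a child killed by a signal reports returncode == -signum.
    return proc.returncode == -signal.SIGSEGV


if __name__ == "__main__" and len(sys.argv) >= 3:
    # Usage: python bisect_pymupdf.py FILE.pdf VERSION [VERSION ...]
    pdf_path, *candidates = sys.argv[1:]
    print(first_bad(candidates, lambda v: segfaults_with(v, pdf_path)))
```

Running each check in a child process keeps a segfault from killing the bisection itself, and bisection needs only about log2(n) installs instead of trying every release in turn.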

PyMuPDF version

1.24.13

Operating system

Linux

Python version

3.12

julian-smith-artifex-com commented 1 week ago

Please upload the PDF file to this issue, otherwise we cannot investigate the problem.

JorjMcKie commented 1 week ago

We have no dealings with that PDF viewer and cannot investigate what may go wrong inside it. You should probably contact that project and submit your issue there.

thattemperature commented 1 week ago

I have submitted an issue in that project and talked with its author, but he thinks the problem may be caused by a conflict between PyMuPDF and the MuPDF library. @JorjMcKie

thattemperature commented 1 week ago

Untitled 1.pdf

I have tested several PDF files and hit the same problem with each of them. One of them is the file above, which is just an empty PDF. @julian-smith-artifex-com

JorjMcKie commented 1 week ago

There are no such conflicts. All the market-leading PDF viewers handle the example file without problems: not only browser-based PDF viewers but also MuPDF, Adobe Acrobat, Nitro PDF, PDF-XChange, SumatraPDF, Foxit Reader, evince (the Linux standard viewer), what have you. The problem is certainly located in that viewer and must be fixed there.

JorjMcKie commented 1 week ago

Loading the document with PyMuPDF and looking at the metadata works as well.