pymupdf / PyMuPDF

PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
https://pymupdf.readthedocs.io
GNU Affero General Public License v3.0

`Document.pagemode` or `Document.pagelayout` crashes for epub files #3615

Closed: arun-mani-j closed this issue 4 days ago.

arun-mani-j commented 5 days ago

Description of the bug

The program crashes with a segmentation fault when `Document.pagemode` or `Document.pagelayout` is accessed on an EPUB document.

How to reproduce the bug

  1. Get an EPUB file. For example, I was able to reproduce the issue with this public-domain book: https://www.gutenberg.org/ebooks/73910.epub3.images
  2. Run the following code:

     ```python
     import pymupdf

     doc = pymupdf.open("test.epub")
     print(doc.pagemode)
     print(doc.pagelayout)
     ```

  3. The program crashes.

This is the backtrace from the core dump (obtained with `coredumpctl gdb`):
```text
#0  0x00007f33cc479170 in pdf_trailer ()
   from /home/arun-mani-j/Projects/aayra/lib/python3.11/site-packages/pymupdf/libmupdf.so.24.4
(gdb) bt
#0  0x00007f33cc479170 in pdf_trailer () from /home/arun-mani-j/Projects/aayra/lib/python3.11/site-packages/pymupdf/libmupdf.so.24.4
#1  0x00007f33cc19ac85 in mupdf::ll_pdf_trailer(pdf_document*) () from /home/arun-mani-j/Projects/aayra/lib/python3.11/site-packages/pymupdf/libmupdfcpp.so.24.4
#2  0x00007f33cc157c72 in mupdf::pdf_trailer(mupdf::PdfDocument const&) () from /home/arun-mani-j/Projects/aayra/lib/python3.11/site-packages/pymupdf/libmupdfcpp.so.24.4
#3  0x00007f33cab608c7 in _wrap_pdf_trailer () from /home/arun-mani-j/Projects/aayra/lib/python3.11/site-packages/pymupdf/_mupdf.so
#4  0x00000000004d5b8f in ?? ()
#5  0x0000000000482720 in PyObject_Vectorcall ()
#6  0x0000000000424bf0 in _PyEval_EvalFrameDefault ()
#7  0x00000000005835b8 in ?? ()
#8  0x0000000000482e16 in PyObject_CallOneArg ()
#9  0x00000000004dabbd in _PyObject_GenericGetAttrWithDict ()
#10 0x00000000004da176 in PyObject_GetAttr ()
#11 0x0000000000424192 in _PyEval_EvalFrameDefault ()
#12 0x0000000000583454 in PyEval_EvalCode ()
#13 0x00000000005cf631 in ?? ()
#14 0x00000000005d0ccf in _PyRun_SimpleFileObject ()
#15 0x00000000005d12d0 in _PyRun_AnyFileObject ()
#16 0x00000000005f29b0 in ?? ()
#17 0x00000000005f303e in Py_BytesMain ()
#18 0x00007f33ce808c8a in __libc_start_call_main (main=main@entry=0x420fc0, argc=argc@entry=2, argv=argv@entry=0x7ffc33065dd8) at ../sysdeps/nptl/libc_start_call_main.h:58
#19 0x00007f33ce808d45 in __libc_start_main_impl (main=0x420fc0, argc=2, argv=0x7ffc33065dd8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffc33065dc8)
    at ../csu/libc-start.c:360
#20 0x000000000042c191 in _start ()
```

PyMuPDF version

1.24.5

Operating system

Linux

Python version

3.11

arun-mani-j commented 5 days ago

`pymupdf.version` gives me the version tuple `('1.24.5', '1.24.4', '20240530000001')`. (Accessing `pymupdf.versionBind` raises an `AttributeError`.)

JorjMcKie commented 5 days ago

These properties only make sense for PDF documents, so an assertion that the document is a PDF is missing.
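On affected versions, a caller can sidestep the crash by checking whether the document is a PDF before touching the PDF-only properties. A minimal sketch, assuming PyMuPDF's `Document.is_pdf` property; the helper name `pdf_page_settings` is hypothetical and not part of PyMuPDF, and the code only relies on duck typing so it does not import `pymupdf` itself.

```python
def pdf_page_settings(doc):
    """Return (pagemode, pagelayout) for a PDF document, or None otherwise.

    Hypothetical workaround helper: `doc` is assumed to behave like a
    pymupdf.Document, i.e. to expose `is_pdf`, `pagemode` and `pagelayout`.
    """
    if not getattr(doc, "is_pdf", False):
        # EPUB, XPS, images, etc. are not PDFs: skip the PDF-only
        # properties entirely instead of letting them reach the PDF trailer.
        return None
    # Only PDFs get here, so reading the PDF catalog values is meaningful.
    return doc.pagemode, doc.pagelayout
```

For example, `with pymupdf.open("test.epub") as doc: print(pdf_page_settings(doc))` prints `None` without ever reading `pagemode` or `pagelayout`.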

julian-smith-artifex-com commented 4 days ago

Thanks for reporting this, we have a fix.

julian-smith-artifex-com commented 4 days ago

Fixed in 1.24.7.

arun-mani-j commented 4 days ago

Thanks for the quick fix!

drworm commented 4 days ago

I still get the problem in 1.24.7.

So I'm back on PyMuPDF==1.24.5, which doesn't have this issue.

julian-smith-artifex-com commented 4 days ago

Interesting. Are you still seeing a segv, or is it just a Python exception?
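One way to answer "segv or Python exception?" reliably is to run the reproducer in a separate interpreter and classify how that child process ended, so a crash cannot take down the caller. A minimal sketch, assuming a POSIX system, where `subprocess` reports a child killed by signal N as return code -N; `classify_run` and `REPRO` are hypothetical names, not part of PyMuPDF.

```python
import signal
import subprocess
import sys

# Hypothetical copy of this issue's reproducer; expects test.epub in the cwd.
REPRO = """
import pymupdf
doc = pymupdf.open("test.epub")
print(doc.pagemode)
print(doc.pagelayout)
"""


def classify_run(code: str) -> str:
    """Run `code` in a fresh Python interpreter and report how it ended.

    Returns "segv" if the child was killed by SIGSEGV, "ok" on a clean exit,
    "exception" if it died with an uncaught Python traceback, and
    "exit N" for any other return code.
    """
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
    )
    # On POSIX, "killed by signal N" shows up as return code -N.
    if proc.returncode == -signal.SIGSEGV:
        return "segv"
    if proc.returncode == 0:
        return "ok"
    # An uncaught exception prints a traceback to stderr and exits with 1.
    if "Traceback (most recent call last)" in proc.stderr:
        return "exception"
    return f"exit {proc.returncode}"
```

With `test.epub` present, `print(classify_run(REPRO))` would be expected to print `segv` on an affected install such as the reporter's 1.24.5, and something other than `segv` once the fix is in.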

julian-smith-artifex-com commented 4 days ago

[1.24.7 has a test that exactly replicates the original reproducer, so I'm fairly confident it fixes this particular scenario.]

drworm commented 4 days ago

Sorry, my bad: I didn't read the description of this issue carefully. My problem is a different, unrelated issue.

julian-smith-artifex-com commented 3 days ago

Good to know, thanks. Please create a new GitHub issue for the problem you're seeing.