Open SteveHawk opened 1 week ago
Thank you for submitting this.
This happens inside the base library MuPDF. I am going to transfer the issue to the team for investigation.
MuPDF issue reference https://bugs.ghostscript.com/show_bug.cgi?id=707835
@JorjMcKie Thanks!
Description of the bug
Since v1.24.1 introduced the use_objstms option in Document.save(), setting use_objstms=1 and linear=True together doesn't work on some documents and results in a broken PDF file. On versions >= 1.24.3, some documents even cause the program to crash.
How to reproduce the bug
Here's a minimal reproducible program:
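A minimal sketch of such a program, not the reporter's original code. It assumes a PyMuPDF 1.24.x install where Document.save() accepts both use_objstms and linear; the function name resave_and_render and the SAVE_OPTIONS constant are hypothetical, and the paths are whatever input and output files you choose.

```python
# Save options named in the report; using them together is what triggers the problem.
SAVE_OPTIONS = {"use_objstms": 1, "linear": True}


def resave_and_render(src, dst):
    """Re-save `src` with object streams plus linearization, then render every page.

    Returns the number of pages rendered from the re-saved file.
    """
    # Imported lazily; older releases only expose the module under the name `fitz`.
    try:
        import pymupdf
    except ImportError:
        import fitz as pymupdf

    with pymupdf.open(src) as doc:
        doc.save(dst, **SAVE_OPTIONS)

    rendered = 0
    with pymupdf.open(dst) as out:
        for page in out:
            page.get_pixmap()  # rendering is the step where the error logs were reported
            rendered += 1
    return rendered
```

Calling resave_and_render("1706.03762v7.pdf", "out.pdf") would exercise the same save-then-render path described in this report; on an affected version it may print MuPDF errors or crash rather than return.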
We ran into the problem when processing some internal documents, but managed to reproduce the issue on two random papers downloaded from arXiv. Here are the files:
1706.03762v7.pdf 2401.08541v1.pdf
When running the program, it emits error logs like the ones below during pixmap generation, possibly because the saved file is broken.
The resulting PDF file is either blank or contains only some lines with no text when opened in Ubuntu's Evince document viewer. Opening it in Chrome does show text, but the font is altered and the figures are missing.
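Because some documents crash the interpreter outright, it can help to run each save in its own process so that one crash does not stop the whole test run. The following is a sketch under assumptions: the helper names run_isolated and try_variants and the output file names are hypothetical, the child script assumes the module is importable as pymupdf, and it assumes ez_save() accepts linear and defaults to garbage=3 and use_objstms=1.

```python
import subprocess
import sys

# Child script: exactly one save call per process.
# With `python -c`, sys.argv[0] is "-c" and the extra arguments follow it.
CHILD = """
import sys
import pymupdf
src, dst, mode = sys.argv[1:4]
doc = pymupdf.open(src)
if mode == "ez_save":
    doc.ez_save(dst, linear=True)              # garbage collection on by default (assumed)
else:
    doc.save(dst, use_objstms=1, linear=True)  # garbage defaults to 0
"""


def run_isolated(script, *args):
    """Run `script` in a fresh interpreter and return its exit code."""
    proc = subprocess.run(
        [sys.executable, "-c", script, *args],
        capture_output=True,
        text=True,
    )
    return proc.returncode


def try_variants(src):
    """Try both save variants on `src`; map each mode to the child's exit code."""
    return {
        mode: run_isolated(CHILD, src, f"out_{mode}.pdf", mode)
        for mode in ("ez_save", "save")
    }
```

On POSIX systems a negative exit code means the child was killed by a signal, which is how a native crash would typically show up; an exit code of 0 only means the save call returned, not that the output file is valid.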
Also, turning on garbage collection seems to affect the crash pattern: when using ez_save, the first file crashes the program; when using save with no gc, the second file crashes the program. Both crash with a log like this:
PyMuPDF version
1.24.5
Operating system
Linux
Python version
3.11