pymupdf / PyMuPDF

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
https://pymupdf.readthedocs.io
GNU Affero General Public License v3.0
5.66k stars, 528 forks

Very slow opening a particular PDF #3882

Closed: ml-oliver closed this issue 1 month ago

ml-oliver commented 1 month ago

Description of the bug

Opening the attached PDF with pymupdf.open("Appendix-I-Capital-Program-Project-Pages.pdf") takes a very long time (around 90 s) and blocks the main thread.

The file is on filebin (the link expires after 6 days; at 30 MB it is too big to attach here): https://filebin.net/75p9by6n1nusojq9/Appendix-I-Capital-Program-Project-Pages.pdf

Original source of file: https://www.wmata.com/initiatives/capital-improvement-program/upload/Appendix-I-Capital-Program-Project-Pages.pdf

I can share the file by email if it is needed for testing.

How to reproduce the bug

As above, calling pymupdf.open("Appendix-I-Capital-Program-Project-Pages.pdf") is sufficient to reproduce this issue.
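To put a number on the delay, here is a minimal timing sketch. It assumes only the standard library plus the pymupdf.open() call from the report; the `timed` helper and the script name `time_open.py` are hypothetical, chosen for illustration.

```python
import sys
import time
from typing import Any, Callable, Tuple


def timed(fn: Callable[..., Any], *args: Any) -> Tuple[Any, float]:
    """Call fn(*args) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Only touch PyMuPDF when a PDF path is passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    import pymupdf

    # Time just the open() call, which is where the reported stall happens.
    doc, elapsed = timed(pymupdf.open, sys.argv[1])
    print(f"opened {doc.page_count} pages in {elapsed:.1f} s")
    doc.close()
```

Run it as `python time_open.py Appendix-I-Capital-Program-Project-Pages.pdf`. It prints the page count and how long pymupdf.open() took, which for this file should land near the ~90 s described above.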

Edit: I can also reproduce this issue with mupdf directly (mupdf Appendix-I-Capital-Program-Project-Pages.pdf), so it is likely an upstream issue.

PyMuPDF version

1.24.10

Operating system

Linux

Python version

3.11

JorjMcKie commented 1 month ago

"Can also reproduce this issue with mupdf directly, mupdf Appendix-I-Capital-Program-Project-Pages.pdf so likely it is an upstream issue."

Thanks for the additional comment. Yes, please do so; your insight is quite right ...