pymupdf / PyMuPDF

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
https://pymupdf.readthedocs.io
GNU Affero General Public License v3.0

get_pixmap function takes too long to process #3450

Open anirudhagarwal1 opened 1 month ago

anirudhagarwal1 commented 1 month ago

Description of the bug

When creating a pixmap for a page of a PDF file, the get_pixmap function takes too long, even though the page size is well within the required limits.

import io
import fitz  # PyMuPDF
# pdf_bytes holds the raw bytes of the PDF file
pdf_in = fitz.open(stream=io.BytesIO(initial_bytes=pdf_bytes), filetype="pdf")
page = pdf_in[0]
pix = page.get_pixmap(alpha=False, dpi=150)

The get_pixmap step takes about 62 seconds on my MacBook Pro. I would like to understand why it takes so long and how I can bring this time down.
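
To narrow down where the time goes, one option is to time the same call at a few resolutions. The following is a minimal sketch, not part of the original report: the helper names timed, render_first_page, and profile_dpis are hypothetical, and it assumes PyMuPDF is installed and the PDF is available as bytes.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def render_first_page(pdf_bytes, dpi=150):
    """Render page 0 of an in-memory PDF to a pixmap (requires PyMuPDF)."""
    # Imported lazily so the timing helper above can be used without PyMuPDF.
    import fitz  # PyMuPDF

    with fitz.open(stream=pdf_bytes, filetype="pdf") as doc:
        return doc[0].get_pixmap(alpha=False, dpi=dpi)


def profile_dpis(pdf_bytes, dpis=(36, 72, 150)):
    """Return {dpi: seconds} for rendering page 0 at each resolution."""
    return {dpi: timed(render_first_page, pdf_bytes, dpi)[1] for dpi in dpis}
```

For example, profile_dpis(open("temp-6.pdf", "rb").read()) would return the seconds spent per DPI. If the times barely change with the resolution, the cost probably lies in interpreting the page content rather than in rasterizing pixels, though that is only a heuristic.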

How to reproduce the bug

temp-6.pdf This is the PDF file that I am using.

PyMuPDF version

1.24.2

Operating system

macOS

Python version

3.10

JorjMcKie commented 1 month ago

Observation confirmed. Will submit a MuPDF bug report.

JorjMcKie commented 1 month ago

MuPDF bug report link: https://bugs.ghostscript.com/show_bug.cgi?id=707777

anirudhagarwal1 commented 1 month ago

Do you have an idea of the usual turnaround time (TAT) for such issues?

JorjMcKie commented 1 month ago

These issues are usually resolved quickly, in PyMuPDF as well as in MuPDF. Of course there may be stubborn cases ... In any case, you will have to wait for a new release.

cfcurtis commented 1 week ago

I'm glad a fix is in progress; I came here to report this. Interestingly, it seems to be a regression: I updated PyMuPDF (from 1.21.1) and immediately noticed that it was much slower.
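
One way to check a suspected regression like this is to run the same render in two environments (for instance, one with PyMuPDF 1.21.1 and one with 1.24.2) and record the installed version next to each timing. The sketch below is only an illustration: installed_version and is_regression are hypothetical helper names, and the factor of 2 is an arbitrary threshold, not something taken from this thread.

```python
from importlib import metadata


def installed_version(dist="PyMuPDF"):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def is_regression(old_seconds, new_seconds, factor=2.0):
    """Flag a slowdown when the new timing exceeds the old one by `factor`."""
    return new_seconds > old_seconds * factor
```

Logging installed_version() together with the measured seconds in each environment makes it easy to see which release introduced the slowdown.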