pymupdf / PyMuPDF

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
https://pymupdf.readthedocs.io
GNU Affero General Public License v3.0

PyMuPDF apply_redactions crops parts of the PDF in the final output #3562

Closed cajmorgan closed 1 week ago

cajmorgan commented 3 weeks ago

Description of the bug

Hello!

There is a bug in version 1.24.5: calling page.apply_redactions() crops the page. In version 1.23.5, the result is as expected. See the screenshots below.

[Screenshots: version 1.23.5 and version 1.24.5]

How to reproduce the bug

I haven't tested many different documents, but for one specific document, I added some boxes using page.add_redact_annot() and then called page.apply_redactions().
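One way to make "the page gets cropped" checkable is to record the page boxes before and after the redaction step and compare them. The following is only a minimal sketch: `snapshot_boxes` and `changed_boxes` are hypothetical helper names, not part of PyMuPDF, and the two stand-in page objects exist only so the example runs without a PDF. It assumes the page object exposes `mediabox`, `cropbox` and `rect` as four-number sequences, as a PyMuPDF `Page` does.

```python
from types import SimpleNamespace

# Page attributes to compare (these exist on a PyMuPDF Page)
BOX_NAMES = ("mediabox", "cropbox", "rect")


def snapshot_boxes(page):
    """Record the page's boxes as plain tuples of floats (hypothetical helper)."""
    return {name: tuple(float(v) for v in getattr(page, name)) for name in BOX_NAMES}


def changed_boxes(before, after, tol=1e-3):
    """Return {name: (old, new)} for every box that moved by more than tol."""
    changed = {}
    for name, old in before.items():
        new = after[name]
        if any(abs(a - b) > tol for a, b in zip(old, new)):
            changed[name] = (old, new)
    return changed


# Stand-in pages (assumed values, not taken from error-doc.pdf)
same = SimpleNamespace(
    mediabox=(0, 0, 612, 792), cropbox=(0, 0, 612, 792), rect=(0, 0, 612, 792)
)
cropped = SimpleNamespace(
    mediabox=(0, 0, 612, 792), cropbox=(0, 0, 612, 700), rect=(0, 0, 612, 700)
)

print(changed_boxes(snapshot_boxes(same), snapshot_boxes(same)))  # {}
print(changed_boxes(snapshot_boxes(same), snapshot_boxes(cropped)))
```

With a real document, the idea would be to call `snapshot_boxes(page)` once before `page.apply_redactions()` and once after, then pass both results to `changed_boxes`. An empty result would suggest that the cropping seen in the screenshots comes from the page content rather than from the page boxes.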

PyMuPDF version

1.24.5

Operating system

MacOS

Python version

3.10

cajmorgan commented 3 weeks ago

After some testing, the bug seems to first appear in version 1.24.0.

JorjMcKie commented 3 weeks ago

Please provide a reproducing file. Without one, we would be forced to guess.

cajmorgan commented 3 weeks ago

Of course, sorry. This is one page of the doc. I just tested it and the error happens, though, as mentioned, only after version 1.23.9, it seems.

error-doc.pdf
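The version boundaries reported so far (1.23.9 as the last version said to work, 1.24.0 as the first said to crop) can be written down as a small check. This is a sketch under those two reported data points only; `version_tuple` and `reported_status` are hypothetical helpers, and "1.23.10" below is just an illustrative version string for anything numbered between the two boundaries.

```python
def version_tuple(text):
    """Turn '1.24.5' into (1, 24, 5) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


LAST_GOOD = version_tuple("1.23.9")  # last version reported to work in this thread
FIRST_BAD = version_tuple("1.24.0")  # first version reported to crop in this thread


def reported_status(text):
    """Classify a version string against the boundaries reported above."""
    v = version_tuple(text)
    if v <= LAST_GOOD:
        return "reported good"
    if v >= FIRST_BAD:
        return "reported affected"
    return "untested"


for candidate in ("1.23.5", "1.23.10", "1.24.5"):
    print(candidate, "->", reported_status(candidate))
```

Comparing tuples rather than strings matters here: as plain strings, "1.23.10" sorts before "1.23.9". In an installed environment, the string to check would be `pymupdf.version[0]`.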

JorjMcKie commented 1 week ago

Sorry for the delay. You haven't supplied reproducing code, so I have tried the following, which does not reveal problem behavior (version 1.24.5):

>>> import pymupdf
>>> doc = pymupdf.open("error-doc.pdf")
>>> page = doc[0]
>>> for r in page.search_for("genocide"):
...     page.add_redact_annot(r)
...
'Redact' annotation on page 0 of error-doc.pdf
>>> page.apply_redactions()
True
>>> doc.ez_save("redacted.pdf")
>>> pymupdf.version
('1.24.5', '1.24.2', '20240530000001')

cajmorgan commented 1 week ago

Of course,


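import pymupdf

boxes = [...]  # some boxes here: (x0, y0, x1, y1) coordinates, elided in the report
pdf = pymupdf.open(doc_path)  # doc_path: path to the input PDF, defined elsewhere

for i, page in enumerate(pdf):
    for box in boxes:
        rect = pymupdf.Rect(box[0], box[1], box[2], box[3])
        fontsize = 12

        # Mark the area for redaction, with centered replacement text on a red fill
        page.add_redact_annot(
            rect,
            text="[REDACTED]",
            text_color=(0, 0, 0),
            fill=(1, 0, 0),
            fontsize=fontsize,
            align=pymupdf.TEXT_ALIGN_CENTER,
        )

    # Apply all redactions on this page at once
    page.apply_redactions()

path = "somepath"
pdf.save(path)
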
cajmorgan commented 1 week ago

A newer version had a problem with pymupdf.TEXT_ALIGN_CENTER. The cropping may be related to that, or to something else.

JorjMcKie commented 1 week ago

Ok, I have tried all sorts of things with the file but no error was showing up.

So closing this as resolved now. Please feel free to re-open with better data or submit a new issue.