pymupdf / PyMuPDF

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
https://pymupdf.readthedocs.io
GNU Affero General Public License v3.0

Redacting incorrectly modifies appearance #3872

Closed: reggieag closed this issue 2 months ago

reggieag commented 2 months ago

Description of the bug

I'm manually adding a redaction annotation to a PDF to remove some strike-through text. When I apply the redaction, the PDF's appearance changes incorrectly: a whole margin of text disappears from the document instead of just the text inside the annotation.

Interestingly, only the appearance is incorrectly impacted. The text is still there.

Possibly related to https://github.com/pymupdf/PyMuPDF/issues/3376

How to reproduce the bug

example_pdf_before_redaction.pdf example_pdf_after_redaction.pdf

You can run example_pdf_before_redaction.pdf through the script below on PyMuPDF==1.24.10 and Python==3.9.19 to reproduce the bug.

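import os

import pymupdf

CURRENT_DIR = os.getcwd()
# Path to the input PDF; fill in the placeholder before running.
FILE_PATH = f"{CURRENT_DIR}/[INSERT LOCAL PATH HERE]"

# Use the smaller, font-size-based glyph heights for text bounding boxes.
pymupdf.TOOLS.set_small_glyph_heights(True)
with pymupdf.open(FILE_PATH) as doc:  # open document
    page_1 = doc[0]
    # Add the redaction rectangle from the report, then apply it.
    page_1.add_redact_annot(pymupdf.Rect(294.421875, 401.25, 342.3515625, 402.0))
    page_1.apply_redactions()

    # Write the result next to the input file.
    doc.save(FILE_PATH.replace(".pdf", "_redacted.pdf"))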

pip freeze output

Using python3.9 (3.9.19)
fire==0.6.0
fonttools==4.53.1
lxml==5.3.0
numpy==2.0.2
opencv-python-headless==4.10.0.84
pdf2docx==0.5.8
PyMuPDF==1.24.10
pymupdf4llm==0.0.14
PyMuPDFb==1.24.10
python-docx==1.1.2
six==1.16.0
termcolor==2.4.0
typing_extensions==4.12.2

I'm running this locally on a Mac.

PyMuPDF version

1.24.10

Operating system

MacOS

Python version

3.9

JorjMcKie commented 2 months ago

My result looks like this (image), which is exactly as it should be. The vacated space also no longer contains any text.

reggieag commented 2 months ago

Weird, I guess it might be something with my local setup then... Do you have any recommendations on libraries to update? I'm using PyMuPDF==1.24.10 and PyMuPDFb==1.24.10.

reggieag commented 2 months ago

Oh, when I open the doc in Chrome it's fine. The issue must be a bug in the macOS Preview app.