pymupdf / PyMuPDF

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
https://pymupdf.readthedocs.io
GNU Affero General Public License v3.0
4.53k stars, 447 forks

There is a bug in the add_redact_annot() function #3646

Open: lancer92-rep opened this issue 2 days ago

lancer92-rep commented 2 days ago

Description of the bug

After text is deleted by marking it with add_redact_annot() and applying the redactions, the resulting PDF does not display correctly in a web browser: some of the shapes on the page are rendered incorrectly.

How to reproduce the bug

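    import os

    import pymupdf

    pdf_file = "sample_A.pdf"  # the original file attached below

    doc = pymupdf.open(pdf_file)
    for page in doc:
        # TEXTFLAGS_DICT includes TEXT_PRESERVE_IMAGES, so image blocks are returned as well
        blks = page.get_text("blocks", sort=True, flags=pymupdf.TEXTFLAGS_DICT)
        for blk in blks:
            # the first four items of a block tuple are its bbox (x0, y0, x1, y1)
            page.add_redact_annot(pymupdf.Rect(blk[:4]))
        # remove text only: leave images (images=0) and vector graphics (graphics=0) untouched
        page.apply_redactions(images=0, graphics=0)
    file_name, file_extension = os.path.splitext(pdf_file)
    new_file = f"{file_name}_o{file_extension}"
    doc.subset_fonts()
    doc.ez_save(new_file, garbage=4)
    doc.close()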
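Because the reproducer passes TEXTFLAGS_DICT, the "blocks" output also contains image blocks (block_type 1), so redaction rectangles are placed over image areas too. The sketch below restricts redaction to text blocks, relying on the documented block tuple layout (x0, y0, x1, y1, text, block_no, block_type). The helper names `text_block_rects` and `redact_text_blocks` are hypothetical, and this is only a refinement of the reproducer; it is not expected to fix the browser rendering problem itself.

```python
# Hypothetical helpers (not part of PyMuPDF). Block tuples from
# page.get_text("blocks") look like (x0, y0, x1, y1, text, block_no, block_type),
# where block_type is 0 for text and 1 for an image.
TEXT_BLOCK = 0


def text_block_rects(blocks):
    """Return the bounding boxes (x0, y0, x1, y1) of text blocks only."""
    return [tuple(b[:4]) for b in blocks if b[6] == TEXT_BLOCK]


def redact_text_blocks(page, flags):
    """Mark every text block of a PyMuPDF page for redaction, then apply."""
    blocks = page.get_text("blocks", sort=True, flags=flags)
    for rect in text_block_rects(blocks):
        # add_redact_annot accepts any rect-like 4-tuple
        page.add_redact_annot(rect)
    # Same settings as the reproducer: keep images and vector graphics.
    page.apply_redactions(images=0, graphics=0)
```

Usage would be `redact_text_blocks(page, pymupdf.TEXTFLAGS_DICT)` inside the `for page in doc:` loop, in place of the inner block loop and the `apply_redactions` call.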

Some of the shape content ends up clustered at the bottom-left corner of the page. If the file is repaired with a PDF repair tool, it displays normally in the browser.
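One way to check whether shapes actually move is to compare the bounding boxes of the page's vector drawings before and after `apply_redactions()`. The sketch below assumes boxes are taken from `page.get_drawings()` (each item has a `"rect"` key); `shape_boxes`, `displaced_boxes` and the tolerance are hypothetical. Note that MuPDF may interpret the rewritten content stream differently from a browser's PDF viewer, so an empty result would not prove the browser renders the file correctly.

```python
# Hypothetical diagnostic helpers, not part of PyMuPDF.


def shape_boxes(page):
    """Bounding boxes of a PyMuPDF page's vector drawings, as 4-tuples."""
    return [tuple(d["rect"]) for d in page.get_drawings()]


def displaced_boxes(before, after, tol=1.0):
    """Return boxes in `after` that match no box in `before` within `tol`."""

    def close(a, b):
        # Two boxes match if every coordinate differs by at most tol.
        return all(abs(p - q) <= tol for p, q in zip(a, b))

    return [box for box in after if not any(close(box, ref) for ref in before)]
```

In the reproducer, `before = shape_boxes(page)` would be taken before adding the redaction annotations and `after = shape_boxes(page)` after `apply_redactions()`; any boxes reported by `displaced_boxes(before, after)` are shapes whose position changed.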

Original file: sample_A.pdf

File before applying add_redact_annot(): sample_A_red.pdf

Final output file: sample_A_o.pdf

PyMuPDF version

1.24.7

Operating system

Windows

Python version

3.10

JorjMcKie commented 2 days ago

Duplicate of #3630.