pymupdf / PyMuPDF

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
https://pymupdf.readthedocs.io
GNU Affero General Public License v3.0

page.apply_redactions gives unwanted black rectangle #3630

Open CH-Tam opened 4 days ago

CH-Tam commented 4 days ago

Description of the bug

I want to remove all text from a PDF and keep only the vector graphics (such as straight lines). The code and result are shown below.

However, the original PDF does not contain that black rectangle, so the output is not what I want. What should I do? (The PDF file is non-public, but it can be provided through private channels.)

This error occurs in both 1.24.3 and 1.24.7.

How to reproduce the bug

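        import fitz  # PyMuPDF

        doc = fitz.open(pdf_path)
        for page in doc:
            # Redaction area extends 600 pt beyond the page on every side
            page.add_redact_annot(page.rect + (-600, -600, 600, 600))
            # graphics=0: leave vector graphics untouched
            page.apply_redactions(graphics=0)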

The result for the 1st page: [image]
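One way to narrow down where such a rectangle comes from is to list the filled vector paths on the output page. The following is only a diagnostic sketch, not something taken from the thread: the helper name `black_filled_paths` and the tolerance are hypothetical, and it assumes the rectangle is stored as a filled path and that each entry is a dict with `"fill"` and `"rect"` keys, as returned by `page.get_drawings()`.

```python
def black_filled_paths(drawings, tolerance=1e-3):
    """Return the path dicts whose fill color is (near) black.

    `drawings` is expected to look like the output of page.get_drawings():
    a list of dicts where "fill" is a tuple of color components or None.
    """
    hits = []
    for path in drawings:
        fill = path.get("fill")
        if fill is None:
            continue  # stroked-only path, nothing is filled
        if all(abs(component) <= tolerance for component in fill):
            hits.append(path)
    return hits


def report(pdf_path, page_number=0):
    """Print the rectangles of black-filled paths on one page (needs PyMuPDF)."""
    import fitz  # PyMuPDF, imported here so the helper above works without it

    page = fitz.open(pdf_path)[page_number]
    for path in black_filled_paths(page.get_drawings()):
        print(path["rect"])
```

Running `report()` on the file before and after `apply_redactions()` would show whether a black-filled path appears only in the output. If nothing is listed, the rectangle is probably not a plain filled path and this check does not apply.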

PyMuPDF version

1.24.7

Operating system

Linux

Python version

3.10

JorjMcKie commented 4 days ago

Please send me the file via e-mail, thanks.

JorjMcKie commented 4 days ago

@CH-Tam - I have received nothing so far ... did you use jorj.x.mckie@outlook.de?

JorjMcKie commented 4 days ago

Thanks - file received. This is a problem in the base library (MuPDF), for which a fix has been developed. I tested the fix locally and confirmed that it works.

The fix works with MuPDF v1.25.0 (the development version on GitHub).
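To check which MuPDF version an installation is bound to, something like the sketch below could be used. The helper names are hypothetical, and the attribute names `mupdf_version` and `VersionFitz` are assumptions that may differ between PyMuPDF releases. A version number alone also cannot prove that a given development build already contains the fix.

```python
import re

# MuPDF version named in the thread as working with the fix.
FIXED_IN = (1, 25, 0)


def parse_version(text):
    """Convert a dotted version string such as '1.24.7' to a 3-tuple of ints."""
    parts = []
    for piece in text.strip().split(".")[:3]:
        match = re.match(r"\d+", piece)  # keep leading digits only
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:  # pad '1.25' to (1, 25, 0)
        parts.append(0)
    return tuple(parts)


def has_redaction_fix(mupdf_version):
    """True if the version is at least the one named in the thread."""
    return parse_version(mupdf_version) >= FIXED_IN


def installed_mupdf_version():
    """Return the MuPDF version string reported by PyMuPDF, or None."""
    try:
        import fitz  # PyMuPDF
    except Exception:  # not installed, or failed to load
        return None
    # Assumption: the attribute name differs between releases, so try both.
    return getattr(fitz, "mupdf_version", None) or getattr(fitz, "VersionFitz", None)


version = installed_mupdf_version()
if version is None:
    print("PyMuPDF not found or MuPDF version unavailable")
else:
    print(version, "-> fix expected:", has_redaction_fix(version))
```

Note that the PyMuPDF version (1.24.7 in the report) and the MuPDF version it is built against are separate numbers; the comparison here is against the MuPDF one.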