Closed lyon-tonic closed 1 week ago
Inserting or adding content to rotated pages can be confusing. Most PyMuPDF methods expect coordinates (points, rectangles, ...) in the unrotated page's coordinate system, so positions taken from the rotated view must first be multiplied by page.derotation_matrix to land in the right place. I think this script does what you want:
import pymupdf as fitz  # PyMuPDF

RED = fitz.pdfcolor["red"]


def process_pdf(input_pdf_path, output_pdf_path):
    # Open the input PDF file
    document = fitz.open(input_pdf_path)

    # Iterate through each page
    for page in document:
        # 234 is half of the width of the page
        rect = fitz.Rect(0, 0, 234, 234)
        # Map the rectangle from the rotated view to unrotated page coordinates
        rot_rect = rect * page.derotation_matrix
        redact_annot = page.add_redact_annot(
            rot_rect, text=f"{page.number=}", text_color=RED
        )
        page.apply_redactions()

    document.ez_save(output_pdf_path)


if __name__ == "__main__":
    input_pdf_path = "input.pdf"  # Replace with the path to your input PDF
    output_pdf_path = "output.pdf"  # Replace with the path to your output PDF
    process_pdf(input_pdf_path, output_pdf_path)
    print(f"Processed PDF saved to {output_pdf_path}")
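To make the coordinate mapping less mysterious, here is a minimal plain-Python sketch of what "derotating" a rectangle means for page rotations of 0, 90, 180 and 270 degrees. It is only a geometric illustration, not PyMuPDF's implementation: the function names `derotate_point` and `derotate_rect` are hypothetical, and the page size of 468 x 612 points (6.5 x 8.5 in) is an assumption about the unrotated page. Coordinates use a top-left origin with y pointing down, and rotation is clockwise.

```python
# Plain-Python geometry sketch (no PyMuPDF required).
# Assumptions: top-left origin, y grows downward, clockwise rotation
# in multiples of 90 degrees. Function names are hypothetical.


def derotate_point(x, y, rotation, width, height):
    """Map a point seen on the rotated page back to unrotated coordinates.

    width and height are the dimensions of the *unrotated* page.
    """
    rotation %= 360
    if rotation == 0:
        return (x, y)
    if rotation == 90:
        return (y, height - x)
    if rotation == 180:
        return (width - x, height - y)
    if rotation == 270:
        return (width - y, x)
    raise ValueError("rotation must be a multiple of 90")


def derotate_rect(rect, rotation, width, height):
    """Map a rectangle (x0, y0, x1, y1) and re-sort its corners."""
    x0, y0, x1, y1 = rect
    ax, ay = derotate_point(x0, y0, rotation, width, height)
    bx, by = derotate_point(x1, y1, rotation, width, height)
    return (min(ax, bx), min(ay, by), max(ax, bx), max(ay, by))


# Assumed unrotated page of 468 x 612 points, displayed rotated by 90 degrees.
visible_rect = (0, 0, 234, 234)  # top-left square as seen by the viewer
unrotated_rect = derotate_rect(visible_rect, 90, 468, 612)
print(unrotated_rect)
```

Under these assumptions the square at the top left of the rotated view maps to (0, 378, 234, 612), which is the bottom-left corner of the unrotated page. In the script above, multiplying by page.derotation_matrix plays the role of this mapping.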
Thanks for responding!
This addresses part of the problem, but it still does not fix the redact_annot fill. The fill rectangle appears to be rendered separately from the redact_annot, and I'm not sure why.
The black fill rect is not showing up here.
import pymupdf as fitz  # PyMuPDF

RED = fitz.pdfcolor["red"]


def process_pdf(input_pdf_path, output_pdf_path):
    # Open the input PDF file
    document = fitz.open(input_pdf_path)

    # Iterate through each page
    for page in document:
        # 234 is half of the width of the page
        rect = fitz.Rect(0, 0, 234, 234)
        rot_rect = rect * page.derotation_matrix
        redact_annot = page.add_redact_annot(
            rot_rect, text=f"{page.number=}", text_color=RED
        )
        redact_annot.update(fill_color=(0, 0, 0))  # set fill color to black
        page.apply_redactions()

    document.ez_save(output_pdf_path)


if __name__ == "__main__":
    input_pdf_path = "input.pdf"  # Replace with the path to your input PDF
    output_pdf_path = "output.pdf"  # Replace with the path to your output PDF
    process_pdf(input_pdf_path, output_pdf_path)
    print(f"Processed PDF saved to {output_pdf_path}")
This file indeed does a few unexpected things! Here is a complete solution that removes the page rotations.
import pymupdf as fitz  # PyMuPDF

RED = fitz.pdfcolor["red"]
BLACK = fitz.pdfcolor["black"]


def process_pdf(input_pdf_path, output_pdf_path):
    rect = fitz.Rect(0, 0, 234, 234)
    # Open the input PDF file
    src = fitz.open(input_pdf_path)
    doc = fitz.open()  # output file

    # Iterate through each page
    for src_page in src:
        # the output PDF will contain pages with rotation 0
        src_rect = src_page.rect
        w, h = src_rect.br
        src_rot = src_page.rotation
        src_page.set_rotation(0)
        # make output page having the visible dimension of the input
        page = doc.new_page(width=w, height=h)
        page.show_pdf_page(  # insert source page
            page.rect,
            src,
            src_page.number,
            rotate=-src_rot,  # reversed original rotation
        )
        # now we can redact in a worry-free manner
        redact_annot = page.add_redact_annot(
            rect, text=f"{page.number=}", text_color=RED, fill=BLACK
        )
        page.apply_redactions()

    doc.ez_save(output_pdf_path)


if __name__ == "__main__":
    input_pdf_path = "input.pdf"  # Replace with the path to your input PDF
    output_pdf_path = "output.pdf"  # Replace with the path to your output PDF
    process_pdf(input_pdf_path, output_pdf_path)
    print(f"Processed PDF saved to {output_pdf_path}")
Closing the issue for lack of reaction.
Description of the bug
I am trying to redact words from a PDF, based on OCR-generated rectangles.
PyMuPDF has worked well for us, but I have run into a problem with a specific file that has some unusual properties (I've attached the file). The pages in this file have an unusual size (8.5 x 6.5 in), and some of them are rotated.
I would like to have the coordinates in the rectangles relative to the top left, but even before I do that, I have noticed that the redacted rectangle is not in the same place as the fill.
If this is not a bug, I would like to understand why these appear to be drawn in separate coordinate systems, and how to reconcile them.
How to reproduce the bug
This is a simple script that shows the problem in the files below:
Input: input.pdf
Output: output.pdf
PyMuPDF version
1.24.5
Operating system
Windows
Python version
3.11