pymupdf / PyMuPDF

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
https://pymupdf.readthedocs.io
GNU Affero General Public License v3.0

Facing issues after applying redactions: some images or icons are deleted #3439

Closed aleem75321 closed 4 months ago

aleem75321 commented 4 months ago

Description of the bug

I am trying to change the colour of a dot (the bindi, i.e. the character "ं") in a PDF. To do this, I redact the old bindi and insert a new bindi in the desired colour.

Problem:

Screenshots:

1. Problem: deleted objects

Screenshot before applying redactions:

test_Original_image

Screenshot after applying redactions (I have circled the area where images were deleted):

test_after_redacttion

2. Problem: image distortion

Screenshot before applying redactions (I have circled the area where the image changes):

test2_Original_image

Screenshot after applying redactions (I have circled the area where the image changed):

test2_after_redacttion

3. Problem: text distortion

Screenshot before applying redactions (I have circled the area where the text changes):

test2_Original_text_issue

Screenshot after applying redactions (I have circled the area where the text changed):

test2_after_redact_text_issue

How to reproduce the bug

import fitz
from pathlib import Path

file_path = Path(r"test_pages/test.pdf")
# file_path = Path(r"out/MT#MTMumbaiBS#18-04-2024#Mumbai#1#MTM9#4.pdf")
doc = fitz.open(file_path)
page = doc[0]

blocks = page.get_text("rawdict", flags=fitz.TEXTFLAGS_TEXT, sort=True)["blocks"]
# Set the colour for the output PDF
Red = fitz.pdfcolor["red"]

for b in blocks:
    for l in b["lines"]:
        for s in l["spans"]:
            for c in s["chars"]:
                # Only large spans in the original (near-black) text colour
                if s["size"] > 15 and s["color"] == 2236191:
                    if c["c"] == "ं":
                        try:
                            font = fitz.Font(fontname=s["font"], fontfile=f"{s['font']}.ttf")
                        except Exception as e:
                            # Without the font the character cannot be re-inserted, so skip it
                            print(str(e))
                            continue
                        redact_box = fitz.Rect(c["bbox"])
                        origin_text = fitz.Point(c["origin"])
                        # Shrink the box from the bottom by one font size
                        redact_box.y1 = redact_box.y1 - s["size"]
                        page.add_redact_annot(redact_box)
                        # Apply the redaction right away (images=0, graphics=0, text=0)
                        page.apply_redactions(0, 0, 0)

                        # Create a TextWriter to write on the page in the chosen colour
                        tw = fitz.TextWriter(page.rect, color=Red)
                        # Re-insert the same character in a different colour
                        tw.append((origin_text.x, origin_text.y), text=c["c"], fontsize=s["size"], font=font)
                        tw.write_text(page)

# Save a backup file for future use
out_fpath = "OUT/" + file_path.stem + ".pdf"
doc.save(out_fpath, garbage=3, deflate=True)
doc.close()
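
A note on the colour test in the script above: the span colour in PyMuPDF's text dictionaries is a packed sRGB integer, so 2236191 can be split into its red, green and blue parts. The helper names below are hypothetical and only illustrate the arithmetic; they are not part of PyMuPDF.

```python
def srgb_int_to_rgb(color: int) -> tuple[int, int, int]:
    """Split a packed 0xRRGGBB integer into (red, green, blue), each 0-255."""
    red = (color >> 16) & 0xFF
    green = (color >> 8) & 0xFF
    blue = color & 0xFF
    return red, green, blue


def rgb_to_srgb_int(red: int, green: int, blue: int) -> int:
    """Pack (red, green, blue), each 0-255, back into a 0xRRGGBB integer."""
    return (red << 16) | (green << 8) | blue


# The value used in the reproduction script is a near-black colour.
print(srgb_int_to_rgb(2236191))  # (34, 31, 31)
```

This makes it easier to check that the filter really selects the intended text colour before any redaction is added.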

Original PDFs:

test.pdf test2.pdf

PDFs after applying redactions:

test_after_apply_redaction.pdf test2_after_apply_redaction.pdf

PyMuPDF version

1.24.2

Operating system

Windows

Python version

3.11

JorjMcKie commented 4 months ago

Sorry, what I meant was this:

  1. add redaction annotations as desired
  2. save as a new PDF without applying redactions!
  3. open the just saved PDF and apply redactions.
  4. save the result as another PDF.
  5. only if the PDF from step 4 shows more deletions than the crossed-out red rectangles in the PDF from step 2 indicate do we have a potential problem.

aleem75321 commented 4 months ago

I followed all of the steps you described and found no change in the PDF; even the crossed-out red rectangles were not deleted.

Interestingly, when I apply redactions in my code, some things in the PDF change, such as the dot colour and also the colour of one image. But if I follow the steps above, nothing changes.

My question: according to the documentation, if we set the image, graphics and text values to zero, it should ignore any overlapping pixels. So why don't those parameters work?

test2_Redacted.pdf test2.pdf

JorjMcKie commented 4 months ago

We are talking past each other! What I said was: generate a PDF with crossed out redaction rectangles. Then apply redactions and look at the result. Your two most recent examples look equal. What is this telling me?

Previously you said that more stuff was removed than intended. So what is the relationship here?

JorjMcKie commented 4 months ago

I am moving this to Discussions. So far there are no indications that a bug has been encountered. If necessary, we can open an issue at any time.