py-pdf / pypdf

A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
https://pypdf.readthedocs.io/en/latest/

Cannot open pdf file with Adobe Acrobat Reader after transformation #1527

Closed musman920 closed 1 year ago

musman920 commented 1 year ago

I am trying to create a blank page, rotate and merge one PDF file onto that page, and append another page. The functionality seems to work, but the newly created PDF file does not open in Adobe Acrobat Reader. It gives the error quoted under "Traceback" below.

Environment

Mac (Intel chip), Python 3.10, pypdf 3.2.0

Code + PDF

This is a minimal, complete example that shows the issue. It is based on your Stack Overflow answer; my code additionally uses Transformation:

from pypdf import PdfReader, PdfWriter, Transformation
from pypdf.generic import RectangleObject

reader = PdfReader("dhl.pdf")
writer = PdfWriter()

desired_width = 100
desired_height = 100
r = RectangleObject([0, 0, desired_width, desired_height])

for page in reader.pages[:10]:
    old_width = page.mediabox.width
    old_height = page.mediabox.height

    a1 = desired_width / old_width
    a2 = desired_height / old_height
    factor = min(a1, a2)

    new_width = float(old_width * factor)
    new_height = float(old_height * factor)

    dx = (desired_width - new_width) / 2
    dy = (desired_height - new_height) / 2
    op = Transformation().translate(tx=dx, ty=dy)

    page.scale_to(width=new_width, height=new_height)
    page.add_transformation(op)
    page.mediabox = r
    page.artbox = r
    page.cropbox = r
    page.bleedbox = r
    page.trimbox = r
    writer.add_page(page)

with open("foo.pdf", "wb") as fp:
    writer.write(fp)
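
The fit-and-center arithmetic in the script above can be checked without any PDF at hand. The following is a minimal sketch in plain Python; `fit_and_center` is a hypothetical helper name, not part of pypdf, and it works on plain floats instead of page objects.

```python
from typing import Tuple


def fit_and_center(
    old_width: float, old_height: float, target_width: float, target_height: float
) -> Tuple[float, float, float]:
    """Hypothetical helper (not a pypdf API): return (factor, dx, dy)."""
    # Uniform scale factor that makes the page fit inside the target box.
    factor = min(target_width / old_width, target_height / old_height)
    new_width = old_width * factor
    new_height = old_height * factor
    # Offsets that center the scaled page inside the target box.
    dx = (target_width - new_width) / 2
    dy = (target_height - new_height) / 2
    return factor, dx, dy


# A 200 x 400 page placed into the 100 x 100 box used in the script.
print(fit_and_center(200, 400, 100, 100))  # (0.25, 25.0, 0.0)
```

The numbers here are exact in binary floating point, but other page sizes generally produce factors and offsets with long fractional parts.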

Traceback

This is the complete error message I see when opening the generated file in Acrobat Reader (attachments: dhl.pdf, foo.pdf):

An error exists on this page. Acrobat may not display the page correctly. Please contact the person who created the PDF document to correct the problem.

MartinThoma commented 1 year ago

Interesting. I can open those files just fine. Can you confirm that you can open it e.g. with the Google chrome PDF viewer?

musman920 commented 1 year ago

Thank you for the response. It opens in the browser and in the Xpdf viewer; the only problem is opening it in Adobe Acrobat Reader. Our organization has multiple computers and we deal with PDFs a lot. Adobe Acrobat Reader is our standard, so the file has to open there. It says "An error exists on this page. Acrobat may not display the page correctly. Please contact the person who created the PDF document to correct the problem.". What could be the issue here?

MartinThoma commented 1 year ago

I have no idea.

A first thing to check is whether the PDFs are actually not standard-compliant. I will do that after work.

Which version of Adobe Acrobat Reader do you use? (If it's not the latest, please upgrade and check again.)

It could also be a bug in Acrobat Reader.

musman920 commented 1 year ago

That would be great if you can check. I have the latest Acrobat Reader and tested on multiple systems. The result is the same error on all systems. This is the sample generated PDF with test data.

MartinThoma commented 1 year ago

> I have the latest acrobat reader and tested on multiple systems.

Which version do you have?

musman920 commented 1 year ago

Adobe Acrobat Reader, version 2022.003.20281

I checked for updates again; it says it is already up to date.

Regards

MartinThoma commented 1 year ago

VeraPDF validation fails, but 3heights validation doesn't.

That means the PDF is not fully standard-compliant, but not so badly broken that it should matter.

Which means that I don't know how to continue.

mrknwk commented 1 year ago

The problem is the excessive intrinsic precision addressed in #1376, I guess.
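
One way to test this guess is to look for very long decimal numbers in the generated file. The following is a rough sketch, assuming the relevant bytes (for example an uncompressed content stream) are already available as a `bytes` object; `find_long_reals` is a hypothetical helper, and the 20-digit threshold is an arbitrary assumption, not a documented Acrobat limit.

```python
import re
from typing import List


def find_long_reals(data: bytes, min_digits: int = 20) -> List[bytes]:
    """Hypothetical helper: list real numbers with >= min_digits fractional digits."""
    # Optional sign, optional integer part, a dot, then a long run of digits.
    pattern = re.compile(rb"[-+]?\d*\.\d{%d,}" % min_digits)
    return pattern.findall(data)


# Made-up content-stream fragment with one over-precise operand.
sample = b"0.25 0 0 0.25 12." + b"3" * 27 + b" 0 cm"
print(find_long_reals(sample))
```

Short operands such as 0.25 are ignored; only the operand with 27 fractional digits is reported.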

musman920 commented 1 year ago

This indeed fixes the issue :) Cheers! I just needed to modify pypdf/generic/_base.py#L353, changing

    return f"{self:f}".rstrip("0")

to

    return f"{self:.19f}".rstrip("0")

Now Acrobat Reader can open the transformed file.
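
The effect of that format change can be seen with Python's `decimal.Decimal` alone. This is a minimal sketch, not the pypdf implementation: `format_number` is a hypothetical helper, and it also strips a dangling decimal point, which the one-line change above does not do.

```python
from decimal import Decimal
from typing import Optional


def format_number(value: Decimal, places: Optional[int] = None) -> str:
    """Hypothetical helper: fixed-point text, optionally capped at `places` digits."""
    # "f" keeps every digit the Decimal carries; ".19f" rounds to 19 places.
    spec = "f" if places is None else f".{places}f"
    text = format(value, spec)
    if "." in text:
        # Drop trailing zeros, then a trailing dot if nothing is left after it.
        text = text.rstrip("0").rstrip(".")
    return text


third = Decimal(1) / Decimal(3)  # 28 significant digits in the default context
print(format_number(third))      # 0.3333333333333333333333333333
print(format_number(third, 19))  # 0.3333333333333333333
```

With the uncapped "f" format, a `Decimal` is written out with all the digits it carries, which is how numbers with far more fractional digits than any viewer needs can end up in the file.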