py-pdf / pypdf

A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
https://pypdf.readthedocs.io/en/latest/

Blank page after merge. #2260

Status: Open. morkai opened this issue 1 year ago.

morkai commented 1 year ago

I'm trying to add a short text to each page of an existing document. The input document I'm using for testing has 18 pages; after running it through the script below, two pages that contain only text come out blank.

I've got permission to share the first three pages of the document (one of the failing pages is page 3).

Environment

Which environment were you using when you encountered the problem?

$ python -m platform
Windows-10-10.0.22621-SP0

$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==3.16.4, crypt_provider=('pycryptodome', '3.10.1'), PIL=9.4.0

Code + PDF

This is a minimal, complete example that shows the issue:

from pypdf import PdfWriter, PdfReader, Transformation

input_pdf = PdfReader(open("blank_merge.input.pdf", "rb"))
output_pdf = PdfWriter()

for page_no, input_page in enumerate(input_pdf.pages):
    print(page_no + 1)
    # The footer is re-read on every iteration, so each input page gets a fresh footer page
    footer_pdf = PdfReader(open("blank_merge.footer.pdf", "rb"))
    footer_page = footer_pdf.pages[0]
    footer_page.add_transformation(Transformation().rotate(0).translate(tx=0, ty=0))
    # Stamp the footer onto the input page (this direction produces the blank pages)
    input_page.merge_page(footer_page)
    output_pdf.add_page(input_page)

with open("blank_merge.output.pdf", "wb") as output_stream:
    output_pdf.write(output_stream)

In the script above we merge footer_page into input_page. If we swap the order and merge input_page into footer_page instead, everything works (although it is noticeably slower):

from pypdf import PdfWriter, PdfReader, Transformation

input_pdf = PdfReader(open("blank_merge.input.pdf", "rb"))
output_pdf = PdfWriter()

for page_no, input_page in enumerate(input_pdf.pages):
    print(page_no + 1)
    footer_pdf = PdfReader(open("blank_merge.footer.pdf", "rb"))
    footer_page = footer_pdf.pages[0]
    footer_page.add_transformation(Transformation().rotate(0).translate(tx=0, ty=0))
    # Reversed direction: merge the input page onto the footer page and keep the footer page
    footer_page.merge_page(input_page)
    output_pdf.add_page(footer_page)

with open("blank_merge.output.pdf", "wb") as output_stream:
    output_pdf.write(output_stream)
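Both scripts apply Transformation().rotate(0).translate(tx=0, ty=0) to the footer page. A rotation by 0 degrees followed by a translation by (0, 0) is the identity transformation, so this step does not move or rotate the footer; the difference between the two scripts lies only in the merge direction. The sketch below composes the two matrices by hand to show this. It assumes the [a b c d e f] matrix convention from the PDF specification and that the chained calls apply in the order written (rotate first, then translate); the helper names are hypothetical and this is not pypdf's own implementation.

```python
import math

# Hypothetical helpers, not pypdf's implementation. A PDF matrix [a b c d e f]
# maps a point (x, y) to (a*x + c*y + e, b*x + d*y + f).
Matrix = tuple[float, float, float, float, float, float]

IDENTITY: Matrix = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)


def rotation(degrees: float) -> Matrix:
    """Counter-clockwise rotation about the origin."""
    theta = math.radians(degrees)
    return (math.cos(theta), math.sin(theta), -math.sin(theta), math.cos(theta), 0.0, 0.0)


def translation(tx: float, ty: float) -> Matrix:
    """Shift by (tx, ty)."""
    return (1.0, 0.0, 0.0, 1.0, float(tx), float(ty))


def multiply(m1: Matrix, m2: Matrix) -> Matrix:
    """Return the matrix that applies m1 first and then m2."""
    a1, b1, c1, d1, e1, f1 = m1
    a2, b2, c2, d2, e2, f2 = m2
    return (
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2,
        e1 * b2 + f1 * d2 + f2,
    )


def apply(m: Matrix, x: float, y: float) -> tuple[float, float]:
    """Map the point (x, y) through the matrix m."""
    a, b, c, d, e, f = m
    return (a * x + c * y + e, b * x + d * y + f)


# Same chain as in the scripts: rotate by 0 degrees, then translate by (0, 0).
footer_ctm = multiply(multiply(IDENTITY, rotation(0)), translation(0, 0))
print(footer_ctm == IDENTITY)  # True: the footer is neither moved nor rotated
```

For comparison, a non-trivial chain such as rotation(90) followed by translation(10, 20) maps the point (1, 0) to approximately (10, 21).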

blank_merge.input.pdf: the input PDF; notice that the 3rd page is all text
blank_merge.footer.pdf: created with Chrome's "Print to PDF" feature (a footer PDF created with the reportlab package doesn't work either)
blank_merge.output.pdf: the result of merging the footer into the input; notice that the 3rd page is blank
blank_merge.output_reversed.pdf: the result of merging the input into the footer; notice that the 3rd page has the original contents plus the footer
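To check a whole output file for pages that lost their text, one could compare the extracted text of each input page with the corresponding output page. This is a minimal heuristic sketch, not part of pypdf: the helper names page_texts and lost_text_pages and the min_kept threshold are hypothetical. It relies on pypdf's PdfReader and PageObject.extract_text(), and it only catches pages whose text is gone from the content; if a page renders blank for another reason (for example missing font resources), extract_text() may still return the text, so looking at the rendered pages remains the final check.

```python
def page_texts(path: str) -> list[str]:
    """Extract the text of every page in the PDF at `path` with pypdf."""
    # Imported here so lost_text_pages() can be used without pypdf installed.
    from pypdf import PdfReader

    return [page.extract_text() for page in PdfReader(path).pages]


def lost_text_pages(
    input_texts: list[str], output_texts: list[str], min_kept: float = 0.5
) -> list[int]:
    """Return 1-based numbers of pages that lost most of their text in the output.

    Heuristic (hypothetical threshold): a page is flagged when fewer than
    `min_kept` of its input words still appear on the output page. The merged
    footer adds its own words, so checking only for an empty output page could
    miss a page whose original content vanished.
    """
    if len(input_texts) != len(output_texts):
        raise ValueError("input and output must have the same number of pages")

    flagged = []
    for page_no, (before, after) in enumerate(zip(input_texts, output_texts), start=1):
        before_words = before.split()
        if not before_words:
            continue  # pages without extractable text have nothing to lose
        after_words = set(after.split())
        kept = sum(word in after_words for word in before_words) / len(before_words)
        if kept < min_kept:
            flagged.append(page_no)
    return flagged
```

Usage, assuming the files from this issue are in the working directory: lost_text_pages(page_texts("blank_merge.input.pdf"), page_texts("blank_merge.output.pdf")).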

stefan6419846 commented 1 year ago

I am not completely sure why, but the following approach seems to work (that is, opening the input with PdfWriter(clone_from=...) and merging the footer into the writer's pages):

from pypdf import PdfReader, PdfWriter, Transformation

writer = PdfWriter(clone_from="blank_merge.input.pdf")

for page_no, page in enumerate(writer.pages):
    print(page_no + 1)
    footer_pdf = PdfReader("blank_merge.footer.pdf")
    footer_page = footer_pdf.pages[0]
    footer_page.add_transformation(Transformation().rotate(0).translate(tx=0, ty=0))
    page.merge_page(footer_page)

writer.write("blank_merge.output.pdf")

bigatti commented 1 year ago

@morkai Could you test the same PDF with pypdf version 3.10.0?

I have the same problem; it started after version 3.10.0.

Compare 3.10.0 to 3.11.0

morkai commented 1 year ago

@bigatti Yes, it works in 3.10.0 and breaks starting with 3.11.0.
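Based on the reports above (3.10.0 works, 3.11.0 breaks, and the original report used 3.16.4), a script could check whether the installed pypdf falls into the reported range before choosing a workaround, such as the clone_from approach or pinning pypdf==3.10.0. A minimal sketch, assuming plain numeric X.Y.Z version strings (pre-release tags such as "rc1" are not handled); the helper names are hypothetical, and since the thread names no fixed release, the range has no upper bound.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn a plain release string such as "3.16.4" into (3, 16, 4).

    Assumes purely numeric X.Y.Z strings; missing parts count as 0.
    """
    parts = [int(part) for part in version.split(".")][:3]
    parts += [0] * (3 - len(parts))  # treat "3.11" as 3.11.0
    return tuple(parts)


# From the reports in this thread: 3.10.0 works, 3.11.0 and later (at least up
# to 3.16.4) produce blank pages. No fixed release is known, so no upper bound.
FIRST_AFFECTED = (3, 11, 0)


def in_reported_range(version: str) -> bool:
    """True if `version` is 3.11.0 or later, i.e. inside the range reported here."""
    return parse_version(version) >= FIRST_AFFECTED
```

The installed version string can be obtained with importlib.metadata.version("pypdf"), which raises PackageNotFoundError if pypdf is not installed.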