py-pdf / pypdf

A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
https://pypdf.readthedocs.io/en/latest/

`PageObject.merge_page` blanks 2nd page in signed PDF #2306

Closed by not-my-profile 11 months ago

not-my-profile commented 11 months ago
import pypdf

bill = pypdf.PdfReader('bill.pdf')
watermark_page = pypdf.PdfReader('watermark.pdf').pages[0]

out = pypdf.PdfWriter()
for page_idx in range(len(bill.pages)):
    page = bill.pages[page_idx]
    page.merge_page(watermark_page)
    out.add_page(page)

out.write('out.pdf')

Executing the above code with these two files, bill.pdf and watermark.pdf, results in the second page of the generated out.pdf being completely blank (instead of containing the content of the 2nd page of bill.pdf with the watermark overlaid).

This bug was apparently introduced in version 3.11.0 with commit fca29c7ef692b73080a72f3e80bb131b1d47b904, which was part of #1906 to fix #1897. The bug is still present in the latest release as well as on the main branch.

Edit: This appears to be a duplicate of #2260.

stefan6419846 commented 11 months ago

Yes, this seems to be the same as in #2260, and

import pypdf

bill = pypdf.PdfReader('bill.pdf')
watermark_page = pypdf.PdfReader('watermark.pdf').pages[0]

out = pypdf.PdfWriter(clone_from=bill)
for page in out.pages:
    page.merge_page(watermark_page)

out.write('out.pdf')

seems to fix this.

not-my-profile commented 11 months ago

@stefan6419846 Thanks for the quick response! I think you meant `for page in bill.pages`, but even with that change your code doesn't appear to work, since it doesn't add the watermark.

stefan6419846 commented 11 months ago

No, I meant it exactly as I have written it above. The relevant part is to use `clone_from` with either a `PdfReader` or a path, and to work on this `PdfWriter` instead of manually adding pages from the reader to the writer.

not-my-profile commented 11 months ago

Oh ... my bad ... it does indeed work ... I was just testing it still with 3.11.0 where your code snippet results in an entirely blank PDF ... with the latest version it does indeed work as intended. Thanks!
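
For reference, the workaround from this thread can be wrapped in a small helper. This is only a sketch under the assumptions discussed above: the function name `add_watermark` and its parameters are hypothetical (not part of pypdf), it relies on `PdfWriter(clone_from=...)` accepting a path as stated earlier in the thread, and per the comment above it needs a pypdf release newer than 3.11.0 to produce non-blank output.

```python
def add_watermark(source, watermark, destination):
    """Overlay the first page of `watermark` onto every page of `source`.

    Hypothetical helper for illustration; not part of pypdf itself.
    """
    # Imported inside the function so that merely defining the helper
    # does not require pypdf to be installed.
    import pypdf

    # Use the first page of the watermark file as the overlay.
    watermark_page = pypdf.PdfReader(watermark).pages[0]

    # Clone the whole source document into the writer instead of adding
    # pages from a reader one by one (the pattern that triggered the bug).
    writer = pypdf.PdfWriter(clone_from=source)

    # Work on the writer's own pages, as suggested in the thread.
    for page in writer.pages:
        page.merge_page(watermark_page)

    writer.write(destination)
```

With the files from the original report, this would be called as `add_watermark('bill.pdf', 'watermark.pdf', 'out.pdf')`.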