py-pdf / pypdf

A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
https://pypdf.readthedocs.io/en/latest/

Images merged between pages #2923

Closed pprados closed 4 weeks ago

pprados commented 4 weeks ago

For some files, page.images returns the same images for every page, even if each image appears on only one page.

Environment

Which environment were you using when you encountered the problem?

$ python -m platform
# Linux-6.8.0-47-generic-x86_64-with-glibc2.39

$ python -c "import pypdf;print(pypdf._debug_versions)"
# pypdf==5.0.1, crypt_provider=('cryptography', '43.0.3'), PIL=10.4.0

Code + PDF

This is a minimal, complete example that shows the issue:

import pypdf

# pdf-test.pdf: https://github.com/user-attachments/files/17506941/pdf-test.pdf
reader = pypdf.PdfReader("pdf-test.pdf")
[len(page.images) for page in reader.pages]
Out[16]: [2, 2, 2, 2, 2, 2, 2, 2, 2, 2]

Share here the PDF file(s) that cause the issue. The smaller they are, the better. Let us know if we may add them to our tests!

pdf-test.pdf

stefan6419846 commented 4 weeks ago

Thanks for your issue. This is a common issue with PDF files generated by LibreOffice and has been reported previously as well: #2823, #2536. I am going to close this issue - please try to search for existing issues and discussions before opening a new issue.
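A possible workaround, sketched here under assumptions rather than taken from the thread: if the cause is that every page points at the same resource dictionary (so page.images lists every image the document declares), one can look at each page's content stream and keep only the XObjects that a `Do` operator actually invokes. The helper names `drawn_xobject_names` and `images_drawn_on_page` are hypothetical. The sketch assumes images are drawn directly in the page content stream; images drawn inside Form XObjects and inline images are not handled.

```python
def drawn_xobject_names(operations):
    """Return the XObject names invoked by `Do` operators.

    `operations` is a sequence of (operands, operator) pairs, the shape
    exposed by pypdf's ContentStream.operations.
    """
    names = set()
    for operands, operator in operations:
        # `Do` paints the XObject named by its single operand, e.g. /Im0.
        if operator == b"Do" and operands:
            names.add(str(operands[0]))
    return names


def images_drawn_on_page(page, reader):
    """Hypothetical helper: the images a page draws directly.

    Assumption: filtering page.images by the names used in the content
    stream is enough for the affected files; nested forms are ignored.
    """
    # Imported lazily so the pure helper above works without pypdf.
    from pypdf.generic import ContentStream

    contents = page.get_contents()
    if contents is None:  # page without a content stream
        return []
    drawn = drawn_xobject_names(ContentStream(contents, reader).operations)
    # Top-level image ids are strings such as "/Im0"; nested ids are lists.
    keys = [k for k in page.images.keys() if isinstance(k, str) and k in drawn]
    return [page.images[k] for k in keys]
```

With the file from this issue, this could be tried as `[len(images_drawn_on_page(page, reader)) for page in reader.pages]` and compared against the `len(page.images)` counts above.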