Closed 1339503169 closed 1 day ago
Please provide the problem PDF directly. I cannot read Chinese, so I cannot understand anything I see when I follow your link.
I uploaded the problem file earlier, but for some unknown reason the upload did not succeed. Here is the problem file you need.
https://helpx.adobe.com/acrobat/using/component-files-pdf-portfolio.html
This is an English-language document explaining PDF Portfolios.
Description of the bug
https://helpx.adobe.com/cn/acrobat/using/component-files-pdf-portfolio.html
[File: 合并发票pdf.pdf (merged invoice PDF)]
I encountered this problem when processing PDF files. A PDF package (PDF Portfolio) is a single PDF that bundles multiple PDFs into one file; the details are in the link I posted. PyMuPDF does not seem to parse this file correctly and only returns the cover page information. In Adobe Reader, you can see that the file contains many component files. Is there a good way to handle this situation?
How to reproduce the bug
import pymupdf
doc = pymupdf.open(file_path)
page = doc.load_page(0)
text = page.get_text()  # only the cover page text is returned
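One possible approach, sketched here under the assumption that the portfolio stores its component files as standard PDF embedded files (attachments), is to read them through PyMuPDF's `embfile_names()` and `embfile_get()` methods and open each one as its own document. The helper names `iter_embedded_files` and `portfolio_texts` are hypothetical, and this has not been tested against the file from this report.

```python
def iter_embedded_files(doc):
    """Yield (name, raw_bytes) for every file embedded in an open document."""
    for name in doc.embfile_names():
        yield name, doc.embfile_get(name)


def portfolio_texts(file_path):
    """Return {embedded_name: text} for each embedded PDF in a portfolio.

    Assumes the component files are stored as regular embedded files.
    """
    import pymupdf  # imported here so the helper above needs no hard dependency

    texts = {}
    with pymupdf.open(file_path) as doc:
        for name, data in iter_embedded_files(doc):
            if b"%PDF-" not in data[:1024]:
                continue  # skip attachments that are not PDFs
            # Open the embedded bytes as a separate in-memory PDF.
            with pymupdf.open(stream=data, filetype="pdf") as child:
                texts[name] = "\n".join(page.get_text() for page in child)
    return texts
```

Calling `portfolio_texts(file_path)` would then give one text entry per embedded PDF instead of only the cover page. If `doc.embfile_count()` returns 0 for the file, the assumption above does not hold and the components are stored some other way.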
PyMuPDF version
1.24.11
Operating system
Windows
Python version
3.9