Open jbpenrath opened 1 year ago
Here’s a simple and uncompressed PDF to reproduce the problem, in case you’d like to avoid installing another tool 😄: hello.pdf
The error is caused by the XRef table with `/W [1 4 6]`. The third field is encoded using 6 bytes, and it’s decoded here using `nunpack`, which is not designed to handle all integer sizes.
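To make the width problem concrete, here is a minimal sketch of how one entry of such a table splits into three fields under `/W [1 4 6]`. The helper `split_xref_entry` and the sample entry are made up for illustration and are not pdfminer code. The only point is that a 6-byte field does not match any standard-size unsigned integer code in `struct`.

```python
import struct

# Standard (">"-prefixed) sizes of struct's unsigned integer codes B, H, L, Q.
STRUCT_SIZES = {struct.calcsize(">" + code) for code in "BHLQ"}


def split_xref_entry(entry: bytes, widths=(1, 4, 6)) -> list:
    """Hypothetical helper: cut one entry into fields of the given byte widths."""
    fields, pos = [], 0
    for width in widths:
        fields.append(entry[pos:pos + width])
        pos += width
    return fields


# Made-up entry: 1-byte type, 4-byte second field, 6-byte third field.
entry = bytes([1]) + (17).to_bytes(4, "big") + (0).to_bytes(6, "big")
raw_type, raw_second, raw_third = split_xref_entry(entry)

# True when the third field has a width struct cannot unpack in one code.
unsupported = len(raw_third) not in STRUCT_SIZES
```

With these widths the first two fields fit the `B` and `L` codes, while the third one needs something that accepts arbitrary lengths.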
Instead of using `struct.unpack` in `nunpack`, it may be useful to use `int.from_bytes`, which will automatically work for all integer sizes.
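A rough sketch of what that suggestion could look like follows. The name `nunpack_any_width` and the `default` parameter are assumptions made for this example, and this is not necessarily how the actual fix is written. It assumes the fields are unsigned and stored high-order byte first, as in PDF cross-reference streams.

```python
def nunpack_any_width(data: bytes, default: int = 0) -> int:
    """Hypothetical replacement: decode a big-endian unsigned integer of any length."""
    if not data:
        # A zero-width field carries no bytes, so fall back to the default.
        return default
    return int.from_bytes(data, byteorder="big", signed=False)


# A 6-byte field, like the third field of an entry under /W [1 4 6].
six_byte_value = nunpack_any_width(b"\x00\x00\x00\x00\x01\x00")
```

Because `int.from_bytes` accepts any length, the same code path handles 1, 4, 6 or any other width without a per-size branch.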
Fixed in #1029 (and thank you for WeasyPrint, it is very nice software!)
Bug report
Description
I'm generating PDF documents with WeasyPrint. Since version 59.0 of this package, I'm not able to extract text from the generated compressed PDF files with the `pdfminer.highlevel.extract_text` method. This method raises a `TypeError`, invalid length. The exception is raised from a utility function called `nunpack`. So I first opened an issue on the WeasyPrint repository, but it appears the source of the issue could be pdfminer itself.
You can take a look at the WeasyPrint maintainer's answer to understand how pdfminer is involved in this problem.
Steps to reproduce