Open NikolaiLyssogor opened 3 weeks ago
Apparently this is another case where we are dealing with an object reference instead of a direct value. In theory, using x.get_object() > 0 should work here.
Thanks for the quick response. Adding
x = x.get_object() if isinstance(x, IndirectObject) else x
right before the line where the error is occurring solved the issue for me.
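To illustrate why resolving the reference first helps, here is a minimal sketch that does not use pypdf itself. StubReference and resolve are hypothetical stand-ins made up for this example. Only the get_object() method name and the isinstance check come from the thread. It assumes the failure is a comparison between a reference object and a number, which is what the suggestion above implies.

```python
class StubReference:
    """Hypothetical stand-in for an indirect reference (not a pypdf class)."""

    def __init__(self, target):
        self._target = target

    def get_object(self):
        # Return the value this reference points to.
        return self._target


def resolve(x):
    """Return the referenced value if x is a reference, else x unchanged."""
    return x.get_object() if isinstance(x, StubReference) else x


def is_positive(x):
    # Resolve first, so direct values and references behave the same.
    return resolve(x) > 0


direct = 12.5
indirect = StubReference(12.5)

# Comparing the unresolved reference with a number raises TypeError,
# because the stand-in defines no ordering operators.
try:
    indirect > 0
    raw_comparison_failed = False
except TypeError:
    raw_comparison_failed = True
```

With the guard in place, is_positive(direct) and is_positive(indirect) give the same answer, while the unguarded comparison on the reference fails.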
@NikolaiLyssogor you seem to be on an old version. Please upgrade to the latest version and retest.
Tested again with 4.2.0. The original issue still occurs. Also, the fix proposed above solves the issue in 4.2.0, at least for my own documents I have been testing this on.
Can you confirm that just adding
x = x.get_object()
works? If so, can you propose a PR on the main branch?
It's working on my documents. There was also no change to which tests are passing in the test suite. I'll open a PR.
I'm trying to extract text from each page of a large number of PDFs. A few of them are giving me the issue shown in the traceback. This seems to be related to #2286.
Environment
Which environment were you using when you encountered the problem?
Code + PDF
This is a minimal, complete example that shows the issue:
The PDF that is causing this issue can't be shared because it contains sensitive information. However, here is the result of reader.metadata:
I'm not the one creating the PDFs and unfortunately I haven't been able to reproduce the issue so that I can share it here.
Traceback
This is the complete traceback I see: