Closed lpozo closed 2 years ago
Hi there,
Can you please attach an example PDF that triggers the RecursionError?
I have a feeling this may be the real underlying problem.
Kind regards, Joris Schellekens
Ohh, unfortunately, I don't think I can legally share the PDF because of copyright. It's a book by Manning Publications called The Quick Python Book, Second Edition. Here are some of its properties:
The properties themselves don't really tell me much about why the problem is occurring.
I already have a test in my test-suite that attempts to read the meta-information of more than 1000 pdf documents.
My test-repository can be found here: https://github.com/jorisschellekens/pdf-corpus
So I'm pretty confident borb can actually do this. That's why I'd like your exact document, to see how it's different from the documents I'm already testing against.
Perhaps you can find a similar, non-copyrighted work?
Kind regards, Joris Schellekens
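One way to look for a shareable reproducer is to run the same metadata read over a folder of freely distributable PDFs and keep only the files that fail in the same way. The following is a minimal sketch under stated assumptions: `find_recursion_failures` and its `reader` parameter are hypothetical names, not part of borb, and `reader` stands for whatever function performs the read that fails in the report.

```python
from pathlib import Path
from typing import Callable, List


def find_recursion_failures(folder: str, reader: Callable[[Path], object]) -> List[Path]:
    """Return the PDFs under `folder` for which `reader` raises RecursionError.

    `reader` is a hypothetical callable supplied by the caller; it should
    perform the same metadata read that fails on the original document.
    """
    failing: List[Path] = []
    for pdf_path in sorted(Path(folder).rglob("*.pdf")):
        try:
            reader(pdf_path)
        except RecursionError:
            # Same failure mode as in the report: a candidate reproducer.
            failing.append(pdf_path)
        except Exception:
            # Any other failure is a different problem; skip it here.
            continue
    return failing
```

Any path this returns for a non-copyrighted document could be attached to the ticket.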
Hi there,
This issue has been open for a week now. If you cannot provide me with an input document that reproduces the problem, then I can't help you.
I am going to close this ticket as "can not reproduce". If at some point you do find a document that you can share, you are welcome to re-open the ticket.
Kind regards, Joris Schellekens
RecursionError. The worst issue is that most of the time the only metadata we get is None, even when a regular PDF reader correctly shows the meta-info.
borb Version(s): 2.0.18
This code generates a really long traceback with a RecursionError pointing to Dictionary.add_base_methods(). When the target PDF gets successfully read, I get None as the author's info.
Doing a similar operation with PyPDF4 takes milliseconds. However, apparently, this library isn't actively maintained.
I've noticed that reading PDF files created with borb works correctly and faster. But I assume that most of the time we work with PDFs created by other tools.
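The three symptoms described here (a RecursionError, a None author, and slow reads) can be recorded per file with a small probe. This is only a sketch, not the code from the original report: `probe` and `read_author_with_borb` are hypothetical helper names, and the borb calls inside `read_author_with_borb` (`PDF.loads`, `get_document_info`, `get_author`) are assumed to be available in the installed borb version.

```python
import time
from typing import Callable, Dict, Optional


def read_author_with_borb(path: str) -> Optional[str]:
    # Assumption: borb 2.x provides PDF.loads() and
    # Document.get_document_info().get_author(). Imported lazily so the
    # probe below can also be used with other readers.
    from borb.pdf.pdf import PDF

    with open(path, "rb") as pdf_file:
        document = PDF.loads(pdf_file)
    return document.get_document_info().get_author()


def probe(path: str, reader: Callable[[str], Optional[str]]) -> Dict[str, object]:
    """Run `reader` on `path`; record the author, error type, and elapsed time."""
    start = time.perf_counter()
    author: Optional[str] = None
    error: Optional[str] = None
    try:
        author = reader(path)
    except Exception as exc:  # RecursionError is a subclass of Exception
        error = type(exc).__name__
    elapsed = time.perf_counter() - start
    return {"path": path, "author": author, "error": error, "seconds": elapsed}
```

Calling `probe("some.pdf", read_author_with_borb)` (with a real file path in place of the placeholder) gives one record per document, which makes it easier to compare a failing file against files that read correctly.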