metachris / pdfx

Extract text, metadata and references (pdf, url, doi, arxiv) from PDF. Optionally download all referenced PDFs.
http://www.metachris.com/pdfx
Apache License 2.0
1.05k stars 115 forks

AttributeError: 'NoneType' object has no attribute 'findall' #27

Closed sdwarwick closed 3 years ago

sdwarwick commented 6 years ago

The complete traceback is:

...\lib\site-packages\pdfx\libs\xmp.py", line 50, in meta  
    for desc in self.rdftree.findall(RDF_NS+'Description'):
AttributeError: 'NoneType' object has no attribute 'findall'

Has anyone else seen this error on some PDFs?

This is a remote file, accessed over http://.

I cannot publish the location of this particular file here, but would appreciate a potential strategy for a solution to this problem!
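
One possible strategy, sketched under assumptions: the traceback shows that `self.rdftree` is `None` when `findall` is called. That would be consistent with the PDF's XMP packet containing no `rdf:RDF` element, since `ElementTree`'s `find` returns `None` when nothing matches. The snippet below is not pdfx's code. `find_descriptions` and the two sample XMP strings are hypothetical, and only `RDF_NS` and the `findall(RDF_NS+'Description')` call are taken from the traceback. It shows a guard that returns an empty list instead of raising.

```python
import xml.etree.ElementTree as ET

# Standard RDF namespace in ElementTree's "{uri}" notation.
RDF_NS = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"

# Hypothetical sample packets: one with an rdf:RDF element, one without.
WITH_RDF = (
    '<x:xmpmeta xmlns:x="adobe:ns:meta/" '
    'xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
    '<rdf:RDF><rdf:Description rdf:about=""/></rdf:RDF>'
    '</x:xmpmeta>'
)
WITHOUT_RDF = '<x:xmpmeta xmlns:x="adobe:ns:meta/"/>'


def find_descriptions(xmp_xml):
    """Return the rdf:Description elements, or [] if there is no rdf:RDF element."""
    root = ET.fromstring(xmp_xml)
    if root.tag == RDF_NS + "RDF":
        rdftree = root
    else:
        # find() returns None when no element matches; calling findall()
        # on that None is what produces the AttributeError in the traceback.
        rdftree = root.find(".//" + RDF_NS + "RDF")
    if rdftree is None:
        return []
    return rdftree.findall(RDF_NS + "Description")


print(len(find_descriptions(WITH_RDF)))
print(find_descriptions(WITHOUT_RDF))
```

If this assumption about the cause holds, a file without RDF metadata simply yields no XMP entries rather than aborting the whole extraction.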

jorgelopezlago commented 5 years ago

I experience the same issue with local pdf files. I am using Python 2.7.15rc1.

habere-et-dispertire commented 5 years ago

Same issue with local pdf files. Under Python 2.7.16, macOS 10.13.6.

metachris commented 3 years ago

Please try again with v1.4.1, and reopen the issue if it persists. Thanks 🙏