metachris / pdfx

Extract text, metadata and references (pdf, url, doi, arxiv) from PDF. Optionally download all referenced PDFs.
http://www.metachris.com/pdfx
Apache License 2.0

Some PDFs don't work #1

Closed: metachris closed this issue 8 years ago

metachris commented 8 years ago

TODO:

Example error message:

$ pdfx xhyve\ –\ Lightweight\ Virtualization\ on\ OS\ X\ Based\ on\ bhyve\ _\ pagetable.pdf
Traceback (most recent call last):
  File "/usr/local/bin/pdfx", line 9, in <module>
    load_entry_point('pdfx==1.0.1', 'console_scripts', 'pdfx')()
  File "build/bdist.macosx-10.10-x86_64/egg/pdfx/cli.py", line 66, in main
  File "build/bdist.macosx-10.10-x86_64/egg/pdfx/__init__.py", line 137, in __init__
AttributeError: 'NoneType' object has no attribute 'items'
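
The traceback only shows that some value was `None` when `.items()` was called on it at `pdfx/__init__.py` line 137; it does not say which value. Below is a minimal sketch of that failure mode and one possible guard, assuming (not confirmed here) that the value is a PDF's optional metadata dictionary. `normalize_metadata` and `raw_info` are hypothetical names for illustration, not part of pdfx.

```python
def normalize_metadata(raw_info):
    """Return metadata as a plain dict, tolerating a missing dictionary.

    `raw_info` is a hypothetical stand-in for whatever value is None
    at the failing line; the traceback does not identify it.
    """
    if raw_info is None:
        # Some PDFs may carry no metadata at all; treat that as empty.
        return {}
    return {str(key): value for key, value in raw_info.items()}


# Reproduce the error from the traceback in isolation.
try:
    None.items()
except AttributeError as exc:
    print(exc)

# With the guard, both cases yield a usable dict.
print(normalize_metadata({"Title": "xhyve"}))
print(normalize_metadata(None))
```

Returning an empty dict keeps the rest of the extraction (text, references) running for PDFs that lack this data, instead of aborting in the constructor.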