metachris / pdfx

Extract text, metadata and references (pdf, url, doi, arxiv) from PDF. Optionally download all referenced PDFs.
http://www.metachris.com/pdfx
Apache License 2.0
1.05k stars, 115 forks

Fails if output is piped #24

Closed MagicalTux closed 3 years ago

MagicalTux commented 7 years ago

Piping the output of pdfx to another command causes an error:

Traceback (most recent call last):
  File "/usr/bin/pdfx", line 11, in <module>
    sys.exit(main())
  File "/usr/lib64/python2.7/site-packages/pdfx/cli.py", line 189, in main
    print_to_console(text)
  File "/usr/lib64/python2.7/site-packages/pdfx/cli.py", line 130, in print_to_console
    bytes_string = text.encode(sys.stdout.encoding, 'backslashreplace')
TypeError: encode() argument 1 must be string, not None

Similar errors can be found in other projects, such as https://github.com/ansible/ansible/commit/c8494cdc39186250e4f814dfc9f86707bc4476c3
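
For context, under Python 2 `sys.stdout.encoding` is `None` when stdout is not a terminal (for example, when piped) and `PYTHONIOENCODING` is not set, which is why `text.encode(sys.stdout.encoding, ...)` raises the `TypeError` above. A minimal sketch of one possible workaround follows. The helper `encode_for_stream`, the `PipedStream` stand-in, and the UTF-8 fallback are all assumptions made for illustration; this is not necessarily how pdfx itself handles it.

```python
import sys


def encode_for_stream(text, stream=None, fallback="utf-8"):
    """Encode text for a stream, tolerating a missing encoding.

    Hypothetical helper: falls back to `fallback` (an assumed default)
    when the stream reports no encoding, as a piped stdout does on Python 2.
    """
    if stream is None:
        stream = sys.stdout
    # getattr also covers stream objects that lack an `encoding` attribute.
    encoding = getattr(stream, "encoding", None) or fallback
    return text.encode(encoding, "backslashreplace")


class PipedStream(object):
    """Stand-in for a piped stdout on Python 2: encoding is None."""
    encoding = None


class AsciiStream(object):
    """Stand-in for a terminal that only accepts ASCII."""
    encoding = "ascii"


piped_bytes = encode_for_stream(u"caf\u00e9", PipedStream())
ascii_bytes = encode_for_stream(u"caf\u00e9", AsciiStream())
```

With the fallback in place, the piped case encodes as UTF-8 instead of failing, while a stream that does report an encoding still gets `backslashreplace` escapes for characters it cannot represent. Setting `PYTHONIOENCODING=utf-8` in the environment before running the command is another way to avoid the `None` encoding without changing any code.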

metachris commented 3 years ago

This should be fixed now. Please let me know if it still persists with v1.4.1.