metachris / pdfx

Extract text, metadata and references (pdf, url, doi, arxiv) from PDF. Optionally download all referenced PDFs.
http://www.metachris.com/pdfx
Apache License 2.0

Copious INFO logging #41

Closed (tripleee closed this 3 years ago)

tripleee commented 4 years ago

Running this from a Python script with logging.basicConfig(level=logging.INFO) produces an enormous amount of detailed logging. This output would probably be best confined to level=logging.DEBUG, and ideally it should be possible to turn it off entirely if you don't care about the library's internals.

czechnology commented 4 years ago

While I agree with this issue, I think it's caused by the underlying pdfminer2 package, which seems to make heavy use of logging.info calls on the root logger, so the output cannot easily be disabled.
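For anyone hitting this in the meantime, here is a minimal sketch of one possible workaround, assuming the noise really does come from module-level logging.info calls. It keeps the root logger at WARNING and raises only the application's own logger to INFO. The noisy_library_call function and the "myapp" logger name are hypothetical stand-ins, not part of pdfx or pdfminer2.

```python
import io
import logging

# Hypothetical stand-in for a library that logs via the module-level
# logging.info(), i.e. through the root logger.
def noisy_library_call():
    logging.info("library internals: parsing object")

# Capture output in memory so the effect is easy to inspect.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(name)s %(levelname)s %(message)s"))

root = logging.getLogger()
root.addHandler(handler)
# Root stays at WARNING, so INFO records logged on the root logger are dropped.
root.setLevel(logging.WARNING)

# The application's own logger ("myapp" is an assumed name) is raised to INFO.
# Its records still propagate to the root logger's handler.
app_log = logging.getLogger("myapp")
app_log.setLevel(logging.INFO)

app_log.info("starting extraction")
noisy_library_call()

output = stream.getvalue()
print(output, end="")
```

The trade-off is that every other library logging through the root logger is also cut off below WARNING, so this only helps when the application uses named loggers for its own messages.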

On the same topic, that library seems to be unmaintained by Chris (last commit Dec 2015). Maybe it would make sense to switch to the pdfminer.six library (from which pdfminer2 was originally forked), unless there are some important changes not available in the original lib?

metachris commented 3 years ago

Switched to pdfminer.six now.
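If INFO output is still too verbose after the switch, a library that logs through named loggers can be quieted per namespace. The sketch below assumes pdfminer.six logs under names beginning with "pdfminer" (an assumption worth checking against the installed version). It simulates such a logger rather than importing the library, and "myapp" is a hypothetical application logger name.

```python
import io
import logging

# Capture output in memory so the effect is easy to inspect.
stream = io.StringIO()
root = logging.getLogger()
root.addHandler(logging.StreamHandler(stream))
root.setLevel(logging.INFO)

# Assumption: the library's loggers live under the "pdfminer" namespace.
# Raising the parent's level applies to all child loggers that have no
# level of their own.
logging.getLogger("pdfminer").setLevel(logging.WARNING)

# Simulated library record (no pdfminer import needed for the demo).
logging.getLogger("pdfminer.pdfinterp").info("simulated library detail")
# Application record from a hypothetical "myapp" logger.
logging.getLogger("myapp").info("application message")

captured = stream.getvalue()
print(captured, end="")
```

Unlike lowering the root level, this leaves INFO logging from the application and from other libraries untouched.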