maxpmaxp / pdfreader

Python API for PDF documents
MIT License

Leaking log statements #93

Status: Closed (closed by tvanyo 2 years ago)

tvanyo commented 2 years ago

I'm using SimplePDFViewer to scrape a PDF in an app that uses logging, and I've discovered that doing so generates unexpected log statements on stdout.

I created a slightly larger-than-minimal MVP to illustrate the problem. The attached zip contains the Python code, testMVP.py, and a test PDF, testPartial.pdf.

To see the problem, extract both files to the same directory and run the program twice, once with the --pdf flag and once without:

python testMVP.py runs without executing the code on lines 23 & 24, so SimplePDFViewer is not run. The expected result is:

Be patient - Extracting text & strings from testPartial.pdf
PDF scrapping complete!
Generated file test…

Note that I'm using logging instead of print to generate text output to the console.

python testMVP.py --pdf executes lines 23 & 24, so SimplePDFViewer is run. The unexpected result is:

Be patient - Extracting text & strings from testPartial.pdf
PDF scrapping complete!
INFO:logTest:PDF scrapping complete!
DEBUG:logTest:doing somethings else…
Generated file test…
INFO:logTest:Generated file test…

You can see that 3 extra logging lines are included, and their format is clearly different from the stream formatter I set up on line 52 and the file formatter I set up on line 64 of testMVP.py.
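A minimal sketch of a mechanism that would produce output like this, assuming (as the later comments in this thread suggest) that pdfreader logs through the module-level helpers such as logging.debug() rather than through a named logger. library_scrape and run_app are hypothetical stand-ins, not pdfreader code, and stderr is captured with contextlib.redirect_stderr only so the extra lines can be inspected.

```python
import contextlib
import io
import logging


def library_scrape():
    # Hypothetical stand-in for library code that logs via the module-level
    # helpers. logging.debug() calls logging.basicConfig() when the root
    # logger has no handlers, which attaches a StreamHandler on sys.stderr
    # formatted with logging.BASIC_FORMAT ("LEVEL:name:message").
    # The root logger's level is WARNING, so this DEBUG record is dropped.
    logging.debug("library internals")


def run_app():
    """Return (app_output, console_output) for a small logging session."""
    # Start from a clean root logger (no handlers, default WARNING level),
    # as in a freshly started interpreter.
    for handler in list(logging.root.handlers):
        logging.root.removeHandler(handler)
    logging.root.setLevel(logging.WARNING)

    # The application's own named logger, with its own handler and format.
    app_stream = io.StringIO()
    app_handler = logging.StreamHandler(app_stream)
    app_handler.setFormatter(logging.Formatter("%(message)s"))
    log = logging.getLogger("logTest")
    log.setLevel(logging.DEBUG)
    log.handlers.clear()
    log.addHandler(app_handler)

    console = io.StringIO()
    with contextlib.redirect_stderr(console):  # capture what would hit stderr
        log.info("Be patient")         # root has no handler yet: no duplicate
        library_scrape()               # root now has a handler bound to console
        log.info("PDF scrapping complete!")  # propagates to the root handler
        log.debug("doing something else")    # propagates too, despite WARNING
    return app_stream.getvalue(), console.getvalue()
```

Calling run_app() returns the app's own three lines in the first string and, in the second, the two propagated INFO:logTest:/DEBUG:logTest: copies. The library's own DEBUG message appears in neither, because only the originating logger's level is checked before a record propagates to the root handlers. By default basicConfig() writes to sys.stderr, which in a terminal looks the same as stdout.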

Version Information:

testMVP.zip

maxpmaxp commented 2 years ago

@tvanyo pdfreader uses the default logger; you may want to do something like

logging.basicConfig(format="%(asctime)s : %(levelname)s : %(message)s", datefmt="%d.%m.%Y %I:%M:%S %p")

before using it.
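A sketch of that suggestion wrapped in a startup function. The format and datefmt values are the maintainer's; force=True (Python 3.8+) is an added assumption that makes the call replace any root handler that library code may already have installed, since basicConfig() otherwise does nothing once the root logger has a handler. configure_logging is a hypothetical name.

```python
import logging


def configure_logging():
    # The maintainer's suggested root configuration. force=True is an
    # addition: it removes and closes any handlers already attached to the
    # root logger, so this still takes effect if some code logged first.
    logging.basicConfig(
        format="%(asctime)s : %(levelname)s : %(message)s",
        datefmt="%d.%m.%Y %I:%M:%S %p",
        force=True,
    )


# Call once at startup, before creating and using SimplePDFViewer.
configure_logging()
```

A named application logger such as logTest still propagates to the root logger by default, so its records also pass through this root handler unless that logger sets propagate = False.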

Feel free to contribute if you notice any logging issues in the project.

tvanyo commented 2 years ago

All the logging works as expected when I don't use SimplePDFViewer, but as soon as I call SimplePDFViewer I see log messages that, for some reason, repeat the message from my most recent log call. I'm not even seeing the log messages that I can see in your code.

I have logging formatters on a stdout handler and a file handler, and the extra output, which appears only when I call SimplePDFViewer, matches neither of them.
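To check where output in an unfamiliar format comes from, one option is to list the handlers attached to the root logger right after the SimplePDFViewer call. describe_root_handlers is a hypothetical helper; a handler that renders a record as LEVEL:name:message is using logging.BASIC_FORMAT, the default installed by logging.basicConfig().

```python
import logging


def describe_root_handlers():
    """Return (handler class name, sample formatted line) per root handler."""
    sample = logging.makeLogRecord(
        {"name": "logTest", "levelname": "INFO", "msg": "sample message"}
    )
    described = []
    for handler in logging.root.handlers:
        # A handler without a formatter falls back to "%(message)s".
        formatter = handler.formatter or logging.Formatter()
        described.append((type(handler).__name__, formatter.format(sample)))
    return described
```

In a fresh interpreter this returns an empty list; after any module-level call such as logging.debug(), it would show a StreamHandler producing lines like INFO:logTest:sample message.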

As a test, I modified the logging in all pdfreader files (17 files had import logging) to use a logger named after the module, and the unexpected log messages disappeared.
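A sketch of the module-name-logger pattern described here, which is also the convention the Python logging HOWTO recommends (logger = logging.getLogger(__name__) in each module). parse_page is a hypothetical function for illustration, not pdfreader's API.

```python
import logging

# One logger per module, named after the module: a dotted path inside a
# package, or "__main__" when the file is run as a script.
logger = logging.getLogger(__name__)


def parse_page(page_number):
    # Hypothetical library function. Logger methods, unlike the module-level
    # logging.debug()/logging.info() helpers, never call basicConfig(), so
    # no handler is attached to the root logger behind the application's back.
    logger.debug("parsing page %d", page_number)
    return f"page {page_number}"
```

Because the library never configures the root logger, the application decides which handlers and formats exist; records from these module loggers still propagate to whatever root handlers the application sets up.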

Submitting as a pull request…