stev-ou / review_ocr

Using Google's Tesseract OCR to extract data from public PDFs
GNU General Public License v3.0

Evaluation Metrics for Scraping Effectiveness #3

Closed: samjett247 closed this issue 4 years ago

samjett247 commented 5 years ago

It would be nice to have some built-in utilities for measuring our scraping effectiveness (the percentage of PDF pages successfully parsed).

zachschuermann commented 5 years ago

I hate to say that you're right again, but this is a great idea. I'm thinking we can do this with simple logging utils and just report the metric when the program finishes. Even if it's as simple as printing to STDOUT, having some sort of observable metric would be really valuable. Let me know if you want me to take this, @samjett247.
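A minimal sketch of the logging approach described above, assuming Python and assuming each page's OCR output has already been collected into a list of strings (one entry per PDF page, `None` if OCR failed). It treats a page as "successfully parsed" when its text is non-empty after stripping whitespace; that criterion, the logger name, and the function names `scraping_effectiveness` and `report_effectiveness` are hypothetical and not taken from the review_ocr codebase.

```python
import logging
import sys
from typing import Optional, Sequence

# Log plain messages to STDOUT, as suggested in the thread.
logging.basicConfig(stream=sys.stdout, level=logging.INFO, format="%(message)s")
logger = logging.getLogger("review_ocr")  # hypothetical logger name


def scraping_effectiveness(page_texts: Sequence[Optional[str]]) -> float:
    """Return the fraction of pages whose OCR text is non-empty.

    Assumption: a page counts as parsed if it has any non-whitespace text.
    """
    if not page_texts:
        return 0.0
    parsed = sum(1 for text in page_texts if text and text.strip())
    return parsed / len(page_texts)


def report_effectiveness(page_texts: Sequence[Optional[str]]) -> float:
    """Log the parse rate once processing finishes and return it."""
    rate = scraping_effectiveness(page_texts)
    parsed = round(rate * len(page_texts))
    logger.info(
        "Parsed %d/%d pages (%.1f%%)", parsed, len(page_texts), rate * 100
    )
    return rate


if __name__ == "__main__":
    # Example: 2 of 4 pages produced usable text.
    report_effectiveness(["Page one text", "", None, "Page four text"])
```

Returning the rate as well as logging it keeps the metric usable in tests or summary reports, not just visible on STDOUT.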

samjett247 commented 5 years ago

Addressed in #5