metachris / pdfx

Extract text, metadata and references (pdf, url, doi, arxiv) from PDF. Optionally download all referenced PDFs.
http://www.metachris.com/pdfx
Apache License 2.0
1.03k stars, 113 forks

Detect if PDF is a scan #8

Closed: metachris closed this issue 3 years ago

metachris commented 8 years ago

If the PDF is a scan, should pdfx detect that and recommend OCR?

khoanguyen8496 commented 7 years ago

Have you tried Tesseract OCR?
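
One possible way to approach the detection step, sketched here as a heuristic and not as anything pdfx actually implements: treat a PDF as a probable scan when almost none of its pages yield extractable text. The function name `looks_scanned` and both thresholds are assumptions made for illustration, and the per-page text is assumed to come from whatever extractor is already in use.

```python
def looks_scanned(page_texts, min_chars_per_page=25, max_text_page_ratio=0.1):
    """Guess whether a PDF is a scan from the text extracted from each page.

    page_texts: list of strings, one per page, from any text extractor.
    Both thresholds are illustrative assumptions, not tuned values.
    """
    if not page_texts:
        return False  # no pages, nothing to judge

    # Count pages that carry a meaningful amount of non-whitespace text.
    pages_with_text = sum(
        1 for text in page_texts
        if len("".join(text.split())) >= min_chars_per_page
    )

    # A scan typically has (almost) no pages with extractable text.
    return pages_with_text / len(page_texts) <= max_text_page_ratio


if __name__ == "__main__":
    if looks_scanned(["", " \n", ""]):
        print("No extractable text found; this PDF may be a scan. Consider OCR.")
```

If this returns `True`, the tool could print a hint suggesting an OCR pass, for example with Tesseract. A scanned PDF that already carries an OCR text layer would not be flagged by this check, since its pages do yield text.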