jlevy / the-art-of-command-line

Master the command line, in one page

Convert PDF to TEXT #223

Open miltonlab opened 8 years ago

miltonlab commented 8 years ago

Is there a command to extract tabular data from a PDF into a spreadsheet, CSV, or txt file? pdftotext is not accurate enough.

ShawnMilo commented 8 years ago

This is not an easy problem. textract is a good project, but when it comes to PDFs, which can literally be images, OCR is sometimes required. Also, formatting is a major issue and no automated system will be perfect.
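
For PDFs that do contain a real text layer, one rough approach is to run `pdftotext -layout` (from poppler-utils), which keeps columns visually aligned, and then split each line on runs of two or more spaces. The Python sketch below illustrates that idea. The function names are hypothetical, and the two-space rule is an assumption that will not hold for every table.

```python
import csv
import io
import re
import subprocess


def layout_text_to_rows(text):
    """Split layout-preserved text into rows of cells.

    Assumption: columns are separated by at least two spaces, while
    single spaces occur only inside a cell.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between rows or pages
        rows.append(re.split(r"\s{2,}", line))
    return rows


def rows_to_csv(rows):
    """Serialize rows to a CSV string; quoting is handled by the csv module."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


def pdf_to_csv(pdf_path):
    """Run pdftotext in layout mode and convert its output to CSV.

    Requires the pdftotext binary on PATH. Passing "-" as the output
    file makes pdftotext write to stdout.
    """
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        capture_output=True,
        text=True,
        check=True,
    )
    return rows_to_csv(layout_text_to_rows(result.stdout))
```

Calling `pdf_to_csv("report.pdf")` (a placeholder file name) would return the CSV as a string. Empty cells shift the columns that follow them, and scanned PDFs yield no text at all, so the result still needs a manual check, and image-only files need OCR first.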