jsvine / pdfplumber

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
MIT License

Are the default settings for extracting text/tables the best ones? #801

Closed sergenti closed 1 year ago

sergenti commented 1 year ago

If I had to use only one set of table_settings, which should I use? I'm working on a SaaS project and can't manually change values depending on the context.

I'm dealing with the following types of documents:

Any ideas?

Right now, for these types of documents, it seems that base pdfminer.six works best for text extraction, and that another library called tabula-py works best for extracting tables.

Maybe I am missing something; this library seems so well written.
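
For context, here is a minimal sketch of one way to avoid hand-tuning per document: try pdfplumber's line-based table strategy first and fall back to the text-based strategy when no tables are found. This is only an illustration, not a recommendation from the maintainers. The helper names `extract_tables_with_fallback` and `extract_all` and the ordering of the candidate settings are assumptions; `vertical_strategy`, `horizontal_strategy`, and `page.extract_tables(table_settings=...)` are part of pdfplumber's documented API.

```python
# Candidate settings, tried in order. "lines" is pdfplumber's default strategy;
# "text" infers cell boundaries from word alignment when no ruling lines exist.
LINE_SETTINGS = {"vertical_strategy": "lines", "horizontal_strategy": "lines"}
TEXT_SETTINGS = {"vertical_strategy": "text", "horizontal_strategy": "text"}


def extract_tables_with_fallback(page, candidates=(LINE_SETTINGS, TEXT_SETTINGS)):
    """Return tables from the first candidate settings that finds any (hypothetical helper)."""
    for settings in candidates:
        tables = page.extract_tables(table_settings=settings)
        if tables:
            return tables
    return []


def extract_all(pdf_path):
    """Apply the fallback to every page of a PDF (hypothetical helper)."""
    import pdfplumber  # imported here so the helper above works without it

    with pdfplumber.open(pdf_path) as pdf:
        return [extract_tables_with_fallback(page) for page in pdf.pages]
```

Whether this beats a single fixed configuration depends on the documents. The text strategy can produce spurious tables on ordinary paragraphs, so the order of the candidates matters.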