How to extract pdf texts which contains text and tables

jsvine / pdfplumber

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.

MIT License

6.57k stars, 659 forks

I want to extract the content of a PDF file that contains both plain text and tables. I can use page.extract_text() to get the text, but when a page contains a table, the table's text comes out in the wrong order. What I want is for text and tables to be extracted separately, while the overall order still runs from top to bottom. For example, suppose the current line is plain text and the next line starts a table. After extracting the line of text, how do I detect that a table comes next, and then extract the table's contents?

Thanks!!!
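
One possible approach, sketched here rather than taken from the maintainers: find the tables first with page.find_tables(), drop every word that falls inside a table's bounding box, group the remaining words into lines, and then sort text lines and tables together by their top coordinate. The helper names (inside, merge_blocks, extract_pdf_blocks) and the y_tolerance value of 3 are assumptions made for illustration; only pdfplumber.open, page.find_tables(), Table.bbox, Table.extract() and page.extract_words() are pdfplumber calls.

```python
def inside(word, bbox):
    """True if the word's center point lies within bbox = (x0, top, x1, bottom)."""
    x0, top, x1, bottom = bbox
    cx = (word["x0"] + word["x1"]) / 2
    cy = (word["top"] + word["bottom"]) / 2
    return x0 <= cx <= x1 and top <= cy <= bottom


def merge_blocks(words, tables, y_tolerance=3):
    """Interleave text lines and tables by vertical position.

    words  -- dicts with "text", "x0", "x1", "top", "bottom" keys
    tables -- (bbox, rows) pairs
    Returns a list of ("text", line) and ("table", rows) tuples, top to bottom.
    """
    # Each block is (top, kind, content); tables go in first.
    blocks = [(bbox[1], "table", rows) for bbox, rows in tables]

    # Keep only the words that are not part of any table.
    loose = [w for w in words if not any(inside(w, bbox) for bbox, _ in tables)]
    loose.sort(key=lambda w: (w["top"], w["x0"]))

    # Group words whose tops are within y_tolerance into one line.
    lines = []
    for w in loose:
        if lines and abs(w["top"] - lines[-1][0]) <= y_tolerance:
            lines[-1][1].append(w)
        else:
            lines.append([w["top"], [w]])

    for top, line_words in lines:
        line_words.sort(key=lambda w: w["x0"])
        blocks.append((top, "text", " ".join(w["text"] for w in line_words)))

    # Sort everything by the top coordinate to restore reading order.
    blocks.sort(key=lambda b: b[0])
    return [(kind, content) for _, kind, content in blocks]


def extract_pdf_blocks(path):
    """Run merge_blocks over every page of the PDF at `path` (hypothetical helper)."""
    import pdfplumber  # third-party; imported here so merge_blocks stays standalone

    out = []
    with pdfplumber.open(path) as pdf:
        for page in pdf.pages:
            tables = [(t.bbox, t.extract()) for t in page.find_tables()]
            out.extend(merge_blocks(page.extract_words(), tables))
    return out
```

Calling extract_pdf_blocks("example.pdf") (the file name is a placeholder) would give a list where each "text" entry is one line and each "table" entry is the list of rows from Table.extract(), in page order. This assumes a single-column layout; multi-column pages or tables that pdfplumber's default settings do not detect would need extra handling.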

How to extract pdf texts which contains text and tables #672