jsvine / pdfplumber

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
MIT License
6.57k stars 659 forks

How to extract text from a PDF that contains both text and tables #672

Closed Godlikemandyy closed 2 years ago

Godlikemandyy commented 2 years ago

I want to extract the text of a PDF file that contains both text and tables. I can use page.extract_text() to get the text, but when a page contains a table, the extracted text comes out in the wrong order: the table's text is jumbled. What I would like is for text and tables to be extracted separately, while keeping the top-to-bottom order of the page. For example, suppose the current line is plain text and the next line is a table. When I extract this line of text, how do I detect that the next line is a table and then extract the table's text?

Thanks!!!

samkit-jain commented 2 years ago

Hi @Godlikemandyy, I appreciate your interest in the library. Could you please provide some more information so that we can assist you better? Specifically: the pdfplumber version you are using, the PDF (with any sensitive information redacted), the output you expect, and the output you are getting.
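In the meantime, here is a minimal sketch of one possible approach to the ordering problem described above. It is not an official recipe. It assumes that page.find_tables() locates the tables correctly on your PDF, and that words whose center falls inside a table's bounding box belong to that table. The helper names (in_bbox, ordered_blocks, extract_page_blocks) and the y_tolerance value are made up for illustration and are not part of pdfplumber.

```python
def in_bbox(word, bbox):
    """True if the word's center lies inside bbox = (x0, top, x1, bottom)."""
    x0, top, x1, bottom = bbox
    cx = (word["x0"] + word["x1"]) / 2
    cy = (word["top"] + word["bottom"]) / 2
    return x0 <= cx <= x1 and top <= cy <= bottom


def ordered_blocks(words, tables, y_tolerance=3):
    """Merge free text lines and tables into one top-to-bottom list.

    words  -- dicts with "text", "x0", "x1", "top", "bottom" keys
    tables -- (bbox, rows) pairs
    y_tolerance is an assumed threshold for grouping words into a line.
    """
    # Each table becomes one block, positioned by the top edge of its bbox.
    blocks = [(bbox[1], "table", rows) for bbox, rows in tables]

    # Keep only the words that do not fall inside any table.
    free = [w for w in words if not any(in_bbox(w, bbox) for bbox, _ in tables)]
    free.sort(key=lambda w: (w["top"], w["x0"]))

    # Group the remaining words into lines by similar "top" coordinate.
    lines = []  # each entry: [top of first word, list of words]
    for w in free:
        if lines and abs(w["top"] - lines[-1][0]) <= y_tolerance:
            lines[-1][1].append(w)
        else:
            lines.append([w["top"], [w]])

    for top, line_words in lines:
        line_words.sort(key=lambda w: w["x0"])
        blocks.append((top, "text", " ".join(w["text"] for w in line_words)))

    # Sort text lines and tables together by vertical position.
    blocks.sort(key=lambda b: b[0])
    return [(kind, content) for _, kind, content in blocks]


def extract_page_blocks(page):
    """Hypothetical helper: apply ordered_blocks to a pdfplumber page."""
    tables = [(t.bbox, t.extract()) for t in page.find_tables()]
    return ordered_blocks(page.extract_words(), tables)


# Usage (the file name is a placeholder):
# import pdfplumber
# with pdfplumber.open("example.pdf") as pdf:
#     for kind, content in extract_page_blocks(pdf.pages[0]):
#         print(kind, content)
```

Each returned item is either ("text", line) or ("table", rows), so walking the list tells you when the next block is a table. Multi-column layouts or tables that find_tables() misses would need different handling.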