jsvine / pdfplumber

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
MIT License
6.02k stars 619 forks

"mproving PDF-to-Text Conversion: Integrating Tables as Markup Text on a Page-by-Page Basis #984

Open Isha09Garg opened 10 months ago

Isha09Garg commented 10 months ago

Is it possible to integrate text seamlessly with tables, essentially converting tables into markup text, to enhance the quality of PDF-to-text conversion on a per-page basis?

jsvine commented 10 months ago

Hi @Isha09Garg, and thanks for your interest in this library. It's a really interesting functionality you propose, but my instinct is that this is best handled by a third-party library, given the extensive amount of customization I can imagine users wanting. (E.g., How to represent more complex text layouts in Markdown, or how to determine whether a line of text should be rendered as a Markdown heading or not, etc.) If you or another member of the community wants to build that, I'd be happy to link to the project from pdfplumber's documentation.
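For anyone who wants to try this outside the library, here is a minimal sketch of one possible approach. It is not a pdfplumber feature. It assumes Markdown as the target markup, treats the first row of each table as the header, and orders blocks by their vertical position only, so multi-column layouts and heading detection are not handled. The function names `table_to_markdown`, `page_to_markup`, and `pdf_to_markup` are hypothetical; the pdfplumber calls it relies on are `page.find_tables()`, `table.bbox`, `table.extract()`, `page.filter()`, and `page.extract_text_lines()`.

```python
def table_to_markdown(rows):
    """Render rows (a list of lists, as returned by Table.extract()) as a Markdown table.

    Assumption: the first row is the header.
    """
    def clean(cell):
        # Empty cells come back as None; newlines and pipes would break the table.
        if cell is None:
            return ""
        return str(cell).replace("\n", " ").replace("|", "\\|").strip()

    rows = [[clean(cell) for cell in row] for row in rows if row]
    if not rows:
        return ""
    width = max(len(row) for row in rows)
    rows = [row + [""] * (width - len(row)) for row in rows]  # pad ragged rows

    lines = ["| " + " | ".join(rows[0]) + " |",
             "| " + " | ".join(["---"] * width) + " |"]
    lines += ["| " + " | ".join(row) + " |" for row in rows[1:]]
    return "\n".join(lines)


def page_to_markup(page):
    """Return the page text with each detected table replaced by a Markdown table."""
    tables = page.find_tables()
    bboxes = [table.bbox for table in tables]  # (x0, top, x1, bottom)

    def outside_tables(obj):
        # Keep an object unless its midpoint falls inside a table bounding box.
        h_mid = (obj["x0"] + obj["x1"]) / 2
        v_mid = (obj["top"] + obj["bottom"]) / 2
        return not any(
            x0 <= h_mid < x1 and top <= v_mid < bottom
            for x0, top, x1, bottom in bboxes
        )

    # Text lines outside the tables, tagged with their vertical position.
    blocks = [
        (line["top"], line["text"])
        for line in page.filter(outside_tables).extract_text_lines()
    ]
    # Tables, tagged with the top of their bounding box and padded with blank lines.
    for table in tables:
        markdown = table_to_markdown(table.extract())
        if markdown:
            blocks.append((table.bbox[1], "\n" + markdown + "\n"))

    blocks.sort(key=lambda block: block[0])  # top-to-bottom order
    return "\n".join(text for _, text in blocks).strip()


def pdf_to_markup(path):
    """Convert every page of the PDF at `path`; returns one string per page."""
    import pdfplumber  # imported here so the helpers above work without it

    with pdfplumber.open(path) as pdf:
        return [page_to_markup(page) for page in pdf.pages]
```

Calling `pdf_to_markup("report.pdf")` (the file name is a placeholder) would give one string per page. Table detection uses pdfplumber's default settings here, so documents without ruled tables would likely need custom `table_settings` passed to `find_tables()`.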