jsvine / pdfplumber

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
MIT License
5.99k stars, 618 forks

Any way to detect formatting? #1108

Closed: enrac5 closed this issue 3 months ago

enrac5 commented 3 months ago

Discussed in https://github.com/jsvine/pdfplumber/discussions/1106

Originally posted by **enrac5** March 8, 2024

Hi there, I am parsing a PDF with tables, and I'd like to detect formatting such as italics and bold in the text. Any ideas on whether that's possible (or any hacks anyone has), and how to do it?

Edit: I have this code snippet that works for characters:

```python
import pdfplumber

pdf_path = "/tmp/Foo_1.pdf"

# Open the PDF and print the font name of every character on the first page.
with pdfplumber.open(pdf_path) as pdf:
    page = pdf.pages[0]
    for char in page.chars:
        print(char["fontname"])
```

Which is great, but how do I do this for a given table?

jsvine commented 3 months ago

Let's keep this in one thread; closing in favor of #1106
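
For readers who land here before following the link: the question above asks how to go from per-character `fontname` values to per-table-cell formatting. The following is a minimal sketch of one possible approach, not something confirmed by the maintainer in this thread. It assumes that `page.find_tables()` returns tables whose `rows` expose `cells` as `(x0, top, x1, bottom)` bounding boxes, and that bold or italic styling is reflected in the font name, which is only a heuristic and will not hold for every PDF. The helper names `fonts_in_bbox`, `looks_bold`, `looks_italic`, and `table_cell_fonts` are hypothetical, introduced for illustration.

```python
from collections import Counter


def fonts_in_bbox(chars, bbox):
    """Count font names of chars whose center falls inside bbox.

    chars: list of dicts with "x0", "x1", "top", "bottom", "fontname"
           (the keys pdfplumber uses for page.chars).
    bbox:  (x0, top, x1, bottom).
    """
    x0, top, x1, bottom = bbox
    counts = Counter()
    for ch in chars:
        cx = (ch["x0"] + ch["x1"]) / 2
        cy = (ch["top"] + ch["bottom"]) / 2
        if x0 <= cx <= x1 and top <= cy <= bottom:
            counts[ch["fontname"]] += 1
    return counts


def looks_bold(fontname):
    # Heuristic (assumption): style is often encoded in the font name,
    # e.g. "ABCDEF+Arial-BoldMT". Not guaranteed for every PDF.
    return "bold" in fontname.lower()


def looks_italic(fontname):
    # Same heuristic as above; some fonts say "Oblique" instead of "Italic".
    name = fontname.lower()
    return "italic" in name or "oblique" in name


def table_cell_fonts(page):
    """Yield (table_index, row_index, col_index, Counter) for each table cell.

    page is expected to be a pdfplumber Page; empty cells (None) are skipped.
    """
    for t, table in enumerate(page.find_tables()):
        for r, row in enumerate(table.rows):
            for c, cell in enumerate(row.cells):
                if cell is None:
                    continue
                yield t, r, c, fonts_in_bbox(page.chars, cell)


# Example usage (requires pdfplumber and a real file):
#
#   import pdfplumber
#   with pdfplumber.open("/tmp/Foo_1.pdf") as pdf:
#       for t, r, c, fonts in table_cell_fonts(pdf.pages[0]):
#           bold = any(looks_bold(name) for name in fonts)
#           print(t, r, c, dict(fonts), "bold" if bold else "")
```

Matching characters by their center point, rather than requiring the whole glyph to sit inside the cell, is a deliberate choice here so that characters touching a cell border are still counted once. Whether a name-based check is enough depends on how the PDF's fonts are named.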