jsvine / pdfplumber

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
MIT License

[Feature] Add `Column` object(s) to `find_table()` #1050

Open Pk13055 opened 7 months ago

Pk13055 commented 7 months ago

Currently, `find_table()` returns a table object with two properties: `.cells`, a `list[tuple[float, float, float, float]]` of bounding boxes, and `.rows`, made up of `Row` objects that in turn contain `.cells`. It would be great if a `.columns` property were added alongside `.rows`. With both available, the intersection of a row's and a column's bounding boxes could be calculated to locate each cell, so that external OCR can be applied on a per-cell basis.
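In the meantime, something like the following might serve as a workaround. It is only a rough sketch, not part of pdfplumber: `group_cells_into_columns` is a hypothetical helper, and it assumes each entry of `.cells` is an `(x0, top, x1, bottom)` tuple and that cells in the same column share exactly the same `x0`.

```python
from itertools import groupby
from operator import itemgetter
from typing import Optional

# Assumed bbox layout: (x0, top, x1, bottom)
BBox = tuple[float, float, float, float]


def group_cells_into_columns(cells: list[BBox]) -> list[list[Optional[BBox]]]:
    """Hypothetical helper: group cell bboxes into columns by their left edge.

    Columns are ordered left to right; cells within a column are ordered
    top to bottom. A row position with no cell in a column is filled with None.
    """
    # Every distinct top edge in the table defines one row position.
    tops = sorted({cell[1] for cell in cells})

    # Sort by x0 (then top) so groupby sees each column as one contiguous run.
    by_x0 = sorted(cells, key=itemgetter(0, 1))

    columns = []
    for _, column_cells in groupby(by_x0, key=itemgetter(0)):
        by_top = {cell[1]: cell for cell in column_cells}
        columns.append([by_top.get(top) for top in tops])
    return columns
```

With this, `group_cells_into_columns(table.cells)` would give one list of bboxes per column, each of which could be cropped and passed to an OCR engine. Tables with merged cells or slightly misaligned edges would likely need some tolerance when comparing `x0` values, which this sketch does not handle.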

jsvine commented 4 months ago

Thanks for the suggestion, @Pk13055. The idea seems reasonable; I'll investigate how this might be added.