jsvine / pdfplumber

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
MIT License
6.1k stars 625 forks

Extract a table that crosses two pages #922

Closed tujinshu closed 1 year ago

tujinshu commented 1 year ago

It is very common for tables to span two pages in a PDF document, but pdfplumber can only handle tables that are on a single page. Is it possible for pdfplumber to support tables that span multiple pages?

For example, this table spans pages 165 and 166. Can it be extracted?

企业微信截图_91f5411e-7cbf-4b9e-aa51-410e88765cb3

wps.pdf

jsvine commented 1 year ago

Hi @tujinshu, and thanks for your interest in pdfplumber. Because of the wide variety of ways that PDFs can represent tables, and the wide variety of ways that they can be split across pages, I currently do not think I'll be able to add such a feature to the library.

roki1031 commented 11 months ago

Hi @tujinshu, and thanks for your interest in pdfplumber. Because of the wide variety of ways that PDFs can represent tables, and the wide variety of ways that they can be split across pages, I currently do not think I'll be able to add such a feature to the library.

This feature is very much needed, please.
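
Since the library has no built-in support for this, one possible workaround is to extract the table fragment from each page separately and join the rows yourself. The sketch below is not a pdfplumber feature and was not proposed by anyone in this thread. `stitch_tables` and `extract_spanning_table` are hypothetical helper names. It assumes every fragment has the same column layout, that the continuation page may repeat the header row, and that `page.extract_table()` (which returns the largest table on the page, or `None`) picks the right fragment on each page. A row that is itself cut in half by the page break is not rejoined.

```python
from typing import List, Optional, Sequence

Row = List[Optional[str]]


def stitch_tables(page_tables: Sequence[Optional[List[Row]]],
                  drop_repeated_header: bool = True) -> List[Row]:
    """Concatenate per-page table fragments into one list of rows.

    Assumption: all fragments share one column layout, and a later
    fragment's first row may be a repeat of the first fragment's header.
    """
    merged: List[Row] = []
    header: Optional[Row] = None
    for table in page_tables:
        if not table:  # page had no table (None) or an empty one
            continue
        rows = list(table)
        if header is None:
            header = rows[0]  # treat the very first row as the header
        elif drop_repeated_header and rows[0] == header:
            rows = rows[1:]  # skip the header repeated on a later page
        merged.extend(rows)
    return merged


def extract_spanning_table(path: str, first_page: int, last_page: int) -> List[Row]:
    """Extract one table per page over a 1-based, inclusive page range and stitch them."""
    import pdfplumber  # imported lazily so stitch_tables can be used without it

    with pdfplumber.open(path) as pdf:
        fragments = [
            pdf.pages[i].extract_table()  # largest table on the page, or None
            for i in range(first_page - 1, last_page)
        ]
    return stitch_tables(fragments)
```

For the file attached above, the call would look like `extract_spanning_table("wps.pdf", 165, 166)`. If a page holds several tables, `page.extract_tables()` returns all of them, and you would need your own rule for choosing the fragment that continues the table.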