I use `extract_table()` to extract table content, and find that the order of the extracted text within individual cells is inconsistent with the original text.
pdf table:
Code to reproduce the problem
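from typing import List

import pdfplumber

file_path = "example.pdf"  # placeholder: path to the affected PDF

table_text_items: List[tuple] = []
with pdfplumber.open(file_path) as pdf:
    for page in pdf.pages:
        table = page.extract_table()
        lines: List[str] = []
        if table:
            for row in table:
                # Skip empty cells (None or ""), then split multi-line cell text.
                for line in [item for item in row if item is not None]:
                    if line:
                        lines.extend(line.split("\n"))
        if lines:
            table_text_items.append((page.page_number, lines))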
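To rule out the post-processing loop as the cause, the flattening step can be checked on its own without a PDF. This is only a minimal sketch, assuming the table has the list-of-rows shape that `extract_table()` returns; `flatten_table` and the sample data are made up for illustration and are not part of pdfplumber:

```python
from typing import List, Optional


# Hypothetical helper (not part of pdfplumber): the same flattening as the
# loop above, applied to one already-extracted table.
def flatten_table(table: List[List[Optional[str]]]) -> List[str]:
    lines: List[str] = []
    for row in table:
        for cell in row:
            if cell:  # skips both None and empty strings
                lines.extend(cell.split("\n"))
    return lines


# Made-up sample data in the shape of an extracted table.
sample = [["Name", "Note"], ["A", "first line\nsecond line"], [None, ""]]
print(flatten_table(sample))
# → ['Name', 'Note', 'A', 'first line', 'second line']
```

Since this step keeps the cell text in the order it receives it, the reordering presumably is already present in the output of `extract_table()`.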
PDF file
Please attach any PDFs necessary to reproduce the problem.
If you need to redact text in a sensitive PDF, you can run it through JoshData/pdf-redactor.
Describe the bug
pdf table:![image](https://github.com/jsvine/pdfplumber/assets/42051898/67b30a7b-92c2-483c-9389-3d1c5ab2669f)