jsvine / pdfplumber

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
MIT License

Incorrect extraction in tables #921

Closed tujinshu closed 1 year ago

tujinshu commented 1 year ago

Describe the bug

Table extraction error: a single column in the PDF table is extracted as several columns.

Code to reproduce the problem


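import pdfplumber

# Page 169 of the document is index 168 (pdf.pages is zero-indexed).
pdf = pdfplumber.open("./wps.pdf")
p0 = pdf.pages[168]

# Render the page and overlay the lines and cells the table finder detects.
im = p0.to_image()
im.debug_tablefinder()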
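
One way to narrow this down, offered only as a sketch: the issue does not say why the column is split, but a common cause with the default "lines" strategy is extra vertical edges (for example, the sides of rectangle objects) being treated as column boundaries. The snippet below builds a `table_settings` dictionary that uses the "lines_strict" strategy, which ignores rectangle sides, and passes it to `extract_table`. The helper names `build_table_settings` and `extract_with_settings` are hypothetical and not part of pdfplumber; whether these settings fix this particular PDF is an assumption that has not been verified against wps.pdf.

```python
def build_table_settings(snap_tolerance=3):
    """Return table-finder settings that only trust explicitly drawn lines.

    Hypothetical helper, not part of pdfplumber. The keys are standard
    pdfplumber table settings; the values are guesses for this issue.
    """
    return {
        # "lines_strict" uses graphical lines but not the sides of rectangles.
        "vertical_strategy": "lines_strict",
        "horizontal_strategy": "lines_strict",
        # Lines closer together than this many points are snapped into one.
        "snap_tolerance": snap_tolerance,
    }


def extract_with_settings(path, page_index, settings):
    """Open the PDF and extract the largest table on one page.

    Hypothetical helper; pdfplumber is imported here so that
    build_table_settings() can be used without it installed.
    """
    import pdfplumber

    with pdfplumber.open(path) as pdf:
        page = pdf.pages[page_index]
        return page.extract_table(table_settings=settings)


# Example call (requires pdfplumber and the attached file):
# rows = extract_with_settings("./wps.pdf", 168, build_table_settings())
```

The same dictionary can be passed to `im.debug_tablefinder(...)` to see which lines the finder keeps under these settings. If a column is still split, raising `snap_tolerance` or switching to explicit vertical lines would be the next things to try.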

PDF file

wps.pdf

Expected behavior

Original content of page 169: image

Actual behavior

image

Screenshots

image

Environment

Additional context
