jsvine / pdfplumber

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
MIT License

Possibility of text extraction using coordinates #830

Closed. sandeepreddy5 closed this issue 1 year ago

sandeepreddy5 commented 1 year ago

Describe the bug

How would I extract the text inside a given set of coordinates, or extract text along with its coordinates?
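One possible approach to the first half of the question, sketched under assumptions: pdfplumber pages have a `crop()` method that takes a bounding box in the order `(x0, top, x1, bottom)`, and the cropped page supports `extract_text()`. The helper names `rect_to_bbox` and `text_in_bbox` are hypothetical and not part of pdfplumber; the import sits inside the function only so the pure helper can be used on its own.

```python
def rect_to_bbox(obj):
    """Build a (x0, top, x1, bottom) tuple from a pdfplumber-style object dict.

    Note that pdfplumber measures 'top' and 'bottom' from the top of the page,
    unlike 'y0' and 'y1', which are measured from the bottom.
    """
    return (obj["x0"], obj["top"], obj["x1"], obj["bottom"])


def text_in_bbox(path, page_number, bbox):
    """Return the text found inside bbox on the given 1-based page number."""
    import pdfplumber  # imported lazily; rect_to_bbox does not need it

    with pdfplumber.open(path) as pdf:
        page = pdf.pages[page_number - 1]
        # crop() restricts the page to the bounding box before extraction
        return page.crop(bbox).extract_text()


if __name__ == "__main__":
    # Hypothetical coordinates: the top 100 points of a US Letter page.
    print(text_in_bbox("example.pdf", 1, (0, 0, 612, 100)))
```

If the box may extend past the page edge, `crop()` can raise an error in recent versions, so the coordinates may need clamping to the page's own bounding box first.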

Code to reproduce the problem

import pdfplumber

with pdfplumber.open("example.pdf") as pdf:
    for page in pdf.pages:
        print(page.rects)

PDF file

example.pdf

Expected behavior

{'x0': 0.0, 'y0': 0.0, 'x1': 612.0, 'y1': 792.0, 'width': 612.0, 'height': 792.0, 'pts': [(0.0, 0.0), (612.0, 0.0), (612.0, 792.0), (0.0, 792.0)], 'linewidth': 0, 'stroke': False, 'fill': True, 'evenodd': False, 'stroking_color': (1, 1, 1), 'non_stroking_color': (1, 1, 1), 'object_type': 'rect', 'page_number': 1, 'top': 0.0, 'bottom': 792.0, 'doctop': 0.0, 'text': 'respective text'}

Actual behavior

{'x0': 0.0, 'y0': 0.0, 'x1': 612.0, 'y1': 792.0, 'width': 612.0, 'height': 792.0, 'pts': [(0.0, 0.0), (612.0, 0.0), (612.0, 792.0), (0.0, 792.0)], 'linewidth': 0, 'stroke': False, 'fill': True, 'evenodd': False, 'stroking_color': (1, 1, 1), 'non_stroking_color': (1, 1, 1), 'object_type': 'rect', 'page_number': 1, 'top': 0.0, 'bottom': 792.0, 'doctop': 0.0}
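Rect objects do not carry text themselves, which is why the `text` key is missing. A minimal sketch of how the expected output might be assembled, assuming `page.extract_words()` returns dicts with `x0`, `x1`, `top`, `bottom`, and `text` keys: each word is assigned to a rect when the word's center falls inside it. The names `words_in_rect` and `rects_with_text` are hypothetical helpers, not pdfplumber functions, and the center-point rule is just one reasonable choice.

```python
def words_in_rect(words, rect):
    """Return the words whose center point lies inside the rect's bounds."""
    inside = []
    for w in words:
        cx = (w["x0"] + w["x1"]) / 2
        cy = (w["top"] + w["bottom"]) / 2
        if rect["x0"] <= cx <= rect["x1"] and rect["top"] <= cy <= rect["bottom"]:
            inside.append(w)
    return inside


def rects_with_text(path):
    """Yield a copy of each rect dict with an added 'text' key."""
    import pdfplumber  # imported lazily; words_in_rect does not need it

    with pdfplumber.open(path) as pdf:
        for page in pdf.pages:
            words = page.extract_words()  # each word dict includes coordinates
            for rect in page.rects:
                hits = words_in_rect(words, rect)
                yield {**rect, "text": " ".join(w["text"] for w in hits)}


if __name__ == "__main__":
    for rect in rects_with_text("example.pdf"):
        print(rect)
```

For a rect that spans the whole page, as in the output above, this would collect every word on the page, so filtering out page-sized background rects first may be worthwhile.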

Environment