jsvine / pdfplumber

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.

MIT License

6.48k stars, 658 forks

Recognize workflow images created by MS visio as text #1055

Status: Closed (a4073631 closed this 9 months ago)

a4073631 commented 10 months ago

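Thank you for providing a good PDF parser.

If a figure in a PDF is not stored as an embedded image (for example, a diagram exported from MS Visio), its contents are extracted as text instead of being detected as an image.

Is there a way to extract structures like these from a PDF as an image as well?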

Attached is the problematic pdf.

Code to reproduce the problem

    import pdfplumber

    pdf = pdfplumber.open('hi_test.pdf')
    page = pdf.pages[0]
    print(page.images)

Output:

    page.images= []

PDF file

Environment

pdfplumber version: 0.10.3
Python version: 3.9.0
OS: Windows 10
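
Possible workaround (a sketch, not a confirmed answer from the maintainers): an empty `page.images` suggests the diagram is stored as vector drawing commands rather than as an embedded raster image. Assuming that is the case for this file, the shapes should appear in `page.rects`, `page.lines`, and `page.curves`, and the region they cover can be rasterized with `page.crop(...).to_image(...)`. The helper names `union_bbox` and `render_vector_region`, the output path, and the default resolution below are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, Optional, Tuple

BBox = Tuple[float, float, float, float]  # (x0, top, x1, bottom), as pdfplumber uses


def union_bbox(objects: Iterable[Dict], padding: float = 0.0) -> Optional[BBox]:
    """Return the smallest box covering all objects, or None if there are none.

    Each object is expected to carry the "x0", "top", "x1" and "bottom" keys
    that pdfplumber puts on rects, lines and curves.
    """
    objs = list(objects)
    if not objs:
        return None
    return (
        min(o["x0"] for o in objs) - padding,
        min(o["top"] for o in objs) - padding,
        max(o["x1"] for o in objs) + padding,
        max(o["bottom"] for o in objs) + padding,
    )


def render_vector_region(pdf_path: str, out_path: str,
                         page_number: int = 0, resolution: int = 150) -> bool:
    """Rasterize the area covered by vector shapes on one page (hypothetical helper).

    Returns False when the page has no vector shapes or the area is empty.
    """
    import pdfplumber  # imported here so union_bbox stays usable without it

    with pdfplumber.open(pdf_path) as pdf:
        page = pdf.pages[page_number]
        shapes = page.rects + page.lines + page.curves
        bbox = union_bbox(shapes, padding=2.0)
        if bbox is None:
            return False

        # page.crop() rejects boxes that extend past the page, so clamp first.
        px0, ptop, px1, pbottom = page.bbox
        x0, top = max(bbox[0], px0), max(bbox[1], ptop)
        x1, bottom = min(bbox[2], px1), min(bbox[3], pbottom)
        if x1 <= x0 or bottom <= top:
            return False

        page.crop((x0, top, x1, bottom)).to_image(resolution=resolution).save(out_path)
        return True
```

Usage would look like `render_vector_region('hi_test.pdf', 'diagram.png')`. Note that this renders everything inside the bounding box, including any text labels, and that it treats all vector shapes on the page as one figure; pages with several diagrams or with table rules would need the shapes grouped first.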