jsvine / pdfplumber

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
MIT License

Not reading the pdf file #803

Closed drnko closed 1 year ago

drnko commented 1 year ago

Whenever I convert an image to PDF and try to extract the text from the converted PDF, the result from pdfplumber is blank.

What am I doing wrong?

Step 1: Convert an image (jpeg/jpg/png) to PDF using PIL and save the converted PDF file.

Step 2: Open the saved PDF using pdfplumber.open() and extract text from the opened file.

===============================================================

Below is the code:

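from PIL import Image
import pdfplumber

# Convert the image to RGB and save it as a single-page PDF
image_1 = Image.open(r'D:\ocr\images\barrel.jpg')
im_1 = image_1.convert('RGB')
im_1.save(r'test.pdf')

# Open the saved PDF and print the text extracted from its first page
inv_pdf = pdfplumber.open('test.pdf')
print('Result:', inv_pdf.pages[0].extract_text())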

===============================================================

Terminal:

PS D:\ocr> & "C:/Program Files/Python310/python.exe" d:/ocr/testing.py
Result:

PS D:\GitOCR\ocr>

===============================================================

Below are the PDF files converted from the image files:

test.pdf

test1.pdf

test2.pdf

version: pdfplumber 0.7.6
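
A possible way to narrow this down: pdfplumber reads the text objects stored in a PDF and does not perform OCR, so a PDF made by saving an image with PIL would normally contain one embedded image and no characters. The sketch below counts both per page using the `page.chars` and `page.images` properties. The helper names `summarize_page`, `looks_image_only`, and `check_pdf` are hypothetical and exist only for illustration; this is a diagnostic sketch, not a fix.

```python
def summarize_page(page):
    """Count the text characters and embedded images on a page-like object."""
    return {"chars": len(page.chars), "images": len(page.images)}


def looks_image_only(summary):
    """Return True when a page has at least one image but no text characters."""
    return summary["chars"] == 0 and summary["images"] > 0


def check_pdf(path):
    """Print a per-page summary for the PDF at `path` (hypothetical helper)."""
    # Imported here so the two helpers above can be used without pdfplumber.
    import pdfplumber

    with pdfplumber.open(path) as pdf:
        for number, page in enumerate(pdf.pages, start=1):
            summary = summarize_page(page)
            if looks_image_only(summary):
                note = "image only, no text layer"
            else:
                note = "has text or no images"
            print(f"page {number}: {summary} ({note})")
```

Calling `check_pdf('test.pdf')` on the converted file should show whether any characters are present. If the count is zero, extract_text() has nothing to return, and the text would have to come from running OCR on the image instead.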