Open drnko opened 1 year ago
It’s probably worth reporting this on pdfplumber’s GitHub; this is pdfminer.six.
Oh! Sorry
pdfplumber doesn’t do OCR (optical character recognition) - what you have done is just create a PDF with an image and no text. If you are starting with an image and want the “text” on the image you should look at Tesseract or services like AWS Textract. Good luck!
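To make the point above concrete, here is a minimal sketch of how one might detect an image-only page and fall back to OCR. It assumes the `pytesseract` wrapper and the Tesseract binary are installed, which the original post does not mention. The helper names `is_image_only` and `ocr_if_needed` are hypothetical and only for illustration. It uses pdfplumber's `page.extract_text()` and `page.images`, and runs OCR on the original image file rather than on the PDF.

```python
def is_image_only(extracted_text, image_count):
    """Return True when a page yields no text but embeds at least one image."""
    # extract_text() may return None or an empty string depending on the version.
    return not (extracted_text or "").strip() and image_count > 0


def ocr_if_needed(pdf_path, image_path):
    """Hypothetical helper: return the PDF text, or OCR the source image if there is none."""
    # Third-party imports are local so is_image_only() has no dependencies.
    import pdfplumber
    import pytesseract  # assumption: installed, with the Tesseract binary on PATH
    from PIL import Image

    with pdfplumber.open(pdf_path) as pdf:
        page = pdf.pages[0]
        text = page.extract_text()
        n_images = len(page.images)

    if is_image_only(text, n_images):
        # No text layer exists, so recognise characters from the pixels instead.
        return pytesseract.image_to_string(Image.open(image_path))
    return text or ""
```

With the files from the report, something like `ocr_if_needed('test.pdf', r'D:\ocr\images\barrel.jpg')` would be the call. The quality of the result depends on the image and on the Tesseract language data that is available.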
Bug report
Whenever I convert an image to PDF and try to extract the text from the converted PDF, the result from pdfplumber is blank.
What am I doing wrong?
Step 1:
Step 2:
===============================================================
Below is the code:
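from PIL import Image
import pdfplumber

# Convert the image to a PDF
image_1 = Image.open(r'D:\ocr\images\barrel.jpg')
im_1 = image_1.convert('RGB')
im_1.save(r'test.pdf')
# Open the PDF and try to extract text from the first page
inv_pdf = pdfplumber.open('test.pdf')
print('Result:', inv_pdf.pages[0].extract_text())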
===============================================================
Terminal:
PS D:\ocr> & "C:/Program Files/Python310/python.exe" d:/ocr/testing.py
Result:
PS D:\GitOCR\ocr>
===============================================================
Below are the PDF files converted from the image files:
test.pdf
test1.pdf
test2.pdf