INVOICE PDF

I'm new to using AI, and I'm looking for guidance on how to extract invoice details from PDF files, similar to how it's done for images. Can you provide some suggestions or steps to achieve this? Thanks in advance.

The PyPDF2 library is one way to get text from a PDF without OCR: it can read and extract the text of each page of a text-based (non-scanned) PDF. For an image-based (scanned) PDF, where the text cannot be extracted directly, you can use OCR (Optical Character Recognition): pdf2image converts the PDF pages to images, and pytesseract extracts the text from those images. Combining the two approaches covers both scanned and text-based PDFs.
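As a rough sketch of the approach described above, something like the following could work. It assumes PyPDF2 (3.x, which provides `PdfReader`), pdf2image (which needs Poppler installed), and pytesseract (which needs the Tesseract binary installed). The function names and the `min_chars` threshold are hypothetical choices for illustration, not part of any of these libraries.

```python
from typing import Callable, List


def extract_text_pypdf2(pdf_path: str) -> List[str]:
    """Return the embedded text of each page ('' when a page has none)."""
    from PyPDF2 import PdfReader  # assumption: PyPDF2 is installed

    reader = PdfReader(pdf_path)
    return [(page.extract_text() or "") for page in reader.pages]


def extract_text_ocr(pdf_path: str) -> List[str]:
    """Render each page to an image and run OCR on it."""
    from pdf2image import convert_from_path  # assumption: Poppler is installed
    import pytesseract  # assumption: the Tesseract binary is installed

    images = convert_from_path(pdf_path)
    return [pytesseract.image_to_string(image) for image in images]


def extract_invoice_text(
    pdf_path: str,
    min_chars: int = 20,  # hypothetical cutoff for "this PDF has no real text layer"
    direct: Callable[[str], List[str]] = extract_text_pypdf2,
    ocr: Callable[[str], List[str]] = extract_text_ocr,
) -> str:
    """Try direct extraction first; fall back to OCR if almost no text comes back."""
    pages = direct(pdf_path)
    if sum(len(page.strip()) for page in pages) < min_chars:
        # Little or no embedded text: treat the file as a scanned PDF.
        pages = ocr(pdf_path)
    return "\n".join(pages)
```

Calling `extract_invoice_text("invoice.pdf")` (with a file path of your own) returns the raw text, which you can then parse or pass to an LLM to pull out fields such as invoice number, date, and total. Note that this sketch falls back to OCR for the whole document; a PDF that mixes text pages and scanned pages would need a per-page check instead.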

karndeepsingh / ApplicationsBuildWithLLMs

INVOICE PDF #1