nainiayoub / pdf-text-data-extractor

PDF text data extraction web app with OCR for scanned documents
https://share.streamlit.io/nainiayoub/pdf-text-data-extractor/main/app.py
ocr ocr-python ocr-text-reader pdf pdf-to-text python streamlit streamlit-webapp text-extraction

PDF to Text


A PDF text data extraction app that takes a PDF document as input and returns either a single txt file containing all pages or a compressed folder of txt files, one per page. OCR can also be enabled for scanned documents.


How does it work?

flowchart LR

A[PDF] --> |text conversion / OCR| B(Text)
B --> |Option 1| D[txt file]
B --> |Option 2| E[ZIP folder of txt files for pages]
  1. Upload your PDF.
  2. Enable OCR (for scanned documents).
  3. Select the PDF language.
  4. Download your output file (zip/txt).
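
The two output options can be sketched as follows. This is a minimal illustration using only the Python standard library, not the app's actual code: it assumes the page texts have already been produced by text conversion or OCR, and the function names `pages_to_txt` and `pages_to_zip`, the `page_N.txt` naming scheme, and the sample strings are all hypothetical.

```python
import io
import zipfile


def pages_to_txt(page_texts):
    """Option 1: join every page into one text document (hypothetical helper)."""
    # Separate pages with a blank line so page boundaries stay visible.
    return "\n\n".join(page_texts)


def pages_to_zip(page_texts, prefix="page"):
    """Option 2: build an in-memory ZIP with one txt file per page (hypothetical helper)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # Number files from 1 so names match the document's page numbers.
        for number, text in enumerate(page_texts, start=1):
            archive.writestr(f"{prefix}_{number}.txt", text)
    return buffer.getvalue()


# Placeholder page texts standing in for real extraction or OCR output.
sample_pages = ["First page text.", "Second page text."]
single_txt = pages_to_txt(sample_pages)
zip_bytes = pages_to_zip(sample_pages)
```

In a Streamlit app, values like `single_txt` or `zip_bytes` could be handed to `st.download_button` to offer the file for download.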

How to support the project

You can help support the project by sharing feedback and/or buying me a coffee.