ufosc / DocuMiner

A production-ready pipeline for text mining and subject indexing
MIT License
8 stars, 5 forks

Optical Character Recognition #17

Open Fennec2000GH opened 2 years ago

Description

Perform OCR on images of text to recognize the text and convert it into a digital, machine-readable format.

Objectives

  1. Familiarize yourself with the functions of an OCR library, e.g. pytesseract.
  2. Write a wrapper function that converts the image to grayscale and then calls the appropriate OCR function.
  3. Optional: add further image preprocessing steps, such as denoising, if they improve OCR accuracy.
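
As a starting point for objectives 2 and 3, here is a minimal sketch of such a wrapper. It assumes Pillow for image handling and pytesseract (which needs the Tesseract binary installed) for OCR. The names `preprocess` and `ocr_image`, the `denoise` flag, the `ocr_fn` parameter, and the choice of a 3x3 median filter are illustrative assumptions, not anything already in DocuMiner.

```python
from typing import Callable, Optional, Union

from PIL import Image, ImageFilter


def preprocess(image: Image.Image, denoise: bool = False) -> Image.Image:
    """Convert an image to 8-bit grayscale, optionally median-filtering it."""
    gray = image.convert("L")  # "L" is Pillow's single-channel grayscale mode
    if denoise:
        # Assumption: a 3x3 median filter as a simple denoising step.
        gray = gray.filter(ImageFilter.MedianFilter(size=3))
    return gray


def ocr_image(
    source: Union[str, Image.Image],
    denoise: bool = False,
    ocr_fn: Optional[Callable[[Image.Image], str]] = None,
) -> str:
    """Grayscale (and optionally denoise) an image, then run OCR on it.

    `source` is a file path or an already opened Pillow image.
    `ocr_fn` is a hypothetical hook for swapping the OCR backend;
    it defaults to pytesseract.image_to_string.
    """
    image = source if isinstance(source, Image.Image) else Image.open(source)
    gray = preprocess(image, denoise=denoise)
    if ocr_fn is None:
        # Imported lazily so preprocessing works without Tesseract installed.
        import pytesseract

        ocr_fn = pytesseract.image_to_string
    return ocr_fn(gray)
```

Typical use would be something like `text = ocr_image("scan.png", denoise=True)`, where `scan.png` is a placeholder file name. Whether denoising actually helps should be checked on real sample images, since filtering can also blur small glyphs.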