ufosc / DocuMiner

A production-ready pipeline for text mining and subject indexing

MIT License


Optical Character Recognition #17

Open Fennec2000GH opened 2 years ago

Fennec2000GH commented 2 years ago

Description

Perform OCR on images of text to recognize the text and convert it into machine-readable digital form.

Objectives

Familiarize yourself with the functions of an OCR library, e.g. pytesseract.
Write a wrapper function that converts the image to grayscale and then calls the appropriate OCR function.
Optional: add further image preprocessing steps, such as denoising, if they improve OCR accuracy.
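
A possible starting point for the wrapper described in the objectives is sketched below. It is only an illustration, not the project's implementation: it assumes Pillow for image handling and pytesseract (with the Tesseract binary installed) for recognition. The names `preprocess`, `ocr_image`, and the `ocr_func` parameter are hypothetical, and the median filter is just one candidate denoising step whose effect on accuracy would need to be measured.

```python
from typing import Callable, Optional

from PIL import Image, ImageFilter, ImageOps


def preprocess(image: Image.Image, denoise: bool = False) -> Image.Image:
    """Convert an image to grayscale and optionally denoise it."""
    gray = ImageOps.grayscale(image)  # 8-bit grayscale, mode "L"
    if denoise:
        # A 3x3 median filter is one simple way to suppress speckle noise;
        # whether it helps OCR accuracy depends on the input images.
        gray = gray.filter(ImageFilter.MedianFilter(size=3))
    return gray


def ocr_image(
    image: Image.Image,
    denoise: bool = False,
    ocr_func: Optional[Callable[[Image.Image], str]] = None,
) -> str:
    """Grayscale (and optionally denoise) the image, then run OCR on it.

    `ocr_func` is a hypothetical hook so another OCR backend can be
    swapped in; by default pytesseract.image_to_string is used.
    """
    if ocr_func is None:
        # Imported lazily so preprocessing works without pytesseract installed.
        import pytesseract

        ocr_func = pytesseract.image_to_string
    return ocr_func(preprocess(image, denoise=denoise))
```

With Tesseract available, a call might look like `ocr_image(Image.open("scan.png"), denoise=True)`, where `scan.png` is a placeholder file name. Keeping preprocessing in its own function makes it easier to compare OCR accuracy with and without each step.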