naderabdalghani/doclense

An app for extracting text from camera-captured photos into .docx files

About The Project
- Built With
Getting Started
Usage
Results
Roadmap
Contributors
Acknowledgements

About The Project

App Showcase

Built With

Flask
PyTorch
This app uses a slightly modified version of this Kaggle kernel as its letter classifier. The model is a pre-trained Wide ResNet-50-2 convolutional neural network that achieves 99% accuracy after being trained on this dataset.

Getting Started

Prerequisites

Set up Python using this link

Installation

Change into the project directory (cd <project-directory>) and create a virtual environment
- On Unix-based systems: $ python3 -m venv venv
- On Windows: > py -3 -m venv venv
Activate the environment
- On Unix-based systems: $ . venv/bin/activate
- On Windows: > venv\Scripts\activate
Install the app dependencies: pip install -r requirements.txt
Create the following directories in the project's main directory
- <project-directory>\results
- <project-directory>\uploads
Download the trained model state dictionary and place it in <project-directory>\model\
[Optional] Download the dataset used and extract it in <project-directory>\model\
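
The results and uploads directories from the steps above can also be created with a short script instead of by hand. This is a minimal sketch and not part of the project: the function name ensure_project_dirs is hypothetical, and it assumes only the two directory names listed above.

```python
from pathlib import Path


def ensure_project_dirs(project_dir, names=("results", "uploads")):
    """Create the given subdirectories under project_dir if they are missing.

    The default names mirror the directories listed in the installation steps;
    the helper itself is illustrative and does not exist in the repository.
    """
    root = Path(project_dir)
    ready = []
    for name in names:
        path = root / name
        # exist_ok=True makes this a no-op when the directory already exists.
        path.mkdir(parents=True, exist_ok=True)
        ready.append(path)
    return ready
```

Calling ensure_project_dirs(".") from the project's main directory creates both directories, and running it again is harmless because existing directories are left untouched.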

Running

Make sure you are in the project directory (cd <project-directory>), then start the app
- On Unix-based systems: $ python api.py
- On Windows: > py -3 api.py

Usage

Click the 'Upload' button and select a photo that contains printed text, then click the 'Submit' button. After a short wait, the generated .docx file starts downloading.

Results

Test Case 0

test_0

Test Case 1

test_1

Roadmap

List of Proposed Improvements

Fix and integrate the de-skewing script
Make the app more tolerant of closely spaced words
Train the model on letters in different fonts
Preserve the indentation and formatting of the printed sheet
Handle missing directories gracefully

Contributors

Nader AbdAlGhani
- Text separation implementation
- Dataset creation
- Classifier model implementation
- Utility functions implementation
- Web app development and integration
Mohamad Ahmad
- Lots of research
- De-skewing algorithm implementation
Mostafa Walid
- Thresholding algorithms implementation
Omar Salah
- Segmenting input pages into lines, lines into words and words into letters
