This model is an extended version of the Simple HTR system implemented by @Harald Scheidl and can handle an image containing a full line of text. Huge thanks to @Harald Scheidl for his great work.
Go to the src/ directory and run python main.py with the following arguments:
--train: train the NN (details below).
--validate: validate the NN (details below).
--beamsearch: use vanilla beam search decoding (better, but slower) instead of best path decoding.
--wordbeamsearch: use word beam search decoding (only outputs words contained in a dictionary) instead of best path decoding. This is a custom TF operation and must be compiled from source; see the corresponding section below for more information. It should not be used when training the NN.

No pretrained model is included in this branch, so you will need to train the model on your own data first.
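Both --beamsearch and --wordbeamsearch replace the default best path decoding. For reference, here is a minimal sketch of best path (greedy) CTC decoding in plain Python: pick the most likely class at each time step, collapse repeated classes, then drop blanks. The input layout (one row of probabilities per time step, with the CTC blank as the last column) and the toy character set are assumptions for illustration, not necessarily how this repo stores the RNN output.

```python
from typing import List, Sequence


def best_path_decode(probs: Sequence[Sequence[float]], chars: str) -> str:
    """Greedy (best path) CTC decoding.

    probs: one row per time step, each with len(chars) + 1 scores.
    Assumption: the last column is the CTC blank.
    """
    blank = len(chars)
    # 1. Most likely class at every time step.
    best = [max(range(len(row)), key=row.__getitem__) for row in probs]
    decoded: List[str] = []
    prev = None
    for label in best:
        # 2. Collapse repeats and 3. remove blanks.
        if label != prev and label != blank:
            decoded.append(chars[label])
        prev = label
    return "".join(decoded)


if __name__ == "__main__":
    chars = "ab "  # hypothetical character set; blank is index 3
    probs = [
        [0.7, 0.1, 0.1, 0.1],    # a
        [0.6, 0.2, 0.1, 0.1],    # a again (collapsed)
        [0.1, 0.1, 0.1, 0.7],    # blank
        [0.1, 0.8, 0.05, 0.05],  # b
    ]
    print(best_path_decode(probs, chars))  # ab
```

Because only the single best class per frame is kept, best path decoding is fast but can miss a label sequence whose probability is spread over many paths; that is the gap beam search decoding tries to close.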
I created this model for the Cinnamon AI Marathon 2018 competition. The organizers released a small dataset, but it is in Vietnamese, so you may want to try another dataset, such as IAM [4] for English.
As long as your dataset contains a labels.json file like this:
{
"img1.jpg": "abc xyz",
...
"imgn.jpg": "def ghi"
}
where each key is the path to an image file and each value is the ground-truth label for that image, this code will work fine.
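As a sanity check before training, a file in this format can be loaded with the standard json module. The sketch below assumes keys are paths relative to the dataset directory (absolute keys also work, since os.path.join keeps an absolute second argument); the function name load_labels is hypothetical and not part of this repo. It also collects the set of characters in the labels, which is what a CTC model needs to know as its output alphabet.

```python
import json
import os
import tempfile
from typing import List, Set, Tuple


def load_labels(dataset_dir: str, labels_file: str = "labels.json") -> Tuple[List[Tuple[str, str]], Set[str]]:
    """Read labels.json; return (image_path, text) pairs and the characters used."""
    # UTF-8 matters: Vietnamese labels contain many non-ASCII characters.
    with open(os.path.join(dataset_dir, labels_file), encoding="utf-8") as f:
        labels = json.load(f)  # e.g. {"img1.jpg": "abc xyz", ...}
    samples: List[Tuple[str, str]] = []
    chars: Set[str] = set()
    for rel_path, text in labels.items():
        img_path = os.path.join(dataset_dir, rel_path)
        if not os.path.isfile(img_path):
            # Skip entries whose image is missing instead of crashing mid-training.
            continue
        samples.append((img_path, text))
        chars.update(text)  # every character the model must be able to output
    return samples, chars


if __name__ == "__main__":
    # Throwaway dataset: one existing image file, one missing.
    with tempfile.TemporaryDirectory() as d:
        open(os.path.join(d, "img1.jpg"), "wb").close()
        with open(os.path.join(d, "labels.json"), "w", encoding="utf-8") as f:
            json.dump({"img1.jpg": "abc xyz", "imgn.jpg": "def ghi"}, f)
        samples, chars = load_labels(d)
        print(len(samples), sorted(chars))  # 1 [' ', 'a', 'b', 'c', 'x', 'y', 'z']
```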
Training progress is visualized with TensorBoard; I tracked the character error rate (CER), word error rate (WER), and sentence accuracy for this model. All logs are saved in the ./logs/ folder. You can start a TensorBoard session to view them with this command: tensorboard --logdir='./logs/'
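To make the three metrics concrete, here is a minimal sketch based on Levenshtein edit distance: CER is the character-level edit distance divided by the number of reference characters, WER is the same over whitespace-separated words, and sentence accuracy is the fraction of lines predicted exactly. Summing errors over the whole set before dividing (rather than averaging per line) is an assumption; the repo's own normalization may differ.

```python
from typing import Sequence, Tuple


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance; insertions, deletions and substitutions cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,             # deletion
                          curr[j - 1] + 1,         # insertion
                          prev[j - 1] + (r != h))  # substitution or match
        prev = curr
    return prev[-1]


def evaluate(refs: Sequence[str], hyps: Sequence[str]) -> Tuple[float, float, float]:
    """Return (CER, WER, sentence accuracy) over a whole test set."""
    char_err = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    char_total = sum(len(r) for r in refs)
    word_err = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    word_total = sum(len(r.split()) for r in refs)
    sent_acc = sum(r == h for r, h in zip(refs, hyps)) / len(refs)
    return char_err / char_total, word_err / word_total, sent_acc


if __name__ == "__main__":
    refs = ["abc xyz", "def ghi"]
    hyps = ["abc xyz", "deg ghi"]  # one wrong character in the second line
    print([round(x, 3) for x in evaluate(refs, hyps)])  # [0.071, 0.25, 0.5]
```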
It took me about 48 hours of training on about 13k images with a single GTX 1060 6GB to get down to 0.16 CER on the competition's private test set.
The model is an extended version of the Simple HTR system implemented by @Harald Scheidl. It consists of 7 CNN layers, 2 RNN (Bi-LSTM) layers, and a CTC loss and decoding layer, and it can handle an image containing a full line of text.
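The CTC loss layer is what lets the model learn from line-level labels without knowing where each character sits in the image. The sketch below is an illustration of the standard CTC forward algorithm in plain Python, not this repo's code: a TensorFlow model would normally call the built-in op (e.g. tf.nn.ctc_loss) instead. It works in probability space without rescaling, so it is only numerically safe for short sequences; the function name and the input layout (per-frame softmax rows, integer labels, explicit blank index) are assumptions.

```python
import math
from typing import List, Sequence


def ctc_loss(probs: Sequence[Sequence[float]], label: Sequence[int], blank: int) -> float:
    """Negative log-likelihood of `label` given per-frame class probabilities.

    probs[t][k] is the softmax probability of class k at time step t.
    Raises ValueError (log of 0) if no valid alignment fits in len(probs) frames.
    """
    # Interleave blanks: label [a, b] -> [-, a, -, b, -]
    ext: List[int] = [blank]
    for c in label:
        ext += [c, blank]
    S, T = len(ext), len(probs)
    # alpha[s]: total probability of all prefixes ending in ext[s] at time t.
    alpha = [0.0] * S
    alpha[0] = probs[0][blank]
    if S > 1:
        alpha[1] = probs[0][ext[1]]
    for t in range(1, T):
        new = [0.0] * S
        for s in range(S):
            a = alpha[s]                # stay on the same symbol
            if s >= 1:
                a += alpha[s - 1]       # advance by one symbol
            # Skip over a blank only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[s - 2]
            new[s] = a * probs[t][ext[s]]
        alpha = new
    # Valid alignments end on the last label or on the trailing blank.
    total = alpha[-1] + (alpha[-2] if S > 1 else 0.0)
    return -math.log(total)


if __name__ == "__main__":
    # Two frames, classes [a, blank]; "a" can be aligned as aa, a-, -a.
    probs = [[0.5, 0.5], [0.5, 0.5]]
    print(round(math.exp(-ctc_loss(probs, [0], blank=1)), 4))  # 0.75
```

Note how a repeated label such as [a, a] forces a blank between the two characters, which is why the skip transition is disallowed when ext[s] equals ext[s - 2].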
The highest accuracy achieved is 0.84 on the private test set of the Cinnamon AI Marathon 2018 competition, measured as 1 - CER (Character Error Rate), i.e. a CER of 0.16.
If you need better accuracy, see [2] for some ideas on how to improve it.
Don't hesitate to ask me anything via a GitHub issue (see the issue template file for more details).
Big shout-out to Sushant Gautam for extending this code to the IAM dataset; he even provides a pretrained model and a web UI for running inference with the model. Don't forget to check out his repo.
[1] Scheidl - Build a Handwritten Text Recognition System using TensorFlow
[2] Scheidl - Handwritten Text Recognition in Historical Documents
[3] Scheidl - Word Beam Search: A Connectionist Temporal Classification Decoding Algorithm
[4] Marti - The IAM-database: an English sentence database for offline handwriting recognition