DISCONTINUATION OF PROJECT
This project will no longer be maintained by Intel.
Intel has ceased development and contributions including, but not limited to, maintenance, bug fixes, new releases, or updates, to this project.
Intel no longer accepts patches to this project.
If you have an ongoing need to use this project, are interested in independently developing it, or would like to maintain patches for the open source software community, please create your own fork of this project.
Contact: webadmin@linux.intel.com
This project aims to provide a simple and unified handwritten text-line recognition reference solution based on a CNN-only architecture with CTC, built on the PyTorch framework and the Intel OpenVINO toolkit. It is also the source of multiple previous works, including:
Overview of our method for predicting and decoding the result with the visual model and language model.
Comparison with recent works on the ICDAR 2013 competition set, in terms of character error rate (CER = 100% - AR).
Methods | CER (%) without language model | CER (%) with language model |
---|---|---|
LSTM-RNN-CTC | 16.50 | 11.60 |
Over-segmentation | - | 3.68 |
CNN-ResLSTM-CTC | 8.45 | 3.28 |
WCNN-PHMM | 8.42 | 3.17 |
CNN-CTC-CBS (this) | 6.38 | 2.49 |
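For reference, the character error rate used in the table above can be computed from the edit distance between predictions and ground truth. A minimal sketch, assuming the third-party `editdistance` package (not part of this repository):

```python
# Minimal CER sketch: total edit distance divided by total ground-truth characters.
# Assumes the third-party `editdistance` package; not code from this repository.
import editdistance

def character_error_rate(predictions, ground_truths):
    errors = sum(editdistance.eval(p, g) for p, g in zip(predictions, ground_truths))
    chars = sum(len(g) for g in ground_truths)
    return 100.0 * errors / chars
```

For example, one substituted character in a five-character line gives a CER of 20%.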
Download the models trained with the SCUT-EPT dataset using the Open Model Zoo downloader tool.
python <your-openvino-installation>/deployment_tools/open_model_zoo/tools/downloader/downloader.py \
--name handwritten-simplified-chinese-recognition-0001
Run inference with a SCUT-EPT test image as input and check the prediction.
# use character list for SCUT-EPT data instead
# https://github.com/openvinotoolkit/open_model_zoo/blob/master/data/dataset_classes/scut_ept.txt
# NOTE: input data was not normalized during the SCUT-EPT model training
python deploy.py -lang hctr -m <path-to-handwritten-simplified-chinese-recognition-0001.xml> \
-dm greedy-search -i <path-to-input-image>
Input and output examples (no language model is used, and only a VGG-based network was used to train this recognition model):
['仍有“铁马冰河入梦来”的情怀,一生心忧天']
['不祇辱于奴隶人之手']
['④自己独自漂泊的孤独之感。人语朱离逢山峒獠”此句可体现。']
['婆婆是受了马骗。⑤阿苦担心家人的身体健康,怕闸蟹吃了会生游。']
['马克思主义者.']
Restricted datasets (for research use only) for different languages:
The train/val/test_img_id_gt.txt files contain one sample per line, with a comma separating the image ID and its label:
img_id_1,text_1
img_id_2,text_2
img_id_3,text_3
...
The chars_list.txt file contains one character per line:
character_1
character_2
character_3
...
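For illustration, the two files described above can be loaded as follows. This is a minimal sketch based only on the formats shown here; the file names and the convention of splitting on the first comma are assumptions, not code from this repository:

```python
# Minimal sketch for reading the annotation and character-list files described above.
# The first-comma split is an assumption (it lets labels themselves contain commas).
def load_annotations(gt_path):
    """Return (image_id, label) pairs from e.g. train_img_id_gt.txt."""
    samples = []
    with open(gt_path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if line:
                img_id, text = line.split(",", 1)
                samples.append((img_id, text))
    return samples

def load_chars_list(chars_path):
    """Return the character list from chars_list.txt, one character per line."""
    with open(chars_path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f if line.rstrip("\n")]
```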
Note that, before training with a target dataset, all gray-scale images should be resized to a fixed height (e.g. 128) while keeping the aspect ratio fixed.
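For example, the resizing can be done roughly as follows; this is a minimal sketch using OpenCV, and the interpolation choice is an assumption rather than the exact preprocessing used in this repository:

```python
# Minimal sketch: resize a gray-scale text-line image to a fixed height
# while preserving the aspect ratio. Not this repository's exact preprocessing.
import cv2

def resize_to_fixed_height(image_path, target_height=128):
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    h, w = img.shape[:2]
    new_w = max(1, round(w * target_height / h))
    return cv2.resize(img, (new_w, target_height), interpolation=cv2.INTER_LINEAR)
```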
python main.py -m hctr -d <path-to-dataset> -b 8 -pf 100 -lr 1e-4 --gpu 0 (default: multi-GPU mode)
After training, the model can be tested with a test set or a single image for benchmarking. To obtain the final output of CTC-based text recognition, one additional step called decoding is required, and it is essential for the final accuracy.
python test.py -m hctr -f <path-to-trained_model> \
-i <input_image> (or -ts <path-to-testset>) \
-dm [greedy-search|beam-search] \
...
Greedy search is the basic and default decoding method; it simply takes the maximum probability at each output step.
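Conceptually, greedy (best-path) decoding takes the argmax class at each time step, collapses repeated symbols, and removes the CTC blank. A minimal sketch, assuming the blank occupies index 0 and the network output has a (time, classes) layout; this is not the repository's exact implementation:

```python
# Minimal sketch of CTC greedy (best-path) decoding.
# Assumes blank at index 0 and output shaped (time_steps, num_classes).
import numpy as np

def ctc_greedy_decode(log_probs, chars_list, blank=0):
    best_path = np.argmax(log_probs, axis=1)
    decoded, prev = [], blank
    for idx in best_path:
        # Emit only when the class changes and is not the blank symbol.
        if idx != prev and idx != blank:
            decoded.append(chars_list[idx - 1])  # characters offset by the blank
        prev = idx
    return "".join(decoded)
```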
To further improve the accuracy, a language model (n-gram or transformer-based) can be introduced to work with beam-search decoding. Two major strategies are provided to configure the beam search.
-dm beam-search --use-tfm-pred --transformer-path <path-to-trained-transformer>
-dm beam-search --skip-search --kenlm-path <path-to-trained-ngrams>
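As an illustration of the n-gram strategy, a beam-search hypothesis can be rescored with a KenLM model roughly as follows. The interpolation weight, the character-level spacing, and the model path are assumptions; this is not the repository's beam-search implementation:

```python
# Minimal sketch: rescoring a hypothesis with a KenLM n-gram language model.
# Weight, spacing, and path are assumptions, not this repository's code.
import kenlm

lm = kenlm.Model("path/to/trained_ngrams.arpa")  # hypothetical path

def rescore(visual_score, hypothesis_text, lm_weight=0.5):
    # KenLM scores whitespace-separated tokens, so split the text into characters.
    lm_score = lm.score(" ".join(hypothesis_text), bos=True, eos=True)
    return visual_score + lm_weight * lm_score
```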
See the instructions under the third-party folder to train a specific n-gram or transformer-based language model.
Handwritten Chinese OCR Samples is licensed under the Apache License, Version 2.0.
If you find this code useful in your research, please consider citing:
@article{bliu2020hctr-cnn,
  author    = {Brian Liu and Xianchao Xu and Yu Zhang},
  title     = {Offline Handwritten Chinese Text Recognition with Convolutional Neural Networks},
  publisher = {arXiv},
  year      = {2020}
}