I adapted the four-stage STR framework devised in deep-text-recognition-benchmark and replaced the Pred. stage with a Transformer.
Equipped with the Transformer, this method outperforms the best model of deep-text-recognition-benchmark by 7.6% on CUTE80.
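For context, the four stages are Transformation, Feature extraction, Sequence modeling, and Prediction; here the Prediction stage is a Transformer decoder. Below is a schematic PyTorch sketch of that pipeline. The class and module names, signatures, and the component choices for the first three stages are illustrative assumptions, not this repo's actual code:

```python
import torch.nn as nn

class TransformerSTR(nn.Module):
    """Schematic four-stage STR model: Trans -> Feat -> Seq -> Pred.
    Names and component choices are illustrative, not this repo's exact layout."""

    def __init__(self, trans, feat, seq, pred):
        super().__init__()
        self.trans = trans  # e.g. a TPS spatial transformer (assumed)
        self.feat = feat    # e.g. a ResNet visual feature extractor (assumed)
        self.seq = seq      # e.g. a BiLSTM contextual module (assumed)
        self.pred = pred    # here: a Transformer decoder instead of CTC/attention

    def forward(self, image, text):
        x = self.trans(image)      # normalize the text geometry
        x = self.feat(x)           # extract visual features
        x = self.seq(x)            # model sequence context
        return self.pred(x, text)  # decode characters with the Transformer
```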
The pre-trained weights were trained on the synthetic datasets for about 700K iterations.
Clone this repo, download the weights file, and move it to the `checkpoints` directory.
`data_lmdb_release.zip` contains the following:
- training datasets: MJSynth (MJ)[1] and SynthText (ST)[2]
- validation datasets: the union of the training sets of IC13[3], IC15[4], IIIT[5], and SVT[6]
- evaluation datasets: the benchmark evaluation sets IIIT[5], SVT[6], IC03[7], IC13[3], IC15[4], SVTP[8], and CUTE[9]
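For reference, here is a minimal sketch of how one of these LMDB datasets can be inspected with the `lmdb` package. It assumes the key layout used by deep-text-recognition-benchmark (`num-samples`, plus 1-indexed `image-%09d` / `label-%09d` pairs); the dataset path is an example:

```python
import io

import lmdb
from PIL import Image

# Open one of the unpacked LMDB folders read-only; the path is an example.
env = lmdb.open("data_lmdb_release/evaluation/CUTE80",
                readonly=True, lock=False, readahead=False, meminit=False)

with env.begin(write=False) as txn:
    # Total sample count, stored under the 'num-samples' key.
    n_samples = int(txn.get("num-samples".encode()))
    print(f"{n_samples} samples")

    # Samples are 1-indexed; each has an 'image-%09d' / 'label-%09d' pair.
    index = 1
    label = txn.get(("label-%09d" % index).encode()).decode("utf-8")
    img_bytes = txn.get(("image-%09d" % index).encode())
    img = Image.open(io.BytesIO(img_bytes)).convert("RGB")
    print(index, label, img.size)
```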
Configure your `data_dir` in `config.py`, then run:
```
python tools/train.py
```
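A minimal sketch of what the relevant entry in `config.py` might look like; the variable name `data_dir` comes from this README, but the surrounding structure of the file and the path are assumptions:

```python
# config.py -- sketch; only the entry this README mentions, the actual file may differ.
# Root directory of the unpacked data_lmdb_release archive (placeholder path).
data_dir = "/path/to/data_lmdb_release"
```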
The Transformer-based STR achieves 0.815972 accuracy on CUTE80, outperforming the best model of deep-text-recognition-benchmark, which reaches 0.74.
If you want to reproduce the evaluation result, please run:
```
python evaluation.py
```
Make sure your `cute80_dir` and `saved_model` paths are correct; you should get the result 0.815972.
Feel free to contact me (gao.gzhou@gmail.com).
This project is released under the Apache 2.0 license.