tesseract-ocr / tesstrain

Train Tesseract LSTM with make
Apache License 2.0

Improved Makefile and python scripts for Validation CER plotting #218

Closed. Shreeshrii closed this 3 years ago.

Shreeshrii commented 3 years ago

Improvements over the current version:

See sample plots below:

Training from Scratch - Sanskrit in Devanagari + IAST + English

[Plot: San-validate-plot_cer]

Training with START_MODEL=san, PlusMinus - Sanskrit in Devanagari

[Plot: sanPlusMinus-validate-plot_cer]

Shreeshrii commented 3 years ago

PRs #214, #215, and #217 must be applied for this to work.

Shreeshrii commented 3 years ago

See https://github.com/Shreeshrii/tesstrain/tree/ben/data/ben for training results and reports for Bengali, using the ground truth provided in https://groups.google.com/g/tesseract-ocr/c/kpF4jmik6W0/m/bo4HL0ejBgAJ

I have changed the plots to also display a best-fit curve for the CER values. The new plots look like this:
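The comment does not say which curve family the best-fit line uses. As a minimal sketch, assuming a logarithmic model `cer = a + b * ln(iteration)` (an assumption made purely for illustration), an ordinary least-squares fit needs only the standard library. The function names `fit_log_curve` and `predict_cer` are hypothetical and do not come from the PR's scripts.

```python
import math
from typing import List, Tuple


def fit_log_curve(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares fit of cer = a + b * ln(iteration).

    The logarithmic form is an assumption for illustration; the PR's
    plotting scripts may use a different curve family.
    Returns the coefficients (a, b).
    """
    if len(points) < 2:
        raise ValueError("need at least two (iteration, cer) points")
    if any(it <= 0 for it, _ in points):
        raise ValueError("iterations must be positive to take ln()")

    # Transform x to ln(iteration) so the model becomes linear in (a, b).
    xs = [math.log(it) for it, _ in points]
    ys = [cer for _, cer in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("iterations must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    b = sxy / sxx            # slope in ln(iteration)
    a = mean_y - b * mean_x  # intercept
    return a, b


def predict_cer(a: float, b: float, iteration: float) -> float:
    """Evaluate the fitted curve at a given training iteration."""
    return a + b * math.log(iteration)


if __name__ == "__main__":
    # Hypothetical (iteration, validation CER %) pairs, not real results.
    samples = [(1000, 9.1), (5000, 6.0), (10000, 4.8), (20000, 4.1)]
    a, b = fit_log_curve(samples)
    for it, _ in samples:
        print(it, round(predict_cer(a, b, it), 2))
```

Evaluating `predict_cer` at each plotted iteration gives the points of the smooth curve to draw over the raw CER markers. A fit that is linear in its coefficients keeps the computation closed-form, with no iterative optimizer.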

[Plot: ben test]

[Plot: sanPlusMinus]

I will try to convert the bash scripts that create the ocreval reports into Makefile rules.