Open ghost opened 7 years ago
Try to find out where that 0.5 comes from.
Maybe the errors are mostly with dots, commas, and spaces.
I suggest running ocropus-econf *.gt.txt to see the most common confusions; see https://github.com/tmbdev/ocropy/wiki/Compute-errors-and-confusions.
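If it helps to see what a confusion count looks like, here is a rough Python sketch that aligns a ground-truth line with the recognizer output and tallies the mismatched spans. It uses only the standard library (difflib) and is not how ocropus-econf is implemented; the function name `confusions` and the sample strings are made up for illustration.

```python
from collections import Counter
from difflib import SequenceMatcher


def confusions(truth, predicted):
    """Count mismatched (truth, predicted) spans between two strings."""
    counts = Counter()
    # autojunk=False keeps difflib from ignoring frequent characters.
    matcher = SequenceMatcher(None, truth, predicted, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # An empty string on either side means an insertion or deletion.
            counts[(truth[i1:i2], predicted[j1:j2])] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical ground truth / OCR output pairs.
    pairs = [("a.b", "a,b"), ("a b", "ab")]
    total = Counter()
    for truth, predicted in pairs:
        total.update(confusions(truth, predicted))
    for (expected, got), n in total.most_common():
        print(repr(expected), "->", repr(got), n)
```

If most of the counts land on punctuation and spaces, the model itself is probably fine and the transcriptions or normalization deserve a closer look.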
The error is almost certainly caused by incorrect ordering of the training data (the error will usually hover around 0.6). The code points have to be in display order (i.e. left-to-right) instead of reading order (right-to-left). If you created the data using kraken/ketos, run linegen with the --reorder option to fix this. It isn't the default because the training interface is intended to deal with that for you once it is finished.
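To illustrate the difference between the two orders, here is a small Python sketch that converts a transcription from reading (logical) order to display order. It is a deliberate simplification, not what ketos does internally: it assumes the line contains only right-to-left letters and spaces (no digits, Latin text, or diacritics), in which case display order is just the reversed string. Anything mixed needs the full Unicode Bidirectional Algorithm. The function names are hypothetical.

```python
import unicodedata

# Bidi classes of strongly right-to-left characters
# ("AL" = Arabic letter, "R" = other right-to-left, e.g. Hebrew).
RTL_CLASSES = {"R", "AL"}


def is_pure_rtl(line):
    """True if every character is a right-to-left letter or a space."""
    return all(
        ch == " " or unicodedata.bidirectional(ch) in RTL_CLASSES
        for ch in line
    )


def to_display_order(line):
    """Reverse a pure right-to-left line so code points run left-to-right.

    Assumption: no digits, Latin characters, or combining marks; those
    require a real bidi implementation and are rejected here.
    """
    if not is_pure_rtl(line):
        raise ValueError("mixed-direction line; use a full bidi algorithm")
    return line[::-1]


if __name__ == "__main__":
    logical = "\u0645\u0631\u062d\u0628\u0627"  # an Arabic word in reading order
    visual = to_display_order(logical)
    print([hex(ord(ch)) for ch in visual])
```

A quick sanity check on existing training data is to look at the first code point of a line's transcription: in display order it should correspond to the leftmost glyph in the line image.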
Thanks for your reply. I am taking all your suggestions and will work on tracing the error. Please keep the issue open.
I have been training an Arabic language model from scratch for days now, reaching 800,000+ epochs, but the error rate won't go below 0.5, and that's very bad. I used artificial training data that I created myself; here are its specifications: Arabic, no diacritics, 300 dpi, black and white, 100% correct transcriptions, about 2100 lines. The CLSTM settings are hidden=100 and lrate=1e-4.
Can anybody help? @tmbdev @mittagessen