tesseract-ocr / langdata

Source training data for Tesseract for lots of languages
Apache License 2.0

Fixes extra intra-word-spaces problem with 4.0 #108

Closed: Shreeshrii closed this 6 years ago

Shreeshrii commented 6 years ago

Fixes https://github.com/tesseract-ocr/tesseract/issues/988

preserve_interword_spaces 1
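
For illustration only, here is a minimal sketch of what the change amounts to for a single language config, assuming the usual one `name value` pair per line layout. The helper `ensure_setting` is hypothetical and not part of this repo or of Tesseract; it works on the config text only and does not touch any traineddata file.

```python
def ensure_setting(config_text: str,
                   name: str = "preserve_interword_spaces",
                   value: str = "1") -> str:
    """Return config_text with exactly one `name value` line.

    Hypothetical helper: assumes each config line is a variable name
    followed by whitespace and its value.
    """
    wanted = f"{name} {value}"
    out = []
    found = False
    for line in config_text.splitlines():
        parts = line.split()
        if parts and parts[0] == name:
            if not found:
                # Replace the first occurrence in place.
                out.append(wanted)
                found = True
            # Drop any later duplicate of the same variable.
            continue
        out.append(line)
    if not found:
        # Variable was absent: append it at the end.
        out.append(wanted)
    return "\n".join(out) + "\n"
```

Running it twice over the same text gives the same result, so it is safe to apply across many config files in a loop.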

Shreeshrii commented 5 years ago

Yes, langdata and langdata_lstm also need changes. Those will affect any new training being done.

The existing traineddata files have to be updated with the new config files.

The various _vert.traineddata files may also need updating; I haven't looked into it.

PS. Writing from my phone, don't have access to the repo to check details right now.