tesseract-ocr / langdata

Source training data for Tesseract for lots of languages
Apache License 2.0

Add 3.05 branch #79

Closed: Shreeshrii closed this 6 years ago

Shreeshrii commented 7 years ago

@zdenop Please add a 3.05 branch, so that langdata changes for 3.0x and 4.0x can be updated separately.

zdenop commented 7 years ago

What kind of "3.05 updates" are planned for langdata?

stweil commented 7 years ago

See PR #77 and the related discussion. It is needed for 3.05, too.

Shreeshrii commented 7 years ago

E.g., the Sanskrit wordlist includes punctuation. It also contains some characters that, though part of the Devanagari script, are not used in the Sanskrit language.

zdenop commented 7 years ago

#77 was merged, and AFAIK there are no more problems like this. But nobody will regenerate the binary part of the traineddata files.

stweil commented 7 years ago

#77 was merged into the master branch, but the same change is needed for 3.05, too.

I also fixed the binary part for the master branch (see https://github.com/tesseract-ocr/tessdata/pull/57), but I cannot fix the binary part for 3.05 because there is no branch for that version.

zdenop commented 7 years ago

IMO, updates/fixes will go into master. Do we really need to create a new langdata branch just for 3.05?