tesseract-ocr / langdata

Source training data for Tesseract for lots of languages
Apache License 2.0

Missing Norwegian special characters in desired_characters file #91

Open Andrioden opened 7 years ago

Andrioden commented 7 years ago

The file langdata/nor/desired_characters does not contain "Ø" and "Å", which are two of the three special characters in the Norwegian alphabet. Intuitively, these should be added as well, just as "Æ" was because of #36.

I also want to point out that "Ä", "É" and "Ö" are included in the desired_characters file, even though these characters have nothing to do with the Norwegian alphabet and are not used in Norwegian (unless quoting Swedish papers or such).

Disclaimer: I have not tested anything related to this, only stumbled upon it and wanted to notify you.
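
For anyone who wants to verify this locally, here is a minimal sketch of a check. It assumes the file is plain UTF-8 text and treats every non-whitespace character in it as "desired"; the actual file format may differ. The helper name `check_characters` is hypothetical, and the lowercase letters are my own addition (the report above only mentions the uppercase forms).

```python
import unicodedata
from pathlib import Path

# Assumption: upper- and lowercase forms are both wanted. The issue itself
# only names the uppercase letters.
NORWEGIAN_SPECIALS = "ÆØÅæøå"
# Characters the issue reports as present but not part of the Norwegian alphabet.
QUESTIONED = "ÄÉÖ"


def check_characters(text, required=NORWEGIAN_SPECIALS, questioned=QUESTIONED):
    """Return (missing, unexpected) for the contents of a character list.

    Hypothetical helper: assumes every non-whitespace character in `text`
    counts as a listed character.
    """
    # Normalize to NFC so "A" + combining ring is treated the same as "Å".
    normalized = unicodedata.normalize("NFC", text)
    present = {ch for ch in normalized if not ch.isspace()}
    missing = [ch for ch in required if ch not in present]
    unexpected = [ch for ch in questioned if ch in present]
    return missing, unexpected


if __name__ == "__main__":
    # Path taken from the issue; adjust to wherever the repository is checked out.
    path = Path("langdata/nor/desired_characters")
    if path.exists():
        missing, unexpected = check_characters(path.read_text(encoding="utf-8"))
        print("missing:", missing)
        print("present but questioned:", unexpected)
```

Run from the directory that contains the langdata checkout. An empty "missing" list would mean the file already covers all of the letters being checked.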

momentumvi commented 5 years ago

This is still an issue.