Russian language

juhnowski commented 6 years ago

The Russian dictionary is of very low quality. Obscene language gets inserted into the recognized text. Look attentively at the committers of the dictionary and consider whether it is worth continuing cooperation with them.

amitdo commented 6 years ago

> Look attentively at the committers of the dictionary and consider whether it is worth continuing cooperation with them.

We'll fire the bots... :laughing:

juhnowski commented 6 years ago

Do you need help in this shooting?

amitdo commented 6 years ago

From https://github.com/tesseract-ocr/tessdata/issues/62#issuecomment-319839971

theraysmith commented on Aug 3, 2017

FYI: The wordlists are generated files, so it isn't a good idea to modify them, as the modifications will likely get overwritten in a future training. To help prevent the ß/B confusion, the words that you want to lose from the wordlists need to go in langdata/lang/lang.bad_words.
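For anyone wanting to act on this, here is a minimal sketch of adding unwanted words to a bad_words file instead of editing the generated wordlist. It assumes the file is plain UTF-8 text with one word per line, and that for Russian the path follows the langdata/lang/lang.bad_words pattern as something like langdata/rus/rus.bad_words; neither is confirmed in this thread. The helper names `add_bad_words` and `filter_wordlist` are hypothetical and not part of Tesseract or its training tools.

```python
from pathlib import Path


def add_bad_words(bad_words_path, new_words):
    """Merge new_words into a bad_words file and return the resulting list.

    Assumption: the file is UTF-8 text with one word per line.
    Existing entries keep their order; duplicates and blanks are skipped.
    """
    path = Path(bad_words_path)
    words = []
    if path.exists():
        words = [w.strip() for w in path.read_text(encoding="utf-8").splitlines()
                 if w.strip()]
    seen = set(words)
    for word in new_words:
        word = word.strip()
        if word and word not in seen:
            words.append(word)
            seen.add(word)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("".join(w + "\n" for w in words), encoding="utf-8")
    return words


def filter_wordlist(words, bad_words):
    """Illustration only: drop every entry of bad_words from a wordlist."""
    blocked = set(bad_words)
    return [w for w in words if w not in blocked]
```

A call such as `add_bad_words("langdata/rus/rus.bad_words", ["word1", "word2"])` (hypothetical path and placeholder words) would record the entries so that they survive a future regeneration of the wordlist. `filter_wordlist` only illustrates the intended effect; the actual filtering happens in the training pipeline.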

tesseract-ocr / tessdata

Russian language #100