Open juhnowski opened 6 years ago
We'll fire the bots... :laughing:
Do you need help with this shooting?
From https://github.com/tesseract-ocr/tessdata/issues/62#issuecomment-319839971
theraysmith commented on Aug 3, 2017
FYI: The wordlists are generated files, so it isn't a good idea to modify them, as the modifications will likely get overwritten in a future training. To help prevent the ß/B confusion, the words that you want to lose from the wordlists need to go in langdata/lang/lang.bad_words.
See also page 8 in https://github.com/tesseract-ocr/docs/raw/master/das_tutorial2016/6ModernizationEfforts.pdf.
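To illustrate the idea behind a bad-words list, here is a minimal Python sketch that drops unwanted entries from a wordlist. It is not Tesseract's training pipeline: the function name `filter_wordlist` and the sample words are hypothetical, and it only assumes that both lists hold one word per line.

```python
def filter_wordlist(words, bad_words):
    """Return the entries of `words` that are not listed in `bad_words`.

    Hypothetical helper for illustration only; the real training tools
    may apply the bad_words file differently.
    """
    # Normalize the block list once: strip whitespace, ignore blank lines.
    blocked = {w.strip() for w in bad_words if w.strip()}
    # Keep the original order of the surviving words.
    return [w.strip() for w in words if w.strip() and w.strip() not in blocked]


# Example: remove a ß/B confusion from a small, made-up wordlist.
wordlist = ["Straße", "StraBe", "Haus"]
bad = ["StraBe"]
cleaned = filter_wordlist(wordlist, bad)
```

Because the wordlists are regenerated on each training run, a filter like this would have to be reapplied from the bad-words file every time rather than by editing the generated output by hand.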
The Russian dictionary is of very low quality: obscene words are being inserted into the recognized text. Take a close look at who committed the dictionary and consider whether it is worth continuing to work with them.