tesseract-ocr / tessdata

Trained models with fast variant of the "best" LSTM models + legacy models

Apache License 2.0

6.46k stars, 2.2k forks

Models for symbolic languages like Chinese, Korean and Japanese need to be updated #155

Status: Open. vsatyamesc opened this issue 2 years ago.

vsatyamesc commented 2 years ago

The models for symbolic languages like Chinese, Korean and Japanese need to be updated, because the old fonts are not used much anymore and there are some new characters too.

Robban1980 commented 2 years ago

I'm interested in the Japanese and Chinese models. In the past (years ago) I have only done small-scale training for Japanese, for local use and testing. Are there any good resources on how I can improve the models with additional fonts? And, for example, what should I look at during training?