Support for Non-Latin Characters

This library loads the same model data as the C++ Tesseract, so you can load the trained data file for your language from https://github.com/tesseract-ocr/tessdata_best.
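As a rough illustration of picking a model file for a given language, the sketch below builds a download URL for a `tessdata_best` file. The `modelUrl` helper is hypothetical and not part of this library. It assumes the files are named `<lang>.traineddata`, live at the repository root, and are served from the `main` branch; check the repository if any of that has changed.

```typescript
// Assumption: trained data files sit at the repository root on the `main`
// branch and are named `<lang>.traineddata` (for example `jpn.traineddata`).
const TESSDATA_BEST_BASE =
  "https://github.com/tesseract-ocr/tessdata_best/raw/main";

// Hypothetical helper (not part of the library): build the URL of the
// trained data file for a Tesseract language code such as "jpn" or "chi_sim".
function modelUrl(lang: string): string {
  // Accept only letters and underscores so that a malformed code cannot
  // produce an unexpected path.
  if (!/^[A-Za-z_]+$/.test(lang)) {
    throw new Error("Unexpected language code: " + lang);
  }
  return TESSDATA_BEST_BASE + "/" + lang + ".traineddata";
}
```

The resulting URL would then be handed to whatever model-loading call you use with the library, or fetched and cached by your own code first.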

How can we make it compatible with characters used in non-Latin scripts, for example, Japanese characters?

The code in this project is, in theory, script-independent: it is mostly concerned with getting data into Tesseract as pixels and getting results out as bounding boxes and Unicode text. If you load the right model, non-Latin languages may already work. However, I have not tested this myself, and some extra work may be required. This is an area where I could use help from interested users of the library.
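For anyone who wants to help test this, one simple sanity check is to run OCR on an image of known Japanese text and measure how much of the returned string is actually in a Japanese script. The sketch below is a hypothetical helper, not part of the library, and it works on any string regardless of how the text was obtained. It only counts the basic Hiragana, Katakana, and CJK Unified Ideographs blocks, so CJK punctuation (such as "。") and rarer kanji in the extension blocks are treated as non-Japanese.

```typescript
// Hypothetical helper: true if a UTF-16 code unit falls in one of the
// basic blocks used for Japanese text.
function isJapaneseCodeUnit(code: number): boolean {
  return (
    (code >= 0x3040 && code <= 0x309f) || // Hiragana
    (code >= 0x30a0 && code <= 0x30ff) || // Katakana
    (code >= 0x4e00 && code <= 0x9fff) // CJK Unified Ideographs
  );
}

// Fraction of non-whitespace characters in `text` that are Japanese.
// Returns 0 for empty or whitespace-only input.
function japaneseRatio(text: string): number {
  let japanese = 0;
  let total = 0;
  for (let i = 0; i < text.length; i++) {
    if (/\s/.test(text.charAt(i))) {
      continue; // whitespace does not count towards either total
    }
    total++;
    if (isJapaneseCodeUnit(text.charCodeAt(i))) {
      japanese++;
    }
  }
  return total === 0 ? 0 : japanese / total;
}
```

A ratio close to 1 on a Japanese test image suggests the model loaded and the Unicode output came through intact; a ratio near 0 suggests the wrong model was loaded or the text was damaged somewhere along the way.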

robertknight / tesseract-wasm, issue #90