Open k3ntar0 opened 1 year ago
This library loads the same model data as the C++ Tesseract, so you can load the files for your language from https://github.com/tesseract-ocr/tessdata_best.
How can we make it compatible with characters used in non-Latin scripts, for example, Japanese characters?
The code in this project is in theory script-independent, in the sense that it is mostly concerned with getting data into Tesseract as pixels and out as bounding boxes and Unicode text. If you load the right model, non-Latin languages may already work. However, I have not done any testing of this myself and there may be some extra work required. This is an area where I could use some help from interested users of the library.
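For anyone who wants to try this with Japanese, here is a minimal sketch of fetching a model file from tessdata_best. It only covers the download step, not this library's own loading API. It rests on several assumptions that are not confirmed in this thread: that the repository serves files from a `main` branch, that models are named `<lang>.traineddata`, and that `jpn` is the code for Japanese. The helper names `model_url` and `download_model` are made up for illustration.

```python
from pathlib import Path
from urllib.request import urlopen

# Assumption: tessdata_best serves its models from the `main` branch,
# one file per language, named <lang>.traineddata.
TESSDATA_BEST_RAW = "https://github.com/tesseract-ocr/tessdata_best/raw/main"


def model_url(lang: str) -> str:
    """Build the (assumed) download URL for a language code such as 'jpn'."""
    return f"{TESSDATA_BEST_RAW}/{lang}.traineddata"


def download_model(lang: str, dest_dir: str = ".") -> Path:
    """Download the model for `lang` into `dest_dir` and return the file path."""
    dest = Path(dest_dir) / f"{lang}.traineddata"
    with urlopen(model_url(lang)) as response:
        dest.write_bytes(response.read())
    return dest


# Example (performs a network request, so it is left commented out):
# path = download_model("jpn")
```

The downloaded file can then be passed to whatever model-loading call the library exposes, in place of the English model.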
This project is wonderful! How can we make it compatible with characters used in non-Latin scripts, for example, Japanese characters? Are tessdata files available?