IDerr closed 6 years ago
Hi,
Chinese is a special case because it has a much larger "alphabet", with about 2 "letters" per word. Languages written in Latin characters have a smaller alphabet but longer words, and they also use spaces. So Latin-script languages put more strain on the quality of the language model. This is why RNN/LSTM/GRU/CTC etc. are quite popular for them.
Thus, I don't think just changing the training data will suffice. Fortunately, there is a lot of research on OCR for Latin scripts. You can look here for some example code: https://github.com/keras-team/keras/blob/master/examples/image_ocr.py
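To give a rough idea of what the CTC part does for variable-length Latin words, here is a minimal sketch of greedy CTC decoding in plain Python. It is not code from this project or from the Keras example; the names `ALPHABET`, `BLANK`, `ctc_greedy_decode` and `one_hot` are made up for illustration, and I am assuming the blank symbol sits at index 0 and that the recognizer outputs one score vector per image column.

```python
# Assumption: index 0 is the CTC blank; "-" is only a placeholder for it.
BLANK = 0
ALPHABET = "-abcdefghijklmnopqrstuvwxyz "  # hypothetical alphabet, including space


def ctc_greedy_decode(frame_scores, alphabet=ALPHABET, blank=BLANK):
    """Turn per-frame scores into text: take the best symbol per frame,
    merge consecutive repeats, then drop blanks."""
    # Best-scoring symbol index for every time step (image column).
    best = [max(range(len(scores)), key=scores.__getitem__)
            for scores in frame_scores]
    decoded = []
    previous = blank
    for index in best:
        # Emit only non-blank symbols that differ from the previous frame.
        if index != blank and index != previous:
            decoded.append(alphabet[index])
        previous = index
    return "".join(decoded)


def one_hot(index, size=len(ALPHABET)):
    """Toy score vector with all mass on one symbol (stands in for network output)."""
    return [1.0 if i == index else 0.0 for i in range(size)]


# Toy input: 7 frames that spell "cc-aat-" before collapsing.
frames = [one_hot(ALPHABET.index(ch)) for ch in "cc-aat-"]
print(ctc_greedy_decode(frames))  # cat
```

The blank is what lets a word keep a real double letter: frames reading "hel-lo" decode to "hello", while "hello" without a blank between the two l's would collapse to "helo". A real system would usually use beam search plus a language model instead of this greedy pass.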
Thanks a lot :)
Hi, congratulations on this project. I wanted to ask whether this is a good way to OCR images with Latin fonts, or whether it is more specific to Chinese fonts.
Thanks for your work