tesseract-ocr / langdata_lstm

Data used for LSTM model training
Apache License 2.0

Support for New Reiwa Era Character ㋿ in Japanese #32

Open prateek4sep opened 4 years ago

prateek4sep commented 4 years ago

With the start of the new Japanese Reiwa era, a new character was introduced: ㋿ (U+32FF). Support for this character is required.

Current Behavior: Other characters are identified instead: 砒後徘朔御菓

Expected Behavior: ㋿ should be identified for the given input image.

Suggested Fix: Train and update the current jpn.traineddata file with the new Japanese character.
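The expected behavior above can be expressed as a small check on OCR output. This is only a minimal sketch, not part of Tesseract or its training tools: the helper names `codepoints`, `recognized_reiwa`, and `nfkc` are hypothetical, and the OCR text is passed in as a plain string rather than produced by an actual Tesseract call. The `nfkc` helper is included because, with Unicode 12.1 data (Python 3.8 or later), NFKC normalization expands U+32FF to the two characters 令和. Whether any step of the training pipeline applies such normalization is not stated in this thread.

```python
import unicodedata

# U+32FF SQUARE ERA NAME REIWA, the character named in this issue.
REIWA_SQUARE = "\u32ff"


def codepoints(text):
    """Return the code points of `text` in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


def recognized_reiwa(ocr_text):
    """Return True if the OCR output contains the single character U+32FF."""
    return REIWA_SQUARE in ocr_text


def nfkc(text):
    """Apply NFKC compatibility normalization to `text`."""
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    current_output = "砒後徘朔御菓"  # output reported in this issue
    print(codepoints(REIWA_SQUARE))
    print(recognized_reiwa(current_output))
    print(recognized_reiwa(REIWA_SQUARE))
```

Running the same check on the output of a retrained model would show whether the suggested fix had the intended effect.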

Reference: Wiki Page

Attached: the input file I used, and the character in 6 different fonts for training (Reiwa.docx).

Reiwa
stweil commented 4 years ago

We need an update for langdata_lstm. Do you want to send a pull request there?

I'm transferring the issue to langdata_lstm.