Closed hoangtocdo90 closed 7 years ago
@theraysmith Are Halfwidth katakana included in your new Japanese training?
@Shreeshrii I think halfwidth katakana are not included. @theraysmith sir, can you tell me how to get glyph_metrics in the unicharset?
@hoangtocdo90 Please see https://github.com/tesseract-ocr/langdata/issues/81#issuecomment-320821042 and reply to Ray's questions there.
Hi guys! I'm trying to train Tesseract for Japanese. Japanese has several types of characters; in my case the issue is half-width versus full-width forms in the katakana table.
Half-width katakana example: ｱｲｳｴｵ ｶｷｸｹｺ
Full-width katakana example: アイウエオ カキクケコ
They look very similar, a bit like uppercase and lowercase, but they are different characters. When the input contains half-width katakana, Tesseract either fails to recognize it or sometimes outputs full-width katakana instead.
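To make the difference concrete, here is a minimal Python sketch (not part of Tesseract; the helper names `is_halfwidth_katakana` and `to_fullwidth` are made up for illustration). It assumes the half-width katakana variants occupy U+FF65 to U+FF9F, and it uses Unicode NFKC normalization, which folds half-width forms into their full-width equivalents. That folding may be one reason half-width input comes back as full-width text, but that is a guess, not something confirmed in this thread.

```python
import unicodedata

# Assumed code point range of the half-width katakana variants.
HALFWIDTH_FIRST = 0xFF65
HALFWIDTH_LAST = 0xFF9F


def is_halfwidth_katakana(ch):
    """Return True if the single character ch is a half-width katakana variant."""
    return HALFWIDTH_FIRST <= ord(ch) <= HALFWIDTH_LAST


def to_fullwidth(text):
    """Fold half-width forms into full-width ones using NFKC normalization."""
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    half = "\uff71\uff72\uff73\uff74\uff75"  # half-width a i u e o
    full = "\u30a2\u30a4\u30a6\u30a8\u30aa"  # full-width a i u e o
    print(half == full)                      # distinct code points
    print(to_fullwidth(half) == full)        # identical after NFKC folding
```

The two forms are separate code points, so a training set that should keep them apart must avoid any normalization step like this one.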
I tried using text2image to generate the images and box files, then ran lstm.train, but I have a problem with the unicharset!
set_unicharset_properties -U unicharset -O unicharset -X jpn.xheights --script_dir=./langdata
I checked langdata/Katakana.unicharset and it doesn't contain any half-width katakana symbols. Because of this, I can't make a unicharset file with all the fields set to the right values, like in this example. This is the unicharset file I got from running this command:
unicharset_extractor jpn.msgothic.exp18.box jpn.msgothic.exp32.box jpn.msgothic.exp48.box jpn.msgothicb.exp18.box
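A quick way to check whether a unicharset file contains any half-width katakana is sketched below in Python. It assumes the usual text layout (first line is the entry count, and each following line starts with the character followed by space-separated properties) and the U+FF65 to U+FF9F range for half-width katakana. The function name `halfwidth_entries` is hypothetical, not a Tesseract tool.

```python
def halfwidth_entries(unicharset_lines):
    """Return the unichars that contain half-width katakana.

    unicharset_lines: iterable of text lines, e.g. an open file.
    Assumes line 1 is the entry count and every later line begins
    with the unichar itself, followed by its properties.
    """
    found = []
    for index, line in enumerate(unicharset_lines):
        if index == 0:
            continue  # skip the entry-count header line
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        unichar = fields[0]
        # Assumed range of the half-width katakana variants.
        if any(0xFF65 <= ord(c) <= 0xFF9F for c in unichar):
            found.append(unichar)
    return found
```

Usage would look like `with open("unicharset", encoding="utf-8") as f: print(halfwidth_entries(f))`. An empty list for langdata/Katakana.unicharset would match what I saw when checking it by hand.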
Thanks!