Self-training data for the Chinese language with jTessBoxEditorFX

FounderBox commented 3 years ago

Environment:
Tesseract Version:
Commit Number: None
Platform: x64 Windows 10

Current Behavior: I want to recognize the text in the picture below.

However, two Chinese words were recognized incorrectly, so I used jTessBoxEditorFX to correct them, as shown below.

Then I generated a new mylang.traineddata file and put it in my tessdata folder.

If I use mylang alone as the language, it works fine: the two wrong words are fixed.

But if I use multiple languages, chi_sim+mylang, the errors come back.

And if I use mylang+chi_sim, the result is even worse: everything comes out wrong.
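To reproduce the three runs above side by side, here is a minimal Python sketch that calls the tesseract command-line tool once per language combination and collects the recognized text. It assumes the `tesseract` binary is on PATH and that both mylang.traineddata and chi_sim.traineddata are in the tessdata folder tesseract uses; the helper names `build_tesseract_cmd` and `compare_combinations` and the `out_*` file naming are hypothetical. It only compares outputs; it does not merge or combine the models.

```python
import os
import shutil
import subprocess


def build_tesseract_cmd(image, output_base, langs):
    """Return a tesseract CLI argument list for one or more languages.

    The CLI takes several models as a single '+'-joined -l value; the order
    of `langs` is kept exactly as given.
    """
    return ["tesseract", image, output_base, "-l", "+".join(langs)]


# The three language combinations described in the issue.
COMBINATIONS = [["mylang"], ["chi_sim", "mylang"], ["mylang", "chi_sim"]]


def compare_combinations(image, combos=COMBINATIONS):
    """Run tesseract once per combination and return {lang_string: text}.

    Returns an empty dict if the tesseract binary or the image is missing.
    """
    results = {}
    if shutil.which("tesseract") is None or not os.path.exists(image):
        return results
    for langs in combos:
        output_base = "out_" + "_".join(langs)  # hypothetical naming scheme
        subprocess.run(build_tesseract_cmd(image, output_base, langs), check=True)
        # With no config file given, tesseract writes plain text to <output_base>.txt
        with open(output_base + ".txt", encoding="utf-8") as f:
            results["+".join(langs)] = f.read().strip()
    return results
```

For example, `compare_combinations("sample.png")` (with `sample.png` standing in for the actual picture) returns a dict keyed by `mylang`, `chi_sim+mylang` and `mylang+chi_sim`, so the text from each run can be diffed directly.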

Expected Behavior: As you can see, the two words are fixed only when I use mylang as the single language; with multiple languages, the errors come back.

Is there a way to use my own trained traineddata file as a supplement to the original chi_sim.traineddata, so that I can fix all the words that chi_sim.traineddata fails to recognize? Thanks a lot! :)

FounderBox commented 3 years ago

Can anyone help? A million thanks!

FounderBox commented 3 years ago

@Shreeshrii Could you please give me a hand? Thanks.

Issue #3077 in tesseract-ocr/tesseract