Can't encode transcription

tesseract-ocr / tesstrain

Train Tesseract LSTM with make

Apache License 2.0

Can't encode transcription #340

Open zhoub opened 1 year ago

zhoub commented 1 year ago

Hi,

I'm using the latest main branch and trying to train Japanese. I prepared some text from news articles, but training generated the following error:

If I remove all Japanese zenkaku (full-width) characters, meaning symbols that take up the same width as a standard Japanese character, such as （７１） and ３０, training succeeds.
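For reference, a minimal sketch of how such full-width characters could be located and replaced in ground-truth text before training. It uses only Python's standard `unicodedata` module; the helper names `find_fullwidth` and `to_halfwidth` and the sample string are hypothetical, and whether normalizing the text is acceptable for a given model (or actually avoids the error) is an assumption, not something confirmed in this thread.

```python
import unicodedata


def find_fullwidth(text):
    """Return (index, char) pairs for characters classified as Fullwidth ('F')."""
    return [
        (i, ch)
        for i, ch in enumerate(text)
        if unicodedata.east_asian_width(ch) == "F"
    ]


def to_halfwidth(text):
    """Replace only full-width forms with their NFKC equivalents.

    Kanji and kana are left untouched, since they are classified
    as Wide ('W'), not Fullwidth ('F').
    """
    return "".join(
        unicodedata.normalize("NFKC", ch)
        if unicodedata.east_asian_width(ch) == "F"
        else ch
        for ch in text
    )


# Hypothetical sample line, not taken from the reporter's data.
sample = "特許（７１）出願人３０"
print(find_fullwidth(sample))
print(to_halfwidth(sample))
```

Running `find_fullwidth` over each ground-truth line first shows which lines would be affected, without modifying the training data.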

Is there anything special required to train Japanese?

Thanks a lot!

zdenop commented 1 year ago

Please provide a test case (all files and the exact steps you made) to reproduce the problem.

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.