tesseract-ocr / tesstrain

Train Tesseract LSTM with make
Apache License 2.0

How to train vertical Traditional Chinese in Tesseract 5? #389

Open fishfree opened 6 months ago

fishfree commented 6 months ago

This is the screenshot from jTessBoxEditor: [image] The example training files provided in this repo seem to be built from whole-line image and text pairs, rather than character by character. So my questions are:

  1. How can I efficiently split a single image into multiple vertical lines of text?
  2. In which direction should the vertical text be written in the text files, L2R or R2L?

fishfree commented 6 months ago

For Q1, I noticed the answer here.
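
The linked answer is not reproduced in this thread. As an illustration only, and not necessarily what that answer describes, one common way to approach Q1 is a vertical projection profile: count the ink pixels in each pixel column and cut wherever a run of empty columns separates two text columns. The sketch below assumes the page is already binarized (1 = ink, 0 = background) and deskewed. The function name `split_vertical_lines` and the `min_gap` parameter are hypothetical, and returning the rightmost span first is an assumption based on the usual right-to-left column order of traditional vertical Chinese.

```python
from typing import List, Tuple


def split_vertical_lines(image: List[List[int]], min_gap: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) x-ranges of text columns, end exclusive, rightmost first.

    `image` is a binarized page as rows of 0/1 values (1 = ink).
    `min_gap` is the number of consecutive empty pixel columns that
    must separate two text columns (hypothetical tuning parameter).
    """
    if not image:
        return []
    width = len(image[0])

    # Vertical projection profile: number of ink pixels at each x position.
    profile = [sum(row[x] for row in image) for x in range(width)]

    spans = []
    start = None  # x where the current text column began, if any
    gap = 0       # length of the current run of empty pixel columns
    for x, ink in enumerate(profile):
        if ink > 0:
            if start is None:
                start = x
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                # Close the span at the first empty pixel column of the gap.
                spans.append((start, x - gap + 1))
                start = None
                gap = 0
    if start is not None:
        # The last text column runs to the right edge (minus any short trailing gap).
        spans.append((start, width - gap))

    # Assumed reading order: rightmost column first.
    spans.reverse()
    return spans


if __name__ == "__main__":
    page = [
        [0, 1, 1, 0, 0, 1, 0],
        [0, 1, 0, 0, 0, 1, 0],
    ]
    for start, end in split_vertical_lines(page):
        print(f"crop x in [{start}, {end})")
```

Each returned span can then be cropped out as its own line image. On real scans, noise and touching columns usually call for a larger `min_gap` or an ink threshold above zero.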