tesseract-ocr / tesstrain

Train Tesseract LSTM with make
Apache License 2.0

How to prepare the data for new tessdata images in the Khmer language #312

Closed: mengleang-ngoun closed this issue 1 year ago

mengleang-ngoun commented 1 year ago

How do I prepare the data for new tessdata images in Khmer?

kba commented 1 year ago

Khmer or Arabic?

In principle, it's the same process for every language: you need pairs of line images and their transcriptions. AFAIK Khmer is a left-to-right script with distinct characters, so you basically only need ground truth data, and you can either train from scratch or fine-tune the existing Khmer model.
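A minimal Python sketch of that preparation step, assuming tesstrain's default layout: each line image (`.tif` or `.png` here; other suffixes are ignored by this sketch) sits in `data/<MODEL_NAME>-ground-truth/` next to a one-line transcription with the same stem and a `.gt.txt` suffix (`line_0001.tif` + `line_0001.gt.txt`). It reports unpaired files and bad transcriptions, then builds a `make training` argument list. `MODEL_NAME`, `START_MODEL`, `TESSDATA` and `MAX_ITERATIONS` are tesstrain Makefile variables and `khm` is Tesseract's Khmer model; the function names and the model name `khm-custom` are hypothetical.

```python
from pathlib import Path

# Assumption: only these image suffixes are checked by this sketch.
IMAGE_SUFFIXES = {".tif", ".png"}
GT_SUFFIX = ".gt.txt"


def find_unpaired(gt_dir):
    """Return (images lacking a .gt.txt, .gt.txt files lacking an image)."""
    gt_dir = Path(gt_dir)
    images = {p.stem: p for p in gt_dir.iterdir()
              if p.suffix.lower() in IMAGE_SUFFIXES}
    # "line_0001.gt.txt" -> key "line_0001" (strip the whole ".gt.txt" suffix)
    texts = {p.name[:-len(GT_SUFFIX)]: p for p in gt_dir.glob("*" + GT_SUFFIX)}
    missing_text = sorted(images[s].name for s in images.keys() - texts.keys())
    missing_image = sorted(texts[s].name for s in texts.keys() - images.keys())
    return missing_text, missing_image


def bad_transcriptions(gt_dir):
    """Return .gt.txt files that are empty or contain more than one line."""
    bad = []
    for p in sorted(Path(gt_dir).glob("*" + GT_SUFFIX)):
        lines = p.read_text(encoding="utf-8").splitlines()
        if len(lines) != 1 or not lines[0].strip():
            bad.append(p.name)
    return bad


def make_training_cmd(model_name, start_model=None, tessdata=None,
                      max_iterations=None):
    """Build argv for tesstrain's `make training`.

    start_model="khm" fine-tunes the existing Khmer model;
    start_model=None trains from scratch.
    """
    cmd = ["make", "training", f"MODEL_NAME={model_name}"]
    if start_model:
        cmd.append(f"START_MODEL={start_model}")
    if tessdata:
        # Directory that holds <START_MODEL>.traineddata
        cmd.append(f"TESSDATA={tessdata}")
    if max_iterations is not None:
        cmd.append(f"MAX_ITERATIONS={max_iterations}")
    return cmd
```

For example, check `data/khm-custom-ground-truth` with `find_unpaired` and `bad_transcriptions` until both come back empty, then pass `make_training_cmd("khm-custom", start_model="khm", tessdata="/path/to/tessdata")` to `subprocess.run(..., check=True)` from the tesstrain checkout. Leaving out `start_model` gives the from-scratch variant.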

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.