tesseract-ocr / tesstrain

Train Tesseract LSTM with make
Apache License 2.0

Improvements for RTL #159

Closed: Shreeshrii closed this 3 years ago

stweil commented 4 years ago

@Shreeshrii, I suggest splitting two of the changes into two new pull requests:

  1. pull request for PSM 13
  2. pull request which removes trimming

Those could be merged immediately.

Shreeshrii commented 4 years ago

accuracy-Arabic.traineddata-list.eval.txt
accuracy-notranslate.traineddata-list.eval.txt
accuracy-rtltest.traineddata-list.eval.txt

I was not able to add the accuracy reports in the comment above.

Shreeshrii commented 4 years ago

@stweil So what is your recommendation regarding norm_mode? 2 for Indic and RTL, and 1 for others?

What about punctuation translation for RTL?

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.