tesseract-ocr / tesstrain

Train Tesseract LSTM with make
Apache License 2.0

About configuration for Arabic handwriting traineddata? #317

Status: open. Alhar6i opened this issue 1 year ago.

Alhar6i commented 1 year ago

Hi,

Is any configuration required to use the Arabic handwriting traineddata? I found the models here: https://ub-backup.bib.uni-mannheim.de/~stweil/ocrd-train/data/ArabicHandwritingOCRD/tessdata_best/

Should I download all of them or only the latest version?

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stweil commented 1 year ago

Those models were trained using data from https://doi.org/10.23636/1135. Usually the fast variant should be sufficient, so you could try ArabicHandwritingOCRD_4.837_138060_365920.traineddata, for example. Just copy it into the same directory as eng.traineddata and optionally rename it to get a shorter model name.
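
The steps above (copy the model next to eng.traineddata, optionally rename it, then select it by name) can be sketched in Python as follows. This is a minimal sketch, assuming Tesseract 4 or later is installed and on PATH. The helper functions, the short name `ara_hw` and any tessdata path you pass in are hypothetical; the real tessdata location depends on your installation. Tesseract loads a model by its file name without the `.traineddata` suffix, which is why renaming the file gives a shorter name for `-l`.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def install_model(src: str | Path, tessdata_dir: str | Path, short_name: str) -> Path:
    """Copy a downloaded .traineddata file into tessdata_dir as <short_name>.traineddata.

    short_name is a hypothetical, user-chosen name (e.g. "ara_hw").
    """
    src, tessdata_dir = Path(src), Path(tessdata_dir)
    if src.suffix != ".traineddata":
        raise ValueError(f"expected a .traineddata file, got {src.name!r}")
    dest = tessdata_dir / f"{short_name}.traineddata"
    shutil.copyfile(src, dest)  # copy, so the original download stays untouched
    return dest


def available_models(tessdata_dir: str | Path) -> list[str]:
    """Names usable with -l: every *.traineddata file in tessdata_dir minus its suffix."""
    return sorted(p.stem for p in Path(tessdata_dir).glob("*.traineddata"))


def tesseract_command(image: str | Path, output_base: str | Path,
                      model: str, tessdata_dir: str | Path) -> list[str]:
    """Build a tesseract CLI call that loads `model` from `tessdata_dir`."""
    return ["tesseract", "--tessdata-dir", str(tessdata_dir),
            str(image), str(output_base), "-l", model]


def run_ocr(image: str | Path, output_base: str | Path,
            model: str, tessdata_dir: str | Path) -> None:
    """Run tesseract; by default it writes the recognized text to <output_base>.txt."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    subprocess.run(tesseract_command(image, output_base, model, tessdata_dir), check=True)
```

For example, after `install_model("ArabicHandwritingOCRD_4.837_138060_365920.traineddata", "/path/to/tessdata", "ara_hw")`, a call such as `run_ocr("page.png", "page", "ara_hw", "/path/to/tessdata")` would ask Tesseract to write its result to `page.txt`. Passing `--tessdata-dir` explicitly avoids guessing where your installation keeps eng.traineddata.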

Note that I cannot test those models myself, so I have no idea how well or badly they perform.