tesseract-ocr / tesseract

Tesseract Open Source OCR Engine (main repository)
https://tesseract-ocr.github.io/
Apache License 2.0

Inherited.unicharset #3436

Open typeoo opened 3 years ago

typeoo commented 3 years ago

Environment

Current Behavior:

I can't fine-tune the Persian language model. Training fails with: failed to load script unicharset from:../langdata_lstm/Inherited.unicharset

I couldn't find the file Inherited.unicharset. What should I do?

[Screenshot]

When I run lstmtraining, I get this error:

[Screenshot]

The best fas.traineddata model can't recognize some characters, such as "، َ ُ ِ". So I decided to collect the characters and fonts that are common in Persian but that the model detects poorly.
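When preparing training text for these marks, it helps to confirm exactly which code points they are. The following is a minimal sketch that assumes the quoted characters are the standard Arabic comma (U+060C) and the fatha, damma and kasra combining marks (U+064E to U+0650). It prints each one's Unicode name, general category and canonical combining class using Python's standard unicodedata module. The `describe` helper is hypothetical. Note that unicodedata does not expose the Unicode Script property. In the Unicode Character Database these harakat belong to the Inherited script (the comma belongs to Common), which is presumably why training looks for an Inherited.unicharset at all.

```python
import unicodedata

# Characters reported as poorly recognized (assumed code points):
# Arabic comma, fatha, damma, kasra.
CHARS = ["\u060c", "\u064e", "\u064f", "\u0650"]


def describe(ch: str) -> tuple[str, str, str, int]:
    """Return (code point, Unicode name, general category, combining class)."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.combining(ch),  # 0 for non-combining characters
    )


if __name__ == "__main__":
    for ch in CHARS:
        print(*describe(ch), sep="\t")
```

The marks report category Mn (nonspacing mark) with a nonzero combining class, which confirms they attach to a base letter rather than standing alone.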

Thanks.

typeoo commented 3 years ago

@Shreeshrii

icecrypt7 commented 2 years ago

Arabic.unicharset can be used as Inherited.unicharset. I suggest training from scratch with this net spec: [1,48,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c1]. More tips: https://tesseract-ocr.github.io/tessdoc/tess4/TrainingTesseract-4.00.html
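The first suggestion can be applied with a minimal sketch like the one below. It assumes, based on the error message, that training looks for Inherited.unicharset directly inside the langdata_lstm directory. It also assumes Arabic.unicharset sits in that same directory. The function name `ensure_inherited_unicharset` is hypothetical. It copies the Arabic file under the Inherited name only when no Inherited.unicharset exists yet, so a real one is never overwritten.

```python
import shutil
from pathlib import Path


def ensure_inherited_unicharset(langdata_dir: Path, fallback_script: str = "Arabic") -> Path:
    """Make sure <langdata_dir>/Inherited.unicharset exists.

    Assumption: script unicharsets live directly in langdata_dir, named
    <Script>.unicharset. If Inherited.unicharset is missing, copy the
    fallback script's unicharset to that name; an existing file is kept.
    """
    target = langdata_dir / "Inherited.unicharset"
    if target.exists():
        return target  # never overwrite a real Inherited.unicharset
    source = langdata_dir / f"{fallback_script}.unicharset"
    if not source.is_file():
        raise FileNotFoundError(f"fallback unicharset not found: {source}")
    shutil.copyfile(source, target)
    return target
```

For the directory named in the error message, this would be called as `ensure_inherited_unicharset(Path("../langdata_lstm"))` before rerunning training.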