Inherited.unicharset - Githubissues

typeoo commented 3 years ago

Environment

Tesseract Version: 4.1.1
Platform: Linux

Current Behavior:

I can't fine-tune the Persian language model. It fails with "failed to load script unicharset from:../langdata_lstm/Inherited.unicharset".

I couldn't find the file Inherited.unicharset. What should I do?


When I run lstmtraining, I get this error:

[screenshot: unnamed (1)]

The best fas.traineddata can't recognize some characters, such as "، َ ُ ِ ". So I decided to collect the characters and fonts that are common in Persian but that the model detects poorly, and fine-tune on them.

Thanks.

typeoo commented 3 years ago

@Shreeshrii

icecrypt7 commented 2 years ago

Arabic.unicharset can be used as Inherited.unicharset. I suggest training from scratch with this net spec: [1,48,0,1Ct3,3,16Mp3,3Lfys48Lfx96Lrx96Lfx256O1c1]. More tips at https://tesseract-ocr.github.io/tessdoc/tess4/TrainingTesseract-4.00.html
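
For anyone who wants to script the workaround above, here is a minimal sketch. It assumes Arabic.unicharset already sits in the same langdata_lstm directory that the error message points to; the function name provide_inherited_unicharset is made up for illustration and is not part of Tesseract. It only copies the file under the missing name and never overwrites an existing Inherited.unicharset.

```python
import shutil
from pathlib import Path


def provide_inherited_unicharset(langdata_dir: str) -> Path:
    """Copy Arabic.unicharset to Inherited.unicharset if the latter is missing.

    Assumption: Arabic.unicharset is present in langdata_dir.
    """
    langdata = Path(langdata_dir)
    source = langdata / "Arabic.unicharset"
    target = langdata / "Inherited.unicharset"

    # Leave an existing file alone so a real Inherited.unicharset is never clobbered.
    if target.exists():
        return target

    if not source.exists():
        raise FileNotFoundError(f"{source} not found; cannot create {target}")

    shutil.copyfile(source, target)
    return target


# Example call, using the directory from the error message:
# provide_inherited_unicharset("../langdata_lstm")
```

Copying rather than symlinking keeps the sketch portable, at the cost of a duplicate file. Whether the Arabic script unicharset is a good enough stand-in for your training data is something to check against your own results.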

tesseract-ocr / tesseract

Inherited.unicharset #3436
