Closed: soufieneghribi closed this issue 2 years ago
Two very different things in one issue.
> Failed to load any lstm-specific dictionaries for lang ea
This is expected, since dictionaries are deliberately not carried over to the fine-tuned model. Using dictionaries is not recommended in general for LSTM-based OCR.
> The new model is supposed to perform better than the base model
Not necessarily. This very much depends on your training and especially on the images you feed into it. Your example has a very bad line segmentation. Are you using PSM=7 or PSM=13?
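For reference, the two modes can be compared directly on a single cropped line image. A sketch (line.png and the model name elda are placeholders for your own files):

```shell
# PSM 7: treat the image as a single text line; Tesseract still runs
# its own baseline/line fitting on the input.
tesseract line.png stdout -l elda --psm 7

# PSM 13: raw line, bypassing Tesseract-specific pre-processing and
# line segmentation entirely. Fine-tuned models trained on raw line
# crops (as tesstrain produces) usually behave better with this mode.
tesseract line.png stdout -l elda --psm 13
```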
> This is expected, since dictionaries are deliberately not carried over to the fine-tuned model. Using dictionaries is not recommended in general for LSTM-based OCR.
Thank you for your response. I'm not using dictionaries. Is this acceptable as a warning? Could it affect my trained model?
> Not necessarily. This very much depends on your training and especially on the images you feed into it. Your example has a very bad line segmentation. Are you using PSM=7 or PSM=13?
I am using PSM=13 for both training and prediction.
> Thank you for your response. I'm not using dictionaries. Is this acceptable as a warning? Could it affect my trained model?
Yes. Definitely. No worries.
I followed the documentation. I prepared 350 single-line Arabic images (xx.png) and their transcripts (xx.gt.txt) and started training with START_MODEL=ara:
make training MODEL_NAME=elda START_MODEL=ara TESSDATA=data/ara_best
I am getting a 100% error rate.
Is there something I am missing?
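A 100% error rate often points to a data problem rather than a training problem. One thing worth ruling out is a mismatch between images and transcripts. A minimal sketch of such a check, assuming tesstrain's default ground-truth location data/MODEL_NAME-ground-truth (so data/elda-ground-truth here; adjust the path to your layout):

```shell
#!/bin/sh
# Sanity check: every line image should have a matching transcript,
# and no transcript should be empty. The directory below is an
# assumption based on tesstrain's data/MODEL_NAME-ground-truth layout.
gt_dir="${GT_DIR:-data/elda-ground-truth}"
missing=0
empty=0
for img in "$gt_dir"/*.png; do
  [ -e "$img" ] || break   # glob matched nothing: no .png files at all
  txt="${img%.png}.gt.txt"
  if [ ! -f "$txt" ]; then
    echo "missing transcript: $txt"
    missing=$((missing + 1))
  elif [ ! -s "$txt" ]; then
    echo "empty transcript: $txt"
    empty=$((empty + 1))
  fi
done
echo "$missing missing, $empty empty transcript(s)"
```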
What is ara_best? The TESSDATA parameter should point to your Tesseract model directory (e.g. /usr/local/share/tessdata in most cases).
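For example, a sketch of fetching a trainable ara base model and pointing TESSDATA at the directory that contains it (fine-tuning needs the float models from tessdata_best; the integer models shipped in the plain tessdata packages cannot be trained further):

```shell
# Download the float ("best") Arabic model into the repo's data dir
# and point TESSDATA at that directory, not at the file itself.
mkdir -p data
wget -O data/ara.traineddata \
  https://github.com/tesseract-ocr/tessdata_best/raw/main/ara.traineddata

make training MODEL_NAME=elda START_MODEL=ara TESSDATA=data
```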
Yes, it was ara_best. I put it under the repo at data/ara.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I am trying to fine-tune a model with a few hundred images. I am using the ara model as the START_MODEL, running this command:
make training MODEL_NAME=ea START_MODEL=ara TESSDATA=data/ara
The new model is supposed to perform better than the base model (ara), but I am getting bad results. Example:
With the ara_best model I get: بن عثمان بِن الهادي _
With the fine-tuned model I get: ﻦﻴﻣﺟﺟﺟﺟ_ﺎﻟﺒﻟﻤﻟﻟﺍﻣ