Open zaryabRiasat opened 5 months ago
Are you using very old instructions (old Tesseract release, old repository URL, ...)?
@stweil Thank you for your response.
Yes, I'm using tesseract-4.1.1 and the old repository.
The first training run works fine with START_MODEL=eng, but I am unable to do incremental training as described in the details above.
@stweil I just want to know how I can do incremental training on my existing trained model.
What steps should I follow?
What about reading the Tesseract documentation and the README of this repository?
@zaryabRiasat, the first step is using a recent software release instead of an old one and also reading the current documentation.
I'm working with tesseract-4.1.1 and trying to do training (fine-tuning). For this I have followed these steps:
1. Downloaded eng.traineddata from tessdata_best and pasted it into /usr/share/tesseract-ocr/4.00/tessdata.
2. Created image crops using craft-text-detector in Python and made ground truths (.gt.txt) for each image crop.
3. Cloned the repository with git clone https://github.com/tesseract-ocr/ocrd-train.git and then ran cd ocrd-train.
4. Inside the ocrd-train/data folder, created a my-model-ground-truth folder and pasted the .png and .gt.txt files into it.
5. Ran the command make tesseract-langdata in the terminal.
6. Finally, ran the command:
make training MODEL_NAME=my-model MAX_ITERATIONS=20000 PSM=7 FINETUNE_TYPE=Impact DEBUG_INTERVAL=-1 START_MODEL=eng TESSDATA=/usr/share/tesseract-ocr/4.00/tessdata/
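The ground-truth folder described above is expected to hold one .gt.txt file per .png crop. A minimal Python sketch of a check for that pairing, which could be run before make training, might look like this. The function name find_unpaired is illustrative and is not part of ocrd-train.

```python
from pathlib import Path


def find_unpaired(gt_dir):
    """Return (images_without_text, texts_without_image) as sorted base names."""
    gt_dir = Path(gt_dir)
    # Base names of image crops, e.g. "crop_001" for "crop_001.png".
    images = {p.name[: -len(".png")] for p in gt_dir.glob("*.png")}
    # Base names of transcriptions, e.g. "crop_001" for "crop_001.gt.txt".
    texts = {p.name[: -len(".gt.txt")] for p in gt_dir.glob("*.gt.txt")}
    # Anything present on only one side is an incomplete pair.
    return sorted(images - texts), sorted(texts - images)
```

Calling find_unpaired("data/my-model-ground-truth") from inside ocrd-train returns two lists. If both are empty, every crop has a transcription and every transcription has a crop.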
The above procedure took some time, and I got a my-model.traineddata file in ocrd-train/data/. I pasted that file into /usr/share/tesseract-ocr/4.00/tessdata, and it gives better results than eng.traineddata.
For the above training I used 20 images. Now I want to do incremental training: I want to train 30 more images on the previously trained my-model.traineddata. Here I'm confused, because after the previous training completed there are some folders and files in ocrd-train/data/:
my-model (folder)
my-model-ground-truth (folder)
eng (folder)
langdata (folder)
my-model.traineddata (file)
Now what should I do for incremental-training?
Do I only need to remove the files in my-model-ground-truth, paste the new .png and .gt.txt files of the 30 images, and use my-model as START_MODEL? Or do I need to remove the other folders as well?
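To make the question concrete, here is a minimal Python sketch that assembles the second make training call being asked about. It reuses only the variables from the first run and swaps START_MODEL from eng to my-model. Whether START_MODEL accepts a previously trained model in this way is exactly the open question and is not confirmed in this thread. The helper name build_training_command is illustrative.

```python
def build_training_command(model_name, start_model, tessdata, max_iterations=20000):
    """Assemble a `make training` call from the variables used in the first run."""
    variables = {
        "MODEL_NAME": model_name,
        "MAX_ITERATIONS": str(max_iterations),
        "PSM": "7",
        "FINETUNE_TYPE": "Impact",
        "DEBUG_INTERVAL": "-1",
        # Unconfirmed assumption: the earlier result is reused as the start model.
        "START_MODEL": start_model,
        "TESSDATA": tessdata,
    }
    return ["make", "training"] + [f"{key}={value}" for key, value in variables.items()]


command = build_training_command(
    model_name="my-model",
    start_model="my-model",
    tessdata="/usr/share/tesseract-ocr/4.00/tessdata/",
)
print(" ".join(command))
```

The sketch only prints the command and does not run it. It also does not settle which of the existing folders in ocrd-train/data/ would need to be removed first.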