tesseract-ocr / tesstrain

Train Tesseract LSTM with make
Apache License 2.0

lstm training core dumped for INR symbol #232

Closed: binarymachine-91 closed this 3 years ago

binarymachine-91 commented 3 years ago

@Shreeshrii - Attempt to create training file with INR symbol ₹ - core dumped

I changed $ to ₹ in eng.training_text, then used the gentrain.sh shell script to create the LSTM files for the Arial font. You can see the missing properties for the INR symbol ₹ in error1.png. I then tried to fine-tune using pbtune, and it failed (error2.png). But when I set iterations to 1 instead of 400, it goes through (pbtunewithsingle iteration.png). Attached are the gentrain and pbtune shell scripts and the eng.training_text file.
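For readers following along, the substitution described above can be sketched in Python. This is only an illustration, not the attached gentrain.sh script: the helper names `swap_currency` and `describe_char` are hypothetical, and the sample string stands in for the real contents of eng.training_text. The second helper prints the Unicode properties that the standard library knows for ₹, which may help when comparing against the properties reported as missing in the generated unicharset.

```python
import unicodedata

RUPEE = "\u20b9"  # the INR symbol


def swap_currency(text: str, old: str = "$", new: str = RUPEE) -> str:
    """Replace every occurrence of one currency symbol with another."""
    return text.replace(old, new)


def describe_char(ch: str) -> dict:
    """Return basic Unicode properties for a single character."""
    return {
        "codepoint": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch),
        "category": unicodedata.category(ch),
    }


if __name__ == "__main__":
    # Stand-in for a line of eng.training_text (assumed sample, not the real file).
    sample = "Total due: $1,250.00"
    print(swap_currency(sample))
    print(describe_char(RUPEE))
```

Checking that the symbol survives the edit as a single U+20B9 code point, in a UTF-8 encoded file, rules out one possible source of trouble before training starts.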

[Screenshots: error1, error2, pbtunewithsingle iteration]

atttachment.zip

Shreeshrii commented 3 years ago

Training with tesstrain.sh is not supported here. Use the Python scripts provided in this repo.

Apply PR https://github.com/tesseract-ocr/tesstrain/pull/230, which also includes a script to run training for engINR. See the attached log files from the fine-tuning run.

engINR.log1.txt

engINR.log.txt

engINR_0.110_287_1900.traineddata.zip

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.