tesseract-ocr / tesstrain

Train Tesseract LSTM with make
Apache License 2.0

Training from scratch gives an error rate of almost 100% and very low OCR accuracy #51

Closed: jayawantkarale closed this issue 5 years ago

jayawantkarale commented 5 years ago

We are training from scratch using ocrd-train for the Devanagari script. We train on sample images like the one below, using 10 such samples as training data.

[image: marathi1]

Training log after make training:

python generate_line_box.py -i "data/ground-truth/marathi1-001.exp0.tif" -t "data/ground-truth/marathi1-001.exp0.gt.txt" > "data/ground-truth/marathi1-001.exp0.box"
python generate_line_box.py -i "data/ground-truth/marathi1-002.exp0.tif" -t "data/ground-truth/marathi1-002.exp0.gt.txt" > "data/ground-truth/marathi1-002.exp0.box"
python generate_line_box.py -i "data/ground-truth/marathi1-003.exp0.tif" -t "data/ground-truth/marathi1-003.exp0.gt.txt" > "data/ground-truth/marathi1-003.exp0.box"
python generate_line_box.py -i "data/ground-truth/marathi1-004.exp0.tif" -t "data/ground-truth/marathi1-004.exp0.gt.txt" > "data/ground-truth/marathi1-004.exp0.box"
python generate_line_box.py -i "data/ground-truth/marathi1-005.exp0.tif" -t "data/ground-truth/marathi1-005.exp0.gt.txt" > "data/ground-truth/marathi1-005.exp0.box"
python generate_line_box.py -i "data/ground-truth/marathi1-006.exp0.tif" -t "data/ground-truth/marathi1-006.exp0.gt.txt" > "data/ground-truth/marathi1-006.exp0.box"
python generate_line_box.py -i "data/ground-truth/marathi1-007.exp0.tif" -t "data/ground-truth/marathi1-007.exp0.gt.txt" > "data/ground-truth/marathi1-007.exp0.box"
python generate_line_box.py -i "data/ground-truth/marathi1-008.exp0.tif" -t "data/ground-truth/marathi1-008.exp0.gt.txt" > "data/ground-truth/marathi1-008.exp0.box"
python generate_line_box.py -i "data/ground-truth/marathi1-009.exp0.tif" -t "data/ground-truth/marathi1-009.exp0.gt.txt" > "data/ground-truth/marathi1-009.exp0.box"
python generate_line_box.py -i "data/ground-truth/marathi1-010.exp0.tif" -t "data/ground-truth/marathi1-010.exp0.gt.txt" > "data/ground-truth/marathi1-010.exp0.box"
find data/ground-truth -name '*.box' -exec cat {} \; > "data/all-boxes"
unicharset_extractor --output_unicharset "data/unicharset" --norm_mode 1 "data/all-boxes"
Extracting unicharset from box file data/all-boxes
Wrote unicharset file data/unicharset

tesseract data/ground-truth/marathi1-001.exp0.tif data/ground-truth/marathi1-001.exp0 --psm 6 lstm.train
Tesseract Open Source OCR Engine v4.0.0 with Leptonica
Page 1
tesseract data/ground-truth/marathi1-002.exp0.tif data/ground-truth/marathi1-002.exp0 --psm 6 lstm.train
Tesseract Open Source OCR Engine v4.0.0 with Leptonica
Page 1
tesseract data/ground-truth/marathi1-003.exp0.tif data/ground-truth/marathi1-003.exp0 --psm 6 lstm.train
Tesseract Open Source OCR Engine v4.0.0 with Leptonica
Page 1
tesseract data/ground-truth/marathi1-004.exp0.tif data/ground-truth/marathi1-004.exp0 --psm 6 lstm.train
Tesseract Open Source OCR Engine v4.0.0 with Leptonica
Page 1
tesseract data/ground-truth/marathi1-005.exp0.tif data/ground-truth/marathi1-005.exp0 --psm 6 lstm.train
Tesseract Open Source OCR Engine v4.0.0 with Leptonica
Page 1
tesseract data/ground-truth/marathi1-006.exp0.tif data/ground-truth/marathi1-006.exp0 --psm 6 lstm.train
Tesseract Open Source OCR Engine v4.0.0 with Leptonica
Page 1
tesseract data/ground-truth/marathi1-007.exp0.tif data/ground-truth/marathi1-007.exp0 --psm 6 lstm.train
Tesseract Open Source OCR Engine v4.0.0 with Leptonica
Page 1
tesseract data/ground-truth/marathi1-008.exp0.tif data/ground-truth/marathi1-008.exp0 --psm 6 lstm.train
Tesseract Open Source OCR Engine v4.0.0 with Leptonica
Page 1
tesseract data/ground-truth/marathi1-009.exp0.tif data/ground-truth/marathi1-009.exp0 --psm 6 lstm.train
Tesseract Open Source OCR Engine v4.0.0 with Leptonica
Page 1
tesseract data/ground-truth/marathi1-010.exp0.tif data/ground-truth/marathi1-010.exp0 --psm 6 lstm.train
Tesseract Open Source OCR Engine v4.0.0 with Leptonica
Page 1

find data/ground-truth -name '*.lstmf' -exec echo {} \; | sort -R -o "data/all-lstmf"
total=`cat data/all-lstmf | wc -l` \
  no=`echo "$total * 0.90 / 1" | bc`; \
  head -n "$no" data/all-lstmf > "data/list.train"
total=`cat data/all-lstmf | wc -l` \
  no=`echo "($total - $total * 0.90) / 1" | bc`; \
  tail -n "$no" data/all-lstmf > "data/list.eval"

combine_lang_model \
  --input_unicharset data/unicharset \
  --script_dir data/ \
  --output_dir data/ \
  --lang foo
Loaded unicharset of size 29 from file data/unicharset
Setting unichar properties
Setting script properties
Failed to load script unicharset from:data//Devanagari.unicharset
Warning: properties incomplete for index 3 = ग
Warning: properties incomplete for index 4 = ण
Warning: properties incomplete for index 5 = ज
Warning: properties incomplete for index 6 = र
Warning: properties incomplete for index 7 = म
Warning: properties incomplete for index 8 = न
Warning: properties incomplete for index 9 = क
Warning: properties incomplete for index 11 = व
Warning: properties incomplete for index 12 = उ
Warning: properties incomplete for index 13 = ळ
Warning: properties incomplete for index 14 = घ
Warning: properties incomplete for index 15 = ड
Warning: properties incomplete for index 16 = ए
Warning: properties incomplete for index 17 = अ
Warning: properties incomplete for index 18 = ह
Warning: properties incomplete for index 19 = द
Warning: properties incomplete for index 20 = ब
Warning: properties incomplete for index 21 = ल
Warning: properties incomplete for index 23 = प
Warning: properties incomplete for index 24 = ट
Warning: properties incomplete for index 25 = च
Warning: properties incomplete for index 26 = त
Warning: properties incomplete for index 27 = ध
Warning: properties incomplete for index 28 = फ
Config file is optional, continuing...
Failed to read data from: data//foo/foo.config
Null char=2

mkdir -p data/checkpoints
lstmtraining \
  --traineddata data/foo/foo.traineddata \
  --net_spec "[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c`head -n1 data/unicharset`]" \
  --model_output data/checkpoints/foo \
  --learning_rate 20e-4 \
  --train_listfile data/list.train \
  --eval_listfile data/list.eval \
  --max_iterations 10000
Warning: given outputs 29 not equal to unicharset of 28.
Num outputs,weights in Series:
1,36,0,1:1, 0
Num outputs,weights in Series:
C3,3:9, 0
Ft16:16, 160
Total weights = 160
[C3,3Ft16]:16, 160
Mp3,3:16, 0
Lfys48:48, 12480
Lfx96:96, 55680
Lrx96:96, 74112
Lfx256:256, 361472
Fc28:28, 7196
Total weights = 511100
Built network:[1,36,0,1[C3,3Ft16]Mp3,3Lfys48Lfx96Lrx96Lfx256Fc28] from request [1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c29]
Training parameters:
Debug interval = 0, weights = 0.1, learning rate = 0.002, momentum=0.5
null char=27
Loaded 16/16 pages (1-16) of document data/ground-truth/marathi1-001.exp0.lstmf
Loaded 14/14 pages (1-14) of document data/ground-truth/marathi1-008.exp0.lstmf
Loaded 14/14 pages (1-14) of document data/ground-truth/marathi1-005.exp0.lstmf
Loaded 21/21 pages (1-21) of document data/ground-truth/marathi1-007.exp0.lstmf
Loaded 21/21 pages (1-21) of document data/ground-truth/marathi1-010.exp0.lstmf
Loaded 25/25 pages (1-25) of document data/ground-truth/marathi1-009.exp0.lstmf
Loaded 24/24 pages (1-24) of document data/ground-truth/marathi1-003.exp0.lstmf
Loaded 26/26 pages (1-26) of document data/ground-truth/marathi1-002.exp0.lstmf
Loaded 21/21 pages (1-21) of document data/ground-truth/marathi1-004.exp0.lstmf
Loaded 25/25 pages (1-25) of document data/ground-truth/marathi1-006.exp0.lstmf

At iteration 99/100/100, Mean rms=7.226%, delta=3.012%, char train=267%, word train=95%, skip ratio=0%, New worst char error = 267 wrote checkpoint.

At iteration 199/200/200, Mean rms=7.06%, delta=2.755%, char train=189.5%, word train=95.5%, skip ratio=0%, New worst char error = 189.5 wrote checkpoint.

At iteration 290/300/300, Mean rms=7.036%, delta=2.443%, char train=164.667%, word train=96.667%, skip ratio=0%, New worst char error = 164.667 wrote checkpoint.

At iteration 387/400/400, Mean rms=7.074%, delta=2.502%, char train=160.5%, word train=96.75%, skip ratio=0%, New worst char error = 160.5 wrote checkpoint.

At iteration 480/500/500, Mean rms=7.07%, delta=2.291%, char train=148.6%, word train=97.4%, skip ratio=0%, New worst char error = 148.6 wrote checkpoint.

At iteration 566/600/600, Mean rms=7.111%, delta=2.129%, char train=140.5%, word train=97.833%, skip ratio=0%, New worst char error = 140.5 wrote checkpoint.

At iteration 644/700/700, Mean rms=7.153%, delta=2.117%, char train=135.786%, word train=98.143%, skip ratio=0%, New worst char error = 135.786 wrote checkpoint.

At iteration 707/800/800, Mean rms=7.186%, delta=2.221%, char train=135.375%, word train=98.375%, skip ratio=0%, New worst char error = 135.375 wrote checkpoint.

At iteration 779/900/900, Mean rms=7.204%, delta=2.397%, char train=140.944%, word train=98.556%, skip ratio=0%, New worst char error = 140.944 wrote checkpoint.

At iteration 863/1000/1000, Mean rms=7.187%, delta=2.838%, char train=154.15%, word train=98.6%, skip ratio=0%, New worst char error = 154.15 wrote checkpoint.

At iteration 956/1100/1100, Mean rms=7.088%, delta=3.46%, char train=147.1%, word train=98.4%, skip ratio=0%, New worst char error = 147.1 wrote checkpoint.

At iteration 1023/1200/1200, Mean rms=7.12%, delta=3.307%, char train=150.85%, word train=98.8%, skip ratio=0%, New worst char error = 150.85 wrote checkpoint.

Warning: LSTMTrainer deserialized an LSTMRecognizer! At iteration 1110/1300/1300, Mean rms=7.109%, delta=3.743%, char train=161.15%, word train=98.6%, skip ratio=0%, New worst char error = 161.15 wrote checkpoint.

Warning: LSTMTrainer deserialized an LSTMRecognizer! At iteration 1198/1400/1400, Mean rms=7.1%, delta=4.097%, char train=180.9%, word train=98.9%, skip ratio=0%, New worst char error = 180.9At iteration 1023, stage 0, Eval Char error rate=100, Word error rate=100 wrote checkpoint.

Warning: LSTMTrainer deserialized an LSTMRecognizer! At iteration 1297/1500/1500, Mean rms=7.025%, delta=4.76%, char train=194.95%, word train=98.8%, skip ratio=0%, New worst char error = 194.95At iteration 1110, stage 0, Eval Char error rate=498.07692, Word error rate=100 wrote checkpoint.

Warning: LSTMTrainer deserialized an LSTMRecognizer! At iteration 1394/1600/1600, Mean rms=6.898%, delta=5.583%, char train=216.4%, word train=98.8%, skip ratio=0%, New worst char error = 216.4At iteration 1198, stage 0, Eval Char error rate=401.92308, Word error rate=100 wrote checkpoint.

Warning: LSTMTrainer deserialized an LSTMRecognizer! At iteration 1481/1700/1700, Mean rms=6.778%, delta=6.239%, char train=242.95%, word train=98.8%, skip ratio=0%, New worst char error = 242.95At iteration 1297, stage 0, Eval Char error rate=169.23077, Word error rate=100 wrote checkpoint.

Warning: LSTMTrainer deserialized an LSTMRecognizer! At iteration 1553/1800/1800, Mean rms=6.624%, delta=6.812%, char train=251.35%, word train=98.1%, skip ratio=0%, New worst char error = 251.35At iteration 1394, stage 0, Eval Char error rate=426.92308, Word error rate=100 wrote checkpoint.

At iteration 1628/1900/1900, Mean rms=6.422%, delta=7.195%, char train=243.8%, word train=96.9%, skip ratio=0%, wrote checkpoint.

At iteration 1715/2000/2000, Mean rms=6.287%, delta=7.481%, char train=231.55%, word train=95.7%, skip ratio=0%, wrote checkpoint.

At iteration 1812/2100/2100, Mean rms=6.187%, delta=7.493%, char train=222.05%, word train=95.3%, skip ratio=0%, wrote checkpoint.

At iteration 1910/2200/2200, Mean rms=5.954%, delta=8.482%, char train=215.4%, word train=93.4%, skip ratio=0%, wrote checkpoint.

At iteration 2010/2300/2300, Mean rms=5.696%, delta=8.712%, char train=203.35%, word train=92.2%, skip ratio=0%, wrote checkpoint.

At iteration 2109/2400/2400, Mean rms=5.341%, delta=8.584%, char train=180.55%, word train=90.7%, skip ratio=0%, wrote checkpoint.

At iteration 2209/2500/2500, Mean rms=4.999%, delta=8.082%, char train=166.15%, word train=89.5%, skip ratio=0%, wrote checkpoint.

At iteration 2309/2600/2600, Mean rms=4.668%, delta=7.392%, char train=144.55%, word train=87.9%, skip ratio=0%, wrote checkpoint.

At iteration 2409/2700/2700, Mean rms=4.319%, delta=6.791%, char train=116.85%, word train=86%, skip ratio=0%, wrote checkpoint.

At iteration 2509/2800/2800, Mean rms=3.999%, delta=6.177%, char train=104.55%, word train=84.6%, skip ratio=0%, wrote checkpoint.

At iteration 2607/2900/2900, Mean rms=3.959%, delta=6.183%, char train=116.2%, word train=85.2%, skip ratio=0%, wrote checkpoint.

At iteration 2706/3000/3000, Mean rms=3.995%, delta=5.89%, char train=121.1%, word train=86.1%, skip ratio=0%, wrote checkpoint.

At iteration 2794/3100/3100, Mean rms=4.082%, delta=5.492%, char train=135%, word train=86.5%, skip ratio=0%, wrote checkpoint.

At iteration 2886/3200/3200, Mean rms=4.173%, delta=5.168%, char train=145.8%, word train=87.8%, skip ratio=0%, wrote checkpoint.

At iteration 2966/3300/3300, Mean rms=4.274%, delta=4.768%, char train=147.05%, word train=87.7%, skip ratio=0%, wrote checkpoint.

At iteration 3013/3400/3400, Mean rms=4.391%, delta=4.429%, char train=142.05%, word train=87.6%, skip ratio=0%, wrote checkpoint.

At iteration 3084/3500/3500, Mean rms=4.628%, delta=4.723%, char train=140.35%, word train=87%, skip ratio=0%, wrote checkpoint.

At iteration 3183/3600/3600, Mean rms=4.839%, delta=5.391%, char train=137.25%, word train=87.1%, skip ratio=0%, wrote checkpoint.

At iteration 3278/3700/3700, Mean rms=4.998%, delta=5.822%, char train=138.65%, word train=87.1%, skip ratio=0%, wrote checkpoint.

At iteration 3376/3800/3800, Mean rms=5.113%, delta=6.118%, char train=138.6%, word train=87.5%, skip ratio=0%, wrote checkpoint.

At iteration 3473/3900/3900, Mean rms=4.971%, delta=5.798%, char train=126.35%, word train=86.3%, skip ratio=0%, wrote checkpoint.

At iteration 3571/4000/4000, Mean rms=4.692%, delta=5.524%, char train=116.6%, word train=85.3%, skip ratio=0%, wrote checkpoint.

At iteration 3659/4100/4100, Mean rms=4.75%, delta=6.248%, char train=119.7%, word train=85.5%, skip ratio=0%, wrote checkpoint.

At iteration 3752/4200/4200, Mean rms=4.531%, delta=5.832%, char train=112.7%, word train=84.5%, skip ratio=0%, wrote checkpoint.

At iteration 3845/4300/4300, Mean rms=4.32%, delta=5.683%, char train=111%, word train=84.5%, skip ratio=0%, wrote checkpoint.

At iteration 3942/4400/4400, Mean rms=4.164%, delta=5.861%, char train=114.7%, word train=84.7%, skip ratio=0%, wrote checkpoint.

At iteration 4039/4500/4500, Mean rms=3.936%, delta=5.578%, char train=115.05%, word train=84.6%, skip ratio=0%, wrote checkpoint.

At iteration 4135/4600/4600, Mean rms=3.752%, delta=4.956%, char train=118.75%, word train=84.9%, skip ratio=0%, wrote checkpoint.

At iteration 4234/4700/4700, Mean rms=3.61%, delta=4.569%, char train=116.45%, word train=84.9%, skip ratio=0%, wrote checkpoint.

At iteration 4334/4800/4800, Mean rms=3.518%, delta=4.338%, char train=116.95%, word train=84.8%, skip ratio=0%, wrote checkpoint.

At iteration 4434/4900/4900, Mean rms=3.452%, delta=4.207%, char train=116.65%, word train=84.7%, skip ratio=0%, wrote checkpoint.

At iteration 4534/5000/5000, Mean rms=3.411%, delta=4.13%, char train=115.6%, word train=84.3%, skip ratio=0%, wrote checkpoint.

2 Percent improvement time=4634, best error was 100 @ 0 Warning: LSTMTrainer deserialized an LSTMRecognizer! At iteration 4634/5100/5100, Mean rms=3.028%, delta=3.178%, char train=99.1%, word train=83.4%, skip ratio=0%, New best char error = 99.1At iteration 1481, stage 0, Eval Char error rate=130.76923, Word error rate=100 wrote checkpoint.

2 Percent improvement time=4734, best error was 100 @ 0 At iteration 4734/5200/5200, Mean rms=2.955%, delta=3.145%, char train=97.15%, word train=83.4%, skip ratio=0%, New best char error = 97.15 wrote checkpoint.

At iteration 4834/5300/5300, Mean rms=2.915%, delta=3.138%, char train=98.65%, word train=83.8%, skip ratio=0%, New worst char error = 98.65 wrote checkpoint.

At iteration 4934/5400/5400, Mean rms=2.877%, delta=3.082%, char train=97.45%, word train=83.1%, skip ratio=0%, New worst char error = 97.45 wrote checkpoint.

At iteration 5034/5500/5500, Mean rms=2.852%, delta=3.031%, char train=99.25%, word train=83.5%, skip ratio=0%, New worst char error = 99.25 wrote checkpoint.

At iteration 5134/5600/5600, Mean rms=2.825%, delta=2.988%, char train=97.9%, word train=83%, skip ratio=0%, New worst char error = 97.9 wrote checkpoint.

At iteration 5234/5700/5700, Mean rms=2.807%, delta=2.946%, char train=100%, word train=83.1%, skip ratio=0%, New worst char error = 100 wrote checkpoint.

At iteration 5334/5800/5800, Mean rms=2.788%, delta=2.886%, char train=99.7%, word train=83.4%, skip ratio=0%, New worst char error = 99.7 wrote checkpoint.

At iteration 5434/5900/5900, Mean rms=2.771%, delta=2.819%, char train=99.85%, word train=83.8%, skip ratio=0%, New worst char error = 99.85 wrote checkpoint.

At iteration 5534/6000/6000, Mean rms=2.76%, delta=2.757%, char train=101.25%, word train=84.2%, skip ratio=0%, New worst char error = 101.25 wrote checkpoint.

At iteration 5634/6100/6100, Mean rms=2.747%, delta=2.709%, char train=101.25%, word train=84.5%, skip ratio=0%, New worst char error = 101.25 wrote checkpoint.

Warning: LSTMTrainer deserialized an LSTMRecognizer! At iteration 5734/6200/6200, Mean rms=2.735%, delta=2.655%, char train=101.05%, word train=84.4%, skip ratio=0%, New worst char error = 101.05At iteration 1553, stage 0, Eval Char error rate=267.30769, Word error rate=100 wrote checkpoint.

At iteration 5834/6300/6300, Mean rms=2.726%, delta=2.623%, char train=100.6%, word train=84.3%, skip ratio=0%, wrote checkpoint.

At iteration 5934/6400/6400, Mean rms=2.716%, delta=2.594%, char train=100.95%, word train=84.7%, skip ratio=0%, wrote checkpoint.

Warning: LSTMTrainer deserialized an LSTMRecognizer! At iteration 6034/6500/6500, Mean rms=2.721%, delta=2.591%, char train=101.1%, word train=84.8%, skip ratio=0%, New worst char error = 101.1At iteration 4734, stage 0, Eval Char error rate=151.92308, Word error rate=100 wrote checkpoint.

Warning: LSTMTrainer deserialized an LSTMRecognizer! At iteration 6134/6600/6600, Mean rms=2.718%, delta=2.578%, char train=102.95%, word train=84.9%, skip ratio=0%, New worst char error = 102.95At iteration 5734, stage 0, Eval Char error rate=176.92308, Word error rate=100 wrote checkpoint.

At iteration 6234/6700/6700, Mean rms=2.729%, delta=2.585%, char train=102.05%, word train=85%, skip ratio=0%, wrote checkpoint.

At iteration 6334/6800/6800, Mean rms=2.73%, delta=2.581%, char train=102.9%, word train=85.4%, skip ratio=0%, wrote checkpoint.

At iteration 6434/6900/6900, Mean rms=2.743%, delta=2.604%, char train=102.75%, word train=85.2%, skip ratio=0%, wrote checkpoint.

Warning: LSTMTrainer deserialized an LSTMRecognizer! At iteration 6534/7000/7000, Mean rms=2.742%, delta=2.593%, char train=103.15%, word train=85.6%, skip ratio=0%, New worst char error = 103.15At iteration 6034, stage 0, Eval Char error rate=176.92308, Word error rate=100 wrote checkpoint.

At iteration 6634/7100/7100, Mean rms=2.746%, delta=2.596%, char train=102.05%, word train=84.7%, skip ratio=0%, wrote checkpoint.

Warning: LSTMTrainer deserialized an LSTMRecognizer! At iteration 6734/7200/7200, Mean rms=2.747%, delta=2.59%, char train=103.4%, word train=85.3%, skip ratio=0%, New worst char error = 103.4At iteration 6134, stage 0, Eval Char error rate=176.92308, Word error rate=100 wrote checkpoint.

At iteration 6834/7300/7300, Mean rms=2.753%, delta=2.6%, char train=102.8%, word train=84.7%, skip ratio=0%, wrote checkpoint.

Warning: LSTMTrainer deserialized an LSTMRecognizer! At iteration 6934/7400/7400, Mean rms=2.76%, delta=2.614%, char train=103.9%, word train=85.1%, skip ratio=0%, New worst char error = 103.9At iteration 6534, stage 0, Eval Char error rate=176.92308, Word error rate=100 wrote checkpoint.

At iteration 7034/7500/7500, Mean rms=2.76%, delta=2.613%, char train=103.85%, word train=85%, skip ratio=0%, wrote checkpoint.

At iteration 7134/7600/7600, Mean rms=2.771%, delta=2.633%, char train=103.45%, word train=85%, skip ratio=0%, wrote checkpoint.

Warning: LSTMTrainer deserialized an LSTMRecognizer! At iteration 7234/7700/7700, Mean rms=2.77%, delta=2.628%, char train=104.55%, word train=85.5%, skip ratio=0%, New worst char error = 104.55At iteration 6734, stage 0, Eval Char error rate=176.92308, Word error rate=100 wrote checkpoint.

Warning: LSTMTrainer deserialized an LSTMRecognizer! At iteration 7334/7800/7800, Mean rms=2.779%, delta=2.647%, char train=104.85%, word train=85.2%, skip ratio=0%, New worst char error = 104.85At iteration 6934, stage 0, Eval Char error rate=165.38462, Word error rate=100 wrote checkpoint.

Warning: LSTMTrainer deserialized an LSTMRecognizer! At iteration 7434/7900/7900, Mean rms=2.775%, delta=2.633%, char train=105.7%, word train=85.8%, skip ratio=0%, New worst char error = 105.7At iteration 7234, stage 0, Eval Char error rate=190.38462, Word error rate=100 wrote checkpoint.

At iteration 7534/8000/8000, Mean rms=2.782%, delta=2.656%, char train=105.2%, word train=84.4%, skip ratio=0%, wrote checkpoint.

Warning: LSTMTrainer deserialized an LSTMRecognizer! At iteration 7634/8100/8100, Mean rms=2.781%, delta=2.649%, char train=106.8%, word train=85.1%, skip ratio=0%, New worst char error = 106.8At iteration 7334, stage 0, Eval Char error rate=213.46154, Word error rate=100 wrote checkpoint.

At iteration 7734/8200/8200, Mean rms=2.785%, delta=2.665%, char train=106.35%, word train=84.3%, skip ratio=0%, wrote checkpoint.

Warning: LSTMTrainer deserialized an LSTMRecognizer! At iteration 7834/8300/8300, Mean rms=2.788%, delta=2.664%, char train=108.05%, word train=84.8%, skip ratio=0%, New worst char error = 108.05At iteration 7434, stage 0, Eval Char error rate=144.23077, Word error rate=100 wrote checkpoint.

At iteration 7934/8400/8400, Mean rms=2.794%, delta=2.677%, char train=107.4%, word train=84.1%, skip ratio=0%, wrote checkpoint.

Warning: LSTMTrainer deserialized an LSTMRecognizer! At iteration 8034/8500/8500, Mean rms=2.799%, delta=2.682%, char train=109.45%, word train=84.9%, skip ratio=0%, New worst char error = 109.45At iteration 7634, stage 0, Eval Char error rate=76.923077, Word error rate=76.923077 wrote checkpoint.

Warning: LSTMTrainer deserialized an LSTMRecognizer! At iteration 8134/8600/8600, Mean rms=2.796%, delta=2.669%, char train=110.05%, word train=85%, skip ratio=0%, New worst char error = 110.05At iteration 7834, stage 0, Eval Char error rate=128.84615, Word error rate=96.153846 wrote checkpoint.

At iteration 8234/8700/8700, Mean rms=2.792%, delta=2.663%, char train=109.35%, word train=84.5%, skip ratio=0%, wrote checkpoint.

Warning: LSTMTrainer deserialized an LSTMRecognizer! At iteration 8334/8800/8800, Mean rms=2.898%, delta=3.043%, char train=111.75%, word train=84.7%, skip ratio=0%, New worst char error = 111.75At iteration 8034, stage 0, Eval Char error rate=76.923077, Word error rate=76.923077 wrote checkpoint.

Warning: LSTMTrainer deserialized an LSTMRecognizer! At iteration 8434/8900/8900, Mean rms=3.051%, delta=3.584%, char train=119.75%, word train=85.3%, skip ratio=0%, New worst char error = 119.75At iteration 8134, stage 0, Eval Char error rate=194.23077, Word error rate=100 wrote checkpoint.

Warning: LSTMTrainer deserialized an LSTMRecognizer! At iteration 8533/9000/9000, Mean rms=3.331%, delta=4.42%, char train=127.6%, word train=86.9%, skip ratio=0%, New worst char error = 127.6At iteration 8334, stage 0, Eval Char error rate=150, Word error rate=100 wrote checkpoint.

At iteration 8632/9100/9100, Mean rms=3.371%, delta=4.523%, char train=126.95%, word train=87.5%, skip ratio=0%, wrote checkpoint.

At iteration 8732/9200/9200, Mean rms=3.364%, delta=4.511%, char train=126.25%, word train=87.9%, skip ratio=0%, wrote checkpoint.

At iteration 8832/9300/9300, Mean rms=3.35%, delta=4.496%, char train=124.9%, word train=87.4%, skip ratio=0%, wrote checkpoint.

At iteration 8932/9400/9400, Mean rms=3.35%, delta=4.518%, char train=125.5%, word train=87.9%, skip ratio=0%, wrote checkpoint.

At iteration 9031/9500/9500, Mean rms=3.383%, delta=4.664%, char train=124.6%, word train=87.3%, skip ratio=0%, wrote checkpoint.

At iteration 9130/9600/9600, Mean rms=3.393%, delta=4.727%, char train=124.5%, word train=87.2%, skip ratio=0%, wrote checkpoint.

At iteration 9228/9700/9700, Mean rms=3.401%, delta=4.76%, char train=124.95%, word train=87.4%, skip ratio=0%, wrote checkpoint.

At iteration 9328/9800/9800, Mean rms=3.293%, delta=4.384%, char train=122.7%, word train=86.8%, skip ratio=0%, wrote checkpoint.

At iteration 9428/9900/9900, Mean rms=3.14%, delta=3.847%, char train=113.95%, word train=85.6%, skip ratio=0%, wrote checkpoint.

At iteration 9528/10000/10000, Mean rms=2.847%, delta=2.985%, char train=106.2%, word train=85.3%, skip ratio=0%, wrote checkpoint.

Finished! Error rate = 97.15
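A log of this length is easier to judge once the progress lines are reduced to numbers. The following is a minimal sketch, assuming the log has been saved as text and that every progress line follows the "At iteration a/b/c, Mean rms=..., char train=..., word train=..." pattern seen above. The helper names `parse_progress` and `lowest_char_train` are hypothetical and not part of Tesseract or tesstrain.

```python
import re

# Pattern for the periodic progress lines in the log above.
# The three slash-separated counters are captured as-is, without
# assuming what each of them means.
PROGRESS = re.compile(
    r"At iteration (\d+)/(\d+)/(\d+), Mean rms=([\d.]+)%, delta=([\d.]+)%, "
    r"char train=([\d.]+)%, word train=([\d.]+)%"
)


def parse_progress(log_text):
    """Return one dict per progress line found in log_text."""
    rows = []
    for m in PROGRESS.finditer(log_text):
        rows.append({
            "counters": (int(m.group(1)), int(m.group(2)), int(m.group(3))),
            "mean_rms": float(m.group(4)),
            "delta": float(m.group(5)),
            "char_train": float(m.group(6)),
            "word_train": float(m.group(7)),
        })
    return rows


def lowest_char_train(rows):
    """Return the row with the lowest training character error."""
    return min(rows, key=lambda row: row["char_train"])


# Two lines copied from the log above, used as a small demonstration.
sample_log = (
    "At iteration 99/100/100, Mean rms=7.226%, delta=3.012%, char train=267%, "
    "word train=95%, skip ratio=0%, New worst char error = 267 wrote checkpoint.\n"
    "At iteration 4734/5200/5200, Mean rms=2.955%, delta=3.145%, char train=97.15%, "
    "word train=83.4%, skip ratio=0%, New best char error = 97.15 wrote checkpoint.\n"
)

best = lowest_char_train(parse_progress(sample_log))
print(best["counters"], best["char_train"])
```

Run over the full log, a summary like this makes it clear that the training character error never drops meaningfully below 100%, which points at the data and setup rather than at the number of iterations.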

After running OCR with the newly trained data, it gives the following output text for the sample image:

`

नर

टन

नरण

टन

टन

न र

नट `

Please suggest what is going wrong and how to improve the OCR accuracy.
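One detail worth checking in a run like this is how little data ends up in the evaluation list. The sketch below reproduces the arithmetic of the list.train and list.eval steps in the log, assuming the 0.90 ratio is applied as a multiplication and that `bc` truncates the result of the division by 1 to an integer. The function name `split_counts` is hypothetical.

```python
from decimal import Decimal


def split_counts(total, ratio_train="0.90"):
    """Return (n_train, n_eval) using truncating integer division,
    mirroring the `... / 1 | bc` steps in the log (an assumption)."""
    ratio = Decimal(ratio_train)
    n_train = int(Decimal(total) * ratio)                    # head -n "$no"
    n_eval = int(Decimal(total) - Decimal(total) * ratio)    # tail -n "$no"
    return n_train, n_eval


# The log loads 10 .lstmf files, so the list has 10 entries.
print(split_counts(10))  # (9, 1)
```

With 10 input files this leaves 9 files for training and a single file for evaluation, so the evaluation error rates in the log are computed on very little data.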

jayawantkarale commented 5 years ago

This issue was resolved by providing the Devanagari.unicharset and Latin.unicharset files during training. Also provide a larger set of training images (more than 10). We have tried this for English.
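The missing file is visible in the log as "Failed to load script unicharset from:data//Devanagari.unicharset", with `--script_dir data/` passed to combine_lang_model. A small pre-flight check can catch this before training starts. This is a sketch only: the function name `missing_script_unicharsets` is hypothetical, and the directory and script names are taken from the log and the comment above.

```python
from pathlib import Path


def missing_script_unicharsets(script_dir, scripts=("Devanagari", "Latin")):
    """Return the expected <script>.unicharset paths that are not present
    in script_dir."""
    base = Path(script_dir)
    expected = [base / f"{name}.unicharset" for name in scripts]
    return [path for path in expected if not path.is_file()]


# "data" matches the --script_dir value shown in the log.
for path in missing_script_unicharsets("data"):
    print(f"missing: {path}")
```

If the loop prints anything, the listed files need to be placed in the script directory before running make training again.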