Open Shreeshrii opened 7 years ago
The images were created by text2image from training text with word wrap, so the lines ran the full width of the page.
Is there a limit on the size of images for training?
Should training text be only 70-120 characters wide?
This is the opposite of the image being too small.
Built network:[1,0,0,1[C5,5Ft16]Mp3,3Lfys64Lfx128Lrx128Lfx256Fc104] from request [1,0,0,1 Ct5,5,16 Mp3,3 Lfys64 Lfx128 Lrx128 Lfx256 O1c5000]
Training parameters:
Debug interval = 0, weights = 0.1, learning rate = 0.0001, momentum=0.9
Loaded 151/151 pages (1-151) of document /home/shree/tesstutorial/trado/ara.Traditional_Arabic.exp0.lstmf
Compute CTC targets failed!
Image too small to scale!! (3x48 vs min width of 3)
Line cannot be recognized!!
Image not trainable
Image too small to scale!! (3x48 vs min width of 3)
Line cannot be recognized!!
Image not trainable
Image too small to scale!! (3x48 vs min width of 3)
Line cannot be recognized!!
Image not trainable
At iteration 100/100/104, Mean rms=6.004%, delta=48.481%, char train=138.814%, word train=100%, skip ratio=4%, New worst char error = 138.814 wrote checkpoint.
Image too small to scale!! (3x48 vs min width of 3)
Line cannot be recognized!!
Image not trainable
Compute CTC targets failed!
Compute CTC targets failed!
At iteration 200/200/207, Mean rms=5.654%, delta=40.983%, char train=119.407%, word train=100%, skip ratio=3.5%, New worst char error = 119.407 wrote checkpoint.
Image too small to scale!! (3x48 vs min width of 3)
Line cannot be recognized!!
Image not trainable
Image too small to scale!! (3x48 vs min width of 3)
Line cannot be recognized!!
Image not trainable
Image too small to scale!! (3x48 vs min width of 3)
Line cannot be recognized!!
Image not trainable
Image too small to scale!! (3x48 vs min width of 3)
Line cannot be recognized!!
Image not trainable
Compute CTC targets failed!
Is there a limit to size of images for training?
https://github.com/tesseract-ocr/tesseract/blob/ce76d1c569/lstm/lstmrecognizer.cpp#L266
// Maximum width of image to train on.
const int kMaxImageWidth = 2560;
Then shouldn't text2image ensure that images are made to fit that width?
ShreeDevi
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
Yes :-)
// Width of output image (in pixels).
INT_PARAM_FLAG(xsize, 3600, "Width of output image");
The default width of images output by text2image can be reduced when running tesstrain.sh by modifying tesstrain_utils.sh:
common_args+=" --leading=${LEADING} --xsize 2550"
@theraysmith
Ray,
// Maximum width of image to train on.
const int kMaxImageWidth = 2560;
I have some old tif/box pairs whose image width is 4000.
Will training quality be degraded if I change the above constant to 4000 in order to use them?
Also, can this be changed at runtime with a variable, or do I need to recompile tesseract with the higher value?
Changing tesstrain_utils.sh to use
common_args+=" --leading=${LEADING} --xsize 2550"
fixes this.
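To make the two limits mentioned in this thread concrete, here is a minimal Python sketch that classifies a line image by its width. It is not Tesseract's actual code. The maximum of 2560 is the kMaxImageWidth value quoted above, and the minimum of 3 comes from the log message "min width of 3". Treating a width equal to the minimum as too small, and a width equal to the maximum as acceptable, are assumptions based on the log lines. The function name classify_line_width is hypothetical.

```python
# Illustrative only: mirrors the limits quoted in this thread,
# not the real checks inside Tesseract.
K_MAX_IMAGE_WIDTH = 2560  # kMaxImageWidth from lstmrecognizer.cpp
MIN_WIDTH = 3             # "min width of 3" from the training log


def classify_line_width(width: int) -> str:
    """Return 'too small', 'too large' or 'ok' for a line image width in pixels."""
    # Assumption: the log rejects 3x48 images, so the lower bound is inclusive.
    if width <= MIN_WIDTH:
        return "too small"
    # Assumption: widths up to and including the maximum are accepted.
    if width > K_MAX_IMAGE_WIDTH:
        return "too large"
    return "ok"


if __name__ == "__main__":
    for w in (3, 2550, 2758):
        print(w, classify_line_width(w))
```

The widths 3 and 2758 are the ones reported in the logs in this thread; 2550 is the --xsize value used as the workaround, taken here as an upper bound on line width before any rescaling.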
@Shreeshrii how can the problem of image being too small be fixed?
Usually this happens for just a few lines of an image; tesseract splits the input image into a separate image per line.
It can happen when layout analysis has wrongly segmented the page or a line has been detected as having hundreds of diacritics.
If it is just a few messages, you can ignore them.
@theraysmith Any update regarding new line detection algorithm?
Actually, it's not just a few messages. I am trying to train tesseract to recognize license plates, and the prepared training_text looks just like a license plate, something like this: ۵۴ ۷۲۸ ب ۱۴. Each line contains one of these patterns. I received a lot of these errors, and the training process finished with an error rate equal to zero. No training! Would you please help me figure out what the problem is?
Image too large to learn!! Size = 2758x48
Image not trainable
@hanikh, please paste a short example for the errors you get.
The exact error message would greatly help diagnose the problem.
-- Ray.
I will send the exact error message as soon as possible. Meanwhile, I have run into a more important problem. I fine-tuned tesseract for Farsi (40 fonts on 6000 text lines) and got worse results than the original tesseract on the trained fonts. What is the problem? Is the training_text not big enough? (This is a different project and not related to the license plates.)
@hanikh did you use v4? I saw this problem with cube for Persian.
@theraysmith would you please help me: how many text lines are appropriate? Thanks.
"I finetuned tesseract for farsi (40 fonts on 6000 text lines)"
I think this may be too much for fine-tuning.
I noticed that tesstrain.sh limits text2image-generated images to just 3 pages; that would be at most 150 lines per font.
With that much input, you can try replace-a-layer training to see if that gets you better results.
ShreeDevi
@hanikh I suggest waiting until Ray updates the langdata and also uploads the new version of unichar_extractor. Before that, training for RTL languages may not give useful results.
ShreeDevi
Image too small to scale!! (3x48 vs min width of 3)
Line cannot be recognized!!
Image not trainable
Compute CTC targets failed!
Image too small to scale!! (3x48 vs min width of 3)
Line cannot be recognized!!
Image not trainable
[... the same messages repeat many more times ...]
Image too small to scale!! (3x48 vs min width of 3)
Line cannot be recognized!!
Image not trainable
2 Percent improvement time=0, best error was 2.167 @ 14
At iteration 14/1100/20884, Mean rms=0.049%, delta=0%, char train=0%, word train=0%, skip ratio=1798.6%, New best char error = 0 wrote best model:/home/fanasa/tesstutorial/fastuned_from_fas/fastuned-plates0_14.lstm wrote checkpoint.
Finished! Error rate = 0
This is the error I got during training for license plates.
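A run like the one above can be screened automatically. The following is a minimal Python sketch, assuming the progress lines keep the "name=value%" layout seen in the logs in this thread. The meaning of each field is not documented here, so the code only extracts the numbers. The 50% threshold and the names parse_progress_line and looks_suspicious are hypothetical.

```python
import re

# Matches the "name=value%" fields of an "At iteration ..." progress line.
FIELD_RE = re.compile(
    r"(Mean rms|delta|char train|word train|skip ratio)=([0-9.]+)%"
)


def parse_progress_line(line: str) -> dict:
    """Extract the percentage fields from a training progress line."""
    return {name: float(value) for name, value in FIELD_RE.findall(line)}


def looks_suspicious(fields: dict, max_skip_ratio: float = 50.0) -> bool:
    """Flag a run whose skip ratio exceeds a hypothetical threshold."""
    return fields.get("skip ratio", 0.0) > max_skip_ratio


if __name__ == "__main__":
    line = (
        "At iteration 14/1100/20884, Mean rms=0.049%, delta=0%, "
        "char train=0%, word train=0%, skip ratio=1798.6%, "
        "New best char error = 0"
    )
    fields = parse_progress_line(line)
    print(fields)
    print(looks_suspicious(fields))
```

With the line from this log, the skip ratio of 1798.6% is flagged, which hints that the reported error rate of 0 may not reflect real training.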
Initial problem: (Image too small to scale) Those images are ridiculously small at 3x48 pixels. Something is going wrong somewhere with the images. Are they oriented vertically? The input scaling scales the height to 48, whatever it starts as, so it looks like your textlines are vertical.
Fine tuning problem: The problem is most likely too many iterations. It will hone its accuracy to whatever training data you give it if you run it for too many iterations. See how few iterations are used in the training tutorial for fine tuning.
-- Ray.
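The scaling described above can be sketched as follows, assuming the line image is resized to a height of 48 pixels with its aspect ratio preserved. The rounding rule and the function name scaled_width are assumptions, not Tesseract's code, and the example dimensions are hypothetical. The sketch shows how a tall, narrow crop, such as a vertical text line, ends up only a few pixels wide.

```python
TARGET_HEIGHT = 48  # height that input lines are scaled to, per this thread


def scaled_width(width: int, height: int, target_height: int = TARGET_HEIGHT) -> int:
    """Width after scaling a width x height image to target_height, keeping aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # Assumption: round to the nearest pixel and never go below 1.
    return max(1, round(width * target_height / height))


if __name__ == "__main__":
    # Hypothetical crops: a normal horizontal line and a tall vertical strip.
    print(scaled_width(1200, 60))  # 1200 * 48 / 60 -> 960
    print(scaled_width(40, 640))   # 40 * 48 / 640 -> 3
```

A 40x640 strip comes out at 3x48, the same size reported in the "Image too small to scale" messages.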
Ray,
I have seen "line too small to be recognized" when building box/tiff pairs using tesstrain.sh. It is usually related to "nnn diacritics found", so it may be caused by accents being treated as a separate line.
Regarding fine-tuning, I have experimented a lot with Devanagari: with a smaller number of iterations, the reported error rate is higher, and it takes tens of thousands of iterations to get more accuracy on the training set. I am not sure of its effect on samples it has not seen. See https://github.com/Shreeshrii/tess4training/blob/master/README.md
ShreeDevi
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
On Sun, Aug 13, 2017 at 9:44 AM, theraysmith notifications@github.com wrote:
Initial problem: (Image too small to scale) Those images are ridiculously small at 3x48 pixels. Something is going wrong somewhere with the images. Are they oriented vertically? The input scaling scales the height to 48, whatever it starts as, so it looks like your textlines are vertical.
Fine tuning problem: The problem is most likely too many iterations. It will hone its accuracy to whatever training data you give it if you run it for too many iterations. See how few iterations are used in the training tutorial for fine tuning.
On Sat, Aug 12, 2017 at 5:19 AM, hanikh notifications@github.com wrote:
Image too small to scale!! (3x48 vs min width of 3) Line cannot be recognized!! Image not trainable Compute CTC targets failed! Image too small to scale!! (3x48 vs min width of 3) Line cannot be recognized!! Image not trainable Image too small to scale!! (3x48 vs min width of 3) Line cannot be recognized!! Image not trainable Image too small to scale!! (3x48 vs min width of 3) Line cannot be recognized!! Image not trainable Image too small to scale!! (3x48 vs min width of 3) Line cannot be recognized!! Image not trainable Image too small to scale!! (3x48 vs min width of 3) Line cannot be recognized!! Image not trainable Image too small to scale!! (3x48 vs min width of 3) Line cannot be recognized!! Image not trainable Compute CTC targets failed! Compute CTC targets failed! Compute CTC targets failed! Compute CTC targets failed! Compute CTC targets failed! Compute CTC targets failed! Image too small to scale!! (3x48 vs min width of 3) Line cannot be recognized!! Image not trainable Compute CTC targets failed! Image too small to scale!! (3x48 vs min width of 3) Line cannot be recognized!! Image not trainable Compute CTC targets failed! Compute CTC targets failed! Compute CTC targets failed! Compute CTC targets failed! Image too small to scale!! (3x48 vs min width of 3) Line cannot be recognized!! Image not trainable Image too small to scale!! (3x48 vs min width of 3) Line cannot be recognized!! Image not trainable Image too small to scale!! (3x48 vs min width of 3) Line cannot be recognized!! Image not trainable Compute CTC targets failed! Image too small to scale!! (3x48 vs min width of 3) Line cannot be recognized!! Image not trainable Compute CTC targets failed! Image too small to scale!! (3x48 vs min width of 3) Line cannot be recognized!! Image not trainable Image too small to scale!! (3x48 vs min width of 3) Line cannot be recognized!! Image not trainable Compute CTC targets failed! Image too small to scale!! (3x48 vs min width of 3) Line cannot be recognized!! 
Image not trainable Image too small to scale!! (3x48 vs min width of 3) Line cannot be recognized!! Image not trainable Image too small to scale!! (3x48 vs min width of 3) Line cannot be recognized!! Image not trainable Image too small to scale!! (3x48 vs min width of 3) Line cannot be recognized!! Image not trainable Image too small to scale!! (3x48 vs min width of 3) Line cannot be recognized!! Image not trainable Image too small to scale!! (3x48 vs min width of 3) Line cannot be recognized!! Image not trainable Compute CTC targets failed! Image too small to scale!! (3x48 vs min width of 3) Line cannot be recognized!! Image not trainable Compute CTC targets failed! Compute CTC targets failed! Compute CTC targets failed! Compute CTC targets failed! Compute CTC targets failed! Compute CTC targets failed! Image too small to scale!! (3x48 vs min width of 3) Line cannot be recognized!! Image not trainable Image too small to scale!! (3x48 vs min width of 3) Line cannot be recognized!! Image not trainable Compute CTC targets failed! Image too small to scale!! (3x48 vs min width of 3) Line cannot be recognized!! Image not trainable Image too small to scale!! (3x48 vs min width of 3) Line cannot be recognized!! Image not trainable Compute CTC targets failed! Compute CTC targets failed! Image too small to scale!! (3x48 vs min width of 3) Line cannot be recognized!! Image not trainable Compute CTC targets failed! Compute CTC targets failed! Compute CTC targets failed! Compute CTC targets failed! Image too small to scale!! (3x48 vs min width of 3) Line cannot be recognized!! Image not trainable Compute CTC targets failed! Compute CTC targets failed! Compute CTC targets failed! Image too small to scale!! (3x48 vs min width of 3) Line cannot be recognized!! Image not trainable Compute CTC targets failed! Compute CTC targets failed! Compute CTC targets failed! Image too small to scale!! (3x48 vs min width of 3) Line cannot be recognized!! 
Image not trainable Compute CTC targets failed! Compute CTC targets failed! Compute CTC targets failed! Image too small to scale!! (3x48 vs min width of 3) Line cannot be recognized!! Image not trainable Image too small to scale!! (3x48 vs min width of 3) Line cannot be recognized!! Image not trainable Compute CTC targets failed! Compute CTC targets failed! Image too small to scale!! (3x48 vs min width of 3) Line cannot be recognized!! Image not trainable Compute CTC targets failed! Compute CTC targets failed! Image too small to scale!! (3x48 vs min width of 3) Line cannot be recognized!! Image not trainable Compute CTC targets failed! Compute CTC targets failed! Image too small to scale!! (3x48 vs min width of 3) Line cannot be recognized!! Image not trainable Image too small to scale!! (3x48 vs min width of 3) Line cannot be recognized!! Image not trainable Compute CTC targets failed! Compute CTC targets failed! Compute CTC targets failed! Compute CTC targets failed! Compute CTC targets failed! Compute CTC targets failed! Compute CTC targets failed! Image too small to scale!! (3x48 vs min width of 3) Line cannot be recognized!! Image not trainable Image too small to scale!! (3x48 vs min width of 3) Line cannot be recognized!! Image not trainable Compute CTC targets failed! Image too small to scale!! (3x48 vs min width of 3) Line cannot be recognized!! Image not trainable Image too small to scale!! (3x48 vs min width of 3) Line cannot be recognized!! Image not trainable Compute CTC targets failed! Image too small to scale!! (3x48 vs min width of 3) Line cannot be recognized!! Image not trainable Image too small to scale!! (3x48 vs min width of 3) Line cannot be recognized!! Image not trainable Compute CTC targets failed! Compute CTC targets failed! Compute CTC targets failed! Image too small to scale!! (3x48 vs min width of 3) Line cannot be recognized!! Image not trainable Image too small to scale!! (3x48 vs min width of 3) Line cannot be recognized!! 
Image not trainable Image too small to scale!! (3x48 vs min width of 3) Line cannot be recognized!! Image not trainable Image too small to scale!! (3x48 vs min width of 3) Line cannot be recognized!! Image not trainable Image too small to scale!! (3x48 vs min width of 3) Line cannot be recognized!! Image not trainable Image too small to scale!! (3x48 vs min width of 3) Line cannot be recognized!! Image not trainable Image too small to scale!! (3x48 vs min width of 3) Line cannot be recognized!! Image not trainable Compute CTC targets failed! Compute CTC targets failed! Image too small to scale!! (3x48 vs min width of 3) Line cannot be recognized!! Image not trainable Image too small to scale!! (3x48 vs min width of 3) Line cannot be recognized!! Image not trainable Compute CTC targets failed! Image too small to scale!! (3x48 vs min width of 3) Line cannot be recognized!! Image not trainable Image too small to scale!! (3x48 vs min width of 3) Line cannot be recognized!! Image not trainable Image too small to scale!! (3x48 vs min width of 3) Line cannot be recognized!! Image not trainable 2 Percent improvement time=0, best error was 2.167 @ 14 At iteration 14/1100/20884, Mean rms=0.049%, delta=0%, char train=0%, word train=0%, skip ratio=1798.6%, New best char error = 0 wrote best model:/home/fanasa/tesstutorial/fastunedfrom fas/fastuned-plates0_14.lstm wrote checkpoint.
Finished! Error rate = 0
This is the error I got during training for licence plates.
-- Ray.
For the fine-tuning problem: the error rate reaches 0.017 at about 80000 iterations, so with as few iterations as in the tutorial, a low error rate like 0.01 cannot be achieved. So you think fine-tuning is the wrong solution and I should try replacing some layers? As I said before, I am trying to train for 40 Persian fonts, and they are very common.
On Sun, Aug 13, 2017 at 9:38 AM, Shreeshrii notifications@github.com wrote:
Ray,
I have seen line too small to be recognized when building box/tiff pairs using tesstrain.sh - it is usually related to 'nnn diacritics found' - so it may be related to accents being treated as a separate line.
Regarding finetuning, I have experimented a lot with Devanagari. With a smaller number of iterations, the reported error rate is higher, and it takes tens of thousands of iterations to get more accuracy on the training set. I am not sure of its effect on samples it has not seen. See https://github.com/Shreeshrii/tess4training/blob/master/README.md
ShreeDevi
On Sun, Aug 13, 2017 at 9:44 AM, theraysmith notifications@github.com wrote:
Initial problem: (Image too small to scale) Those images are ridiculously small at 3x48 pixels. Something is going wrong somewhere with the images. Are they oriented vertically? The input scaling scales the height to 48, whatever it starts as, so it looks like your textlines are vertical.
Fine tuning problem: The problem is most likely too many iterations. It will hone its accuracy to whatever training data you give it if you run it for too many iterations. See how few iterations are used in the training tutorial for fine tuning.
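The scaling argument above can be checked with a little arithmetic. This is only a sketch: it assumes plain aspect-preserving scaling to a height of 48 with rounding, which may not match Tesseract's exact code, and the function names and example sizes are made up for illustration.

```python
def scaled_width(width: int, height: int, target_height: int = 48) -> int:
    """Width after scaling so that the height becomes target_height.

    Assumption: simple proportional scaling with rounding; the real
    implementation may round or clamp differently.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return max(1, round(width * target_height / height))


def looks_vertical(width: int, height: int) -> bool:
    """Heuristic only: a text line is normally wider than it is tall."""
    return height > width


# A horizontal line image, 640 px wide and 40 px tall (hypothetical size).
print(scaled_width(640, 40))    # 768
# The same strip stored vertically, 40 px wide and 640 px tall.
print(scaled_width(40, 640))    # 3
print(looks_vertical(40, 640))  # True
```

Under these assumptions a vertically oriented strip collapses to a width of about 3 once its height is forced to 48, which would be consistent with the 3x48 in the log.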
@Shreeshrii would you please explain about the new traineddata file? where can the lang.lstm-unicharset file be found ? how can combine_lang_model be used? thanks
where can the lang.lstm-unicharset file be found ?
combine_tessdata -u lang.traineddata lang.
It will create lang.* files , including the unicharset.
You can use dawg2wordlist to see the wordlist used
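If the unpacking step needs to be scripted, something like the following may work. It is a rough sketch: it only wraps the combine_tessdata -u call quoted above, the helper names are hypothetical, and lang.traineddata and the lang. prefix are placeholders.

```python
import glob
import shutil
import subprocess


def unpack_command(traineddata: str, prefix: str) -> list[str]:
    """Argument list for: combine_tessdata -u <traineddata> <prefix>"""
    return ["combine_tessdata", "-u", traineddata, prefix]


def unpack(traineddata: str, prefix: str) -> list[str]:
    """Run the unpack step and return every file that starts with prefix.

    Note: the glob also matches the .traineddata file itself when it
    shares the prefix.
    """
    if shutil.which("combine_tessdata") is None:
        raise RuntimeError("combine_tessdata not found on PATH")
    subprocess.run(unpack_command(traineddata, prefix), check=True)
    return sorted(glob.glob(prefix + "*"))


# Only builds the command; call unpack(...) to actually run it.
print(unpack_command("lang.traineddata", "lang."))
```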
how can combine_lang_model be used?
combine_lang_model \
--input_unicharset ../tesstutorial/sanskrit2003/san/san.unicharset \
--script_dir "../langdata" \
--words "../langdata/san/san.wordlist" \
--numbers "../langdata/san/san.numbers" \
--puncs "../langdata/san/san.punc" \
--output_dir ../tesstutorial/sanskrit2003 \
--lang "san" --pass_through_recoder \
--version_str "4.0.0alpha-20170816 sanskrit2003"
For RTL languages, there is an additional flag. Please see https://github.com/tesseract-ocr/tesseract/blob/master/training/tesstrain_utils.sh for details.
I used a hand-edited unicharset, because the unicharset generated from the current training process is old style. You should wait for @theraysmith to update the unichar_extractor and other langdata files.
@Shreeshrii I want to train 40 fonts for Arabic and Farsi languages. I have tried to finetune the trained model, but I did not get a good result. I think that happened because the trained fonts were so different from mine. So now I am going to replace a layer. I want to replace just the last layer and I do not want to change the unicharset. So, can I use Arabic.traineddata as the traineddata file needed for training? These are the commands I am using:
mkdir -p ~/tesstutorial/newara_from_ara
training/combine_tessdata -e tessdata/best/Arabic.traineddata \
~/tesstutorial/newara_from_ara/ara.lstm
training/lstmtraining --debug_interval 100 \
--continue_from ~/tesstutorial/newara_from_ara/ara.lstm \
--traineddata ~/tesstutorial/aratrain/ara/Arabic.traineddata \
--append_index 5 \
--model_output ~/tesstutorial/newara_from_ara/base \
--train_listfile ~/tesstutorial/aratrain/ara.training_files.txt \
--eval_listfile ~/tesstutorial/araeval/ara.training_files.txt \
--max_iterations 3000 &>~/tesstutorial/newara_from_ara/basetrain.log
@theraysmith
Please let us know whether it is worthwhile to try and train (finetune/replace layer) for RTL languages or should the users wait for your updates to langdata, unichar extractor programs.
@theraysmith If it is helpful, I can provide a large number of fonts and word lists for the Persian and Arabic languages as training material.
@Shreeshrii would you please help me with using "replacing layers" as I asked before?
Need to wait for unichar_extractor to be fixed.
See https://github.com/tesseract-ocr/tesseract/issues/1114
ShreeDevi
@roozgar have you tested the new traineddata for arabic? have you tried to train it?
@theraysmith Please see https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/tesseract-ocr/QC3WY48SicI/ZococRbTBAAJ
regarding question about finetuning training for chi_sim.traineddata model
@Shreeshrii For RTL languages, there is an additional flag
You mean --lang_is_rtl?
I think you need both
--pass_through_recoder \
--lang_is_rtl \
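To make the flag combination concrete, here is a small sketch that assembles a combine_lang_model call from the options shown earlier in this thread and adds --lang_is_rtl on request. The helper name, the ara paths and the version string are placeholders, and the langdata layout is assumed to follow the san example.

```python
import shlex


def combine_lang_model_cmd(lang, unicharset, langdata_dir, output_dir,
                           version_str, rtl=False):
    """Build the argument list; only flags quoted in this thread are used."""
    cmd = [
        "combine_lang_model",
        "--input_unicharset", unicharset,
        "--script_dir", langdata_dir,
        "--words", f"{langdata_dir}/{lang}/{lang}.wordlist",
        "--numbers", f"{langdata_dir}/{lang}/{lang}.numbers",
        "--puncs", f"{langdata_dir}/{lang}/{lang}.punc",
        "--output_dir", output_dir,
        "--lang", lang,
        "--pass_through_recoder",
        "--version_str", version_str,
    ]
    if rtl:
        # Extra flag for right-to-left scripts such as Arabic or Persian.
        cmd.append("--lang_is_rtl")
    return cmd


# Hypothetical paths, for illustration only.
cmd = combine_lang_model_cmd(
    "ara", "ara.unicharset", "../langdata", "../tesstutorial/ara",
    "4.0.0alpha ara", rtl=True,
)
print(shlex.join(cmd))  # shell-quoted command line (Python 3.8+)
```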
@Shreeshrii Does the newest Tesseract 4.x issue --pass_through_recoder --lang_is_rtl automatically while constructing the training data?
@christophered What are your results with training of RTL languages? I haven't had much success in finetuning.
@Shreeshrii I am conducting a couple of tests on this matter; I will reply to you later on.
@Shreeshrii All my images have the same dimensions, yet Tesseract reports errors after loading a few images correctly. What might be causing this, any idea? I'm printing the error log below:
Warning: given outputs 111 not equal to unicharset of 12.
Num outputs,weights in Series:
1,1000,60,1:1, 0
Num outputs,weights in Series:
C3,3:9, 0
Ft32:32, 320
Total weights = 320
[C3,3Ft32]:32, 320
Mp3,3:32, 0
Num outputs,weights in Series:
S333,20:213120, 0
Fr64:64, 13639744
Total weights = 13639744
[S333,20Fr64]:64, 13639744
Fc12:12, 780
Total weights = 13640844
Built network:[1,1000,60,1[C3,3Ft32]Mp3,3[S333,20Fr64]Fc12] from request [1,1000,60,1 Ct3,3,32 Mp3,3 Fr64 O1c111]
Training parameters:
Debug interval = 0, weights = 0.1, learning rate = 0.002, momentum=0.5
null char=11
Loaded 1/1 pages (1-1) of document digits.f0.exp0.lstmf
Loaded 1/1 pages (1-1) of document digits.f11.exp0.lstmf
Loaded 1/1 pages (1-1) of document digits.f13.exp0.lstmf
Loaded 1/1 pages (1-1) of document digits.f12.exp0.lstmf
Loaded 1/1 pages (1-1) of document digits.f14.exp0.lstmf
Loaded 1/1 pages (1-1) of document digits.f10.exp0.lstmf
Loaded 1/1 pages (1-1) of document digits.f16.exp0.lstmf
Loaded 1/1 pages (1-1) of document digits.f15.exp0.lstmf
Loaded 1/1 pages (1-1) of document digits.f17.exp0.lstmf
Image too large to learn!! Size = 15878x1000
Image not trainable
Loaded 1/1 pages (1-1) of document digits.f18.exp0.lstmf
Image too large to learn!! Size = 16660x1000
Image not trainable
Loaded 1/1 pages (1-1) of document digits.f19.exp0.lstmf
Image too large to learn!! Size = 16638x1000
Image not trainable
Loaded 1/1 pages (1-1) of document digits.f1.exp0.lstmf
Image too large to learn!! Size = 17289x1000
Image not trainable
Loaded 1/1 pages (1-1) of document digits.f20.exp0.lstmf
Image too large to learn!! Size = 16553x1000
Image not trainable
Loaded 1/1 pages (1-1) of document digits.f21.exp0.lstmf
Image too large to learn!! Size = 16978x1000
Image not trainable
There is a width limit for images. It is 2500+; I don't remember the exact number.
@Shreeshrii Here (https://github.com/tesseract-ocr/tesseract/blob/ce76d1c569/lstm/lstmrecognizer.cpp#L266) the max limit is 2560. I checked: my image dimensions are 60 x 1000 pixels, all of them. That is the reason I'm confused.
Image too large to learn!! Size = 16553x1000
@Shreeshrii exactly! I'm preprocessing every image to rescale them (using PIL in Python3), still Tesseract seems to detect the width as 16553 or something like that. I'll recheck anyway. Let me take this opportunity to ask, is there a limit on maximum height as well? Also, what are the minimum values for width and height?
Probably being rotated...
I am not sure whether LSTM training will work with single character images.
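One possible reading of the log, offered as an assumption rather than a confirmed diagnosis: the VGSL input spec is ordered batch, height, width, depth, so 1,1000,60,1 asks for a fixed height of 1000. If the images are actually 1000 px wide and 60 px tall, scaling them to that height would give widths in the range the log reports. The sketch below assumes plain proportional scaling and uses the 2560 limit quoted above; the function names are hypothetical.

```python
MAX_WIDTH = 2560  # limit quoted in this thread for lstmrecognizer.cpp


def scaled_size(width: int, height: int, net_height: int) -> tuple[int, int]:
    """(width, height) after scaling the image to the network's fixed height.

    Assumption: aspect-preserving scaling with rounding.
    """
    return round(width * net_height / height), net_height


def too_wide(width: int, height: int, net_height: int,
             max_width: int = MAX_WIDTH) -> bool:
    """True if the scaled width would exceed the width limit."""
    return scaled_size(width, height, net_height)[0] > max_width


# Image 1000 wide x 60 tall, network height 1000.
print(scaled_size(1000, 60, 1000))  # (16667, 1000)
print(too_wide(1000, 60, 1000))     # True
# Image 60 wide x 1000 tall, network height 1000.
print(scaled_size(60, 1000, 1000))  # (60, 1000)
```

If this reading is right, swapping the two numbers in the spec (or rotating the images) would be the thing to try first, but that has not been verified here.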
@Shreeshrii I'm not intending to use LSTM to train Tesseract 4. I'm aiming to implement a CNN architecture, and the code most probably has a bug. Kindly see #1748 (https://github.com/tesseract-ocr/tesseract/issues/1748) for reference.
Seems to use max height of 48 for each line.
@Shreeshrii
I haven't had much success in finetuning.
https://github.com/tesseract-ocr/tesseract/issues/590#issuecomment-322020794 by Ray Smith
Initial problem: (Image too small to scale) Those images are ridiculously small at 3x48 pixels. Something is going wrong somewhere with the images. Are they oriented vertically? The input scaling scales the height to 48, whatever it starts as, so it looks like your textlines are vertical.
This bug is still there.
Error in pixScaleAreaMap: pixd too small
Error in pixClone: pixs not defined
Error in pixCopyText: pixd not defined
Error in pixCopyInputFormat: pixd not defined
Scaling pix of size 35, 4548 by factor 0.0105541 made null pix!!
Error in pixGetWidth: pix not defined
Error in pixGetHeight: pix not defined
Bad pix from ImageData!
Line cannot be recognized!!
Image not trainable
with version
tesseract 4.0.0-beta.4-158-g02f9d
leptonica-1.76.0
libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.4 : libopenjp2 2.3.0
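The factor in this log is consistent with the height being scaled to 48, as Ray Smith describes: 48 / 4548 is about 0.0105541. Applied to a width of 35, that leaves well under one pixel, which would explain the null pix. A minimal sketch of the arithmetic, assuming the scaled width is rounded to the nearest integer (the helper name is hypothetical):

```python
TARGET_HEIGHT = 48   # height the input is scaled to, per the comment quoted above
MIN_WIDTH = 3        # from the "min width of 3" message earlier in the thread

def scaled_width(width, height, target_height=TARGET_HEIGHT):
    """Return (factor, width after scaling the height to target_height)."""
    factor = target_height / height
    return factor, int(width * factor + 0.5)

factor, new_width = scaled_width(35, 4548)
print(f"{factor:.7f}")          # 0.0105541
print(new_width)                # 0
print(new_width >= MIN_WIDTH)   # False
```

An image that is 35 wide and 4548 tall looks like a text line stored vertically, so checking the orientation of the training images is a reasonable first step.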
I had the same problem as the thread OP:
Image too large to learn!! Size = 2594x48
Image not trainable
I resolved it with the suggestion above (https://github.com/tesseract-ocr/tesseract/issues/590#issuecomment-273154535):
Changing this line in tesstrain_utils.sh to
common_args+=" --leading=${LEADING} --xsize 2550"
fixes this.
Was this the correct approach?
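Before settling on an --xsize value, it can help to see how far over the limit the rejected lines actually are. The sketch below pulls the sizes out of a training log. The message format is copied from the logs in this thread, the function name is illustrative, and the sample values are the three sizes reported in the log further down.

```python
import re

# Matches the rejection message printed by lstmtraining in the logs in this thread.
TOO_LARGE = re.compile(r"Image too large to learn!! Size = (\d+)x(\d+)")

def oversized_widths(log_text):
    """Return the width of every line image the trainer rejected as too large."""
    return [int(m.group(1)) for m in TOO_LARGE.finditer(log_text)]

sample_log = """\
Image too large to learn!! Size = 2594x48
Image not trainable
Image too large to learn!! Size = 2758x48
Image not trainable
Image too large to learn!! Size = 2621x48
Image not trainable
"""

widths = oversized_widths(sample_log)
print(widths)        # [2594, 2758, 2621]
print(max(widths))   # 2758
```

In real use, sample_log would be replaced by the contents of the training log, for example basetrain.log.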
mkdir -p ~/tesstutorial/sanvedic
lstmtraining -U ~/tesstutorial/vedic/san.unicharset \
  --script_dir ../langdata --debug_interval 0 \
  --learning_rate 10e-5 \
  --net_spec '[1,0,0,1 Ct5,5,16 Mp3,3 Lfys64 Lfx128 Lrx128 Lfx384 O1c5000]' \
  --net_mode 192 \
  --perfect_sample_delay 19 \
  --model_output ~/tesstutorial/sanvedic/base \
  --train_listfile ~/tesstutorial/vedic/san.training_files.txt \
  --eval_listfile ~/tesstutorial/vedic/san.training_files.txt \
  --max_iterations 50000 \
  &>~/tesstutorial/sanvedic/basetrain.log
Setting unichar properties
Setting properties for script Common
Setting properties for script Latin
Setting properties for script Devanagari
Unichar 2306=र्त्स्न्ये->र्त्स्न्ये is too long to encode!!
Warning: given outputs 5000 not equal to unicharset of 5018.
Num outputs,weights in serial:
1,0,0,1:1, 0
Num outputs,weights in serial:
C5,5:25, 0
Ft16:16, 416
Total weights = 416
[C5,5Ft16]:16, 416
Mp3,3:16, 0
Lfys64:64, 20736
Lfx128:128, 98816
Lrx128:128, 131584
Lfx384:384, 787968
Fc5018:5018, 1931930
Total weights = 2971450
Built network:[1,0,0,1[C5,5Ft16]Mp3,3Lfys64Lfx128Lrx128Lfx384Fc5018] from request [1,0,0,1 Ct5,5,16 Mp3,3 Lfys64 Lfx128 Lrx128 Lfx384 O1c5000]
Training parameters:
Debug interval = 0, weights = 0.1, learning rate = 0.0001, momentum=0.9
Loaded 828/828 pages (0-828) of document /home/shree/tesstutorial/vedic/san.AA_NAGARI_SHREE_L1.exp0.lstmf
Loaded 691/691 pages (0-691) of document /home/shree/tesstutorial/saneval/san.Aksharyogini2.exp0.lstmf
Loaded 1023/1023 pages (0-1023) of document /home/shree/tesstutorial/vedic/san.Sanskrit_2003.exp0.lstmf
Loaded 957/957 pages (0-957) of document /home/shree/tesstutorial/vedic/san.e-Nagari_OT.exp0.lstmf
Loaded 1060/1060 pages (0-1060) of document /home/shree/tesstutorial/vedic/san.FreeSans.exp0.lstmf
Loaded 691/691 pages (0-691) of document /home/shree/tesstutorial/saneval/san.Amiko.exp0.lstmf
Loaded 1213/1213 pages (0-1213) of document /home/shree/tesstutorial/vedic/san.Siddhanta-cakravat.exp0.lstmf
Loaded 1191/1191 pages (0-1191) of document /home/shree/tesstutorial/vedic/san.Sahadeva.exp0.lstmf
Loaded 1291/1291 pages (0-1291) of document /home/shree/tesstutorial/vedic/san.Santipur_OT_Medium.exp0.lstmf
Loaded 1115/1115 pages (0-1115) of document /home/shree/tesstutorial/vedic/san.Lohit_Devanagari.exp0.lstmf
Loaded 1210/1210 pages (0-1210) of document /home/shree/tesstutorial/vedic/san.Nakula.exp0.lstmf
Found AVX
Found SSE
Loaded 1188/1188 pages (0-1188) of document /home/shree/tesstutorial/vedic/san.Siddhanta-Calcutta.exp0.lstmf
Loaded 1211/1211 pages (0-1211) of document /home/shree/tesstutorial/vedic/san.Siddhanta.exp0.lstmf
Loaded 1214/1214 pages (0-1214) of document /home/shree/tesstutorial/vedic/san.Siddhanta-Nepali.exp0.lstmf
Loaded 1157/1157 pages (0-1157) of document /home/shree/tesstutorial/vedic/san.Uttara.exp0.lstmf
Image too large to learn!! Size = 2594x48
Image not trainable
Image too large to learn!! Size = 2758x48
Image not trainable
Image too large to learn!! Size = 2621x48
Image not trainable
At iteration 100/100/103, Mean rms=0.95%, delta=57.759%, char train=100.161%, word train=100%, skip ratio=3%, New worst char error = 100.161 wrote checkpoint
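The per-layer weight counts in this log can be reproduced by hand, which is a quick way to confirm that the network was built as requested. The sketch assumes each LSTM layer has four gates that see the inputs, the previous state and a bias, and that each fully connected layer has one bias per output. Both formulas are inferred from the numbers in the log, not taken from the Tesseract source.

```python
def lstm_weights(n_inputs, n_states):
    """Four gates, each seeing the inputs, the previous state and a bias (assumed formula)."""
    return 4 * n_states * (n_inputs + n_states + 1)

def dense_weights(n_inputs, n_outputs):
    """Fully connected layer with one bias per output (assumed formula)."""
    return (n_inputs + 1) * n_outputs

# Input width of each layer is the output width of the layer before it.
layers = {
    "Ft16":   dense_weights(5 * 5, 16),    # fed by the 5x5 convolution window
    "Lfys64": lstm_weights(16, 64),
    "Lfx128": lstm_weights(64, 128),
    "Lrx128": lstm_weights(128, 128),
    "Lfx384": lstm_weights(128, 384),
    "Fc5018": dense_weights(384, 5018),    # output layer sized to the unicharset
}

for name, count in layers.items():
    print(name, count)
print("Total weights =", sum(layers.values()))   # Total weights = 2971450
```

The output layer has 5018 outputs rather than the 5000 requested with O1c5000 because, as the warning in the log says, the unicharset size takes precedence.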