Open DavidHribek opened 3 years ago
Our problem: we don't know your images, but need one of those which don't work.
For example from this image Tesseract never generates .lstmf, .box, .txt.
Thanks, confirmed.
There is an (unrelated) bug in src/ccmain/pagesegmain.cpp: read_unlv_file is called with a malformed name argument ("114025495-7830ed00-9875-11eb-8889-a9c9ea4003a7\000png.uzn").
I tried to run training with fewer images, including the image "Benesov" mentioned above. The .lstmf file was not generated for the "Benesov" image; it was skipped and training started. If I run training with all my images, training never starts.
Tesseract does not find a text box in this image. It tries to find lines, finds none (why?), and also does not use --psm 7 as a fallback to accept the whole image as a line.
Call stack:
#0 tesseract::line_edges (x=0, y=87, xext=210, uppercolour=1 '\001', bwpos=0x611000006a80 "", prevline=0x61c000000080, free_cracks=0x7fffffff88c0, outline_it=0x7fffffff8ce0)
at ../../../src/textord/scanedg.cpp:185
#1 0x0000000001ca7615 in tesseract::block_edges (t_pix=..., block=0x60f000000138, outline_it=0x7fffffff8ce0) at ../../../src/textord/scanedg.cpp:99
#2 0x0000000001c5c805 in tesseract::extract_edges (pix=..., block=0x60f000000130) at ../../../src/textord/edgblob.cpp:330
#3 0x0000000001ebf79b in tesseract::Textord::find_components (this=0x7ffff183e5f0, pix=..., blocks=0x6020000064d0, to_blocks=0x7fffffff9c80)
at ../../../src/textord/tordmain.cpp:224
#4 0x0000000001e9b2ef in tesseract::Textord::TextordPage (this=0x7ffff183e5f0, pageseg_mode=tesseract::PSM_SINGLE_LINE, reskew=..., width=210, height=88, binary_pix=...,
thresholds_pix=..., grey_pix=..., use_box_bottoms=false, diacritic_blobs=0x7fffffff9c60, blocks=0x6020000064d0, to_blocks=0x7fffffff9c80)
at ../../../src/textord/textord.cpp:185
#5 0x0000000000b34cca in tesseract::Tesseract::SegmentPage (this=0x7ffff181a800, input_file=0x6060000033e0 "114025495-7830ed00-9875-11eb-8889-a9c9ea4003a7.png",
blocks=0x6020000064d0, osd_tess=0x0, osr=0x7fffffffa3a0) at ../../../src/ccmain/pagesegmain.cpp:172
#6 0x000000000058fc0d in tesseract::TessBaseAPI::FindLines (this=0x7fffffffdb40) at ../../../src/api/baseapi.cpp:2187
#7 0x0000000000591ea9 in tesseract::TessBaseAPI::Recognize (this=0x7fffffffdb40, monitor=0x0) at ../../../src/api/baseapi.cpp:837
#8 0x00000000005a12f9 in tesseract::TessBaseAPI::ProcessPage (this=0x7fffffffdb40, pix=0x606000003380, page_index=0,
filename=0x7fffffffe668 "114025495-7830ed00-9875-11eb-8889-a9c9ea4003a7.png", retry_config=0x0, timeout_millisec=0, renderer=0x0) at ../../../src/api/baseapi.cpp:1254
#9 0x00000000005a71fe in tesseract::TessBaseAPI::ProcessPagesInternal (this=0x7fffffffdb40, filename=0x7fffffffe668 "114025495-7830ed00-9875-11eb-8889-a9c9ea4003a7.png",
retry_config=0x0, timeout_millisec=0, renderer=0x0) at ../../../src/api/baseapi.cpp:1217
#10 0x00000000005a3102 in tesseract::TessBaseAPI::ProcessPages (this=0x7fffffffdb40, filename=0x7fffffffe668 "114025495-7830ed00-9875-11eb-8889-a9c9ea4003a7.png",
retry_config=0x0, timeout_millisec=0, renderer=0x0) at ../../../src/api/baseapi.cpp:1070
#11 0x00000000004e635a in main (argc=6, argv=0x7fffffffe398) at ../../../src/api/tesseractmain.cpp:782
@DavidHribek, you could try removing the left black border from the image. The image can also be cropped below the text. One of those modifications might fix the problem.
Tesseract in non-training mode will also fail.
This is a known issue.
Try this command (ImageMagick):
convert img1.png -bordercolor White -border 10x10 img2.png
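To apply the same border to a whole folder of line images, a loop along these lines might help. This is only a sketch: the function name `pad_images` and the `CONVERT` override variable are made up for illustration, and it assumes the images are PNG files and that ImageMagick's `convert` is on the PATH.

```shell
# Pad every PNG in a folder with a 10-pixel white border (ImageMagick).
# pad_images and CONVERT are hypothetical names used for this sketch only;
# CONVERT lets you substitute another binary name for "convert".
pad_images() {
    src="$1"    # folder with the original images
    dst="$2"    # folder that receives the padded copies
    mkdir -p "$dst"
    for img in "$src"/*.png; do
        [ -e "$img" ] || continue    # skip the unmatched glob pattern
        "${CONVERT:-convert}" "$img" -bordercolor White -border 10x10 \
            "$dst/$(basename "$img")"
    done
}

# Example: pad_images /path/to/originals /path/to/padded
```

The padded copies go to a separate folder so the originals stay untouched. The matching .gt.txt files would still need to be copied next to the padded images before training.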
Hello,
I want to train Tesseract on my own images (text lines). I copied all my images and texts into a folder (image_name.[png/jpg], image_name.gt.txt).
Now I run this command:
make training MODEL_NAME=eng \
  START_MODEL=eng \
  MAX_ITERATIONS=100 \
  PSM=7 \
  TESSDATA=/path/to/tessdata \
  GROUND_TRUTH_DIR=/path/to/folder/with/images/and/texts
It produces .lstmf, .box and .txt files for every image in the folder, with the message:
Then the training starts.
My problem: for some images the second command does not produce the .lstmf, .box and .txt files, but Tesseract keeps waiting for them, so training never starts.
Thanks for help.
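To find out which images are blocking the training, it can help to list every image that has no matching .lstmf file. The following is a minimal sketch; it assumes the generated image_name.lstmf files sit in the same folder as the images, and the function name `list_missing_lstmf` is made up for illustration.

```shell
# List ground-truth images that have no matching .lstmf file.
# Assumes image_name.png / image_name.jpg and image_name.lstmf
# live in the same folder (an assumption, adjust if yours differ).
list_missing_lstmf() {
    dir="$1"
    for img in "$dir"/*.png "$dir"/*.jpg; do
        [ -e "$img" ] || continue    # skip unmatched glob patterns
        base="${img%.*}"             # strip the image extension
        [ -e "$base.lstmf" ] || printf '%s\n' "$img"
    done
}

# Example: list_missing_lstmf /path/to/folder/with/images/and/texts
```

The images it prints are the candidates to inspect, pad with a border, or temporarily move out of the folder so training can start with the rest.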