tesseract-ocr / tesseract

Tesseract Open Source OCR Engine (main repository)
https://tesseract-ocr.github.io/
Apache License 2.0

Process (text2image) crashed and dumped core #2737

Open: ghost opened this issue 5 years ago

ghost commented 5 years ago

Environment

Used Linux: Arch Linux
uname -a => Linux 5.3.7-arch1-2-ARCH 1 SMP PREEMPT @1572002934 x86_64 GNU/Linux

Used library versions:
Name        : pango
Version     : 1:1.44.7-1
Description : A library for layout and rendering of text

Name        : cairo
Version     : 1.17.2+17+g52a7c79fd-2
Description : 2D graphics library with support for multiple output devices

Name        : leptonica
Version     : 1.78.0-1
Description : Software that is broadly useful for image processing and image analysis applications

Name        : libtiff
Version     : 4.0.10-1
Description : Library for manipulation of TIFF images

Limits (ulimit -a):
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 63432
max locked memory       (kbytes, -l) 32768
max memory size         (kbytes, -m) unlimited
open files                      (-n) 32768
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 16777216
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 63432
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Current Behavior:

Running tesseract/src/training/tesstrain.sh aborts with:

free(): invalid next size (normal)
src/training/tesstrain_utils.sh: line 72: 6436 Aborted (core dumped) "${cmd}" "$@" 2>&1
6437 Done | tee -a "${LOG_FILE}"
ERROR: Program text2image failed. Abort.
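The "Aborted (core dumped)" and "Done" lines are bash reporting the two halves of the logging pipeline separately: text2image (PID 6436) was killed by SIGABRT, which glibc raises after its "free(): invalid next size" check, while tee (PID 6437) finished normally. Because the script still printed "Program text2image failed", it presumably checks the status of the first pipeline element rather than tee's. The sketch below shows that pattern under stated assumptions: it relies on bash's `set -o pipefail`, and the function name `run_logged` and the default log path are hypothetical. It is not the actual code of tesstrain_utils.sh.

```bash
#!/usr/bin/env bash
# Hypothetical run-and-log wrapper; NOT the real tesstrain_utils.sh code.
# pipefail: a pipeline fails if any element fails, not only the last one (tee).
set -o pipefail

LOG_FILE="${LOG_FILE:-/tmp/run_logged.log}"  # hypothetical default log path

run_logged() {
  local name status
  name=$(basename "$1")
  # Run the command, merge its stderr into stdout, and append both to the log.
  "$@" 2>&1 | tee -a "${LOG_FILE}"
  status=$?  # with pipefail and a successful tee, this is the command's status
  if (( status != 0 )); then
    # A status above 128 means "killed by signal (status - 128)";
    # 134 = 128 + 6 is SIGABRT, as raised by glibc's heap consistency checks.
    if (( status > 128 )); then
      echo "ERROR: ${name} killed by signal $((status - 128))." >&2
    fi
    echo "ERROR: Program ${name} failed. Abort." >&2
    return "${status}"
  fi
}
```

Used as `run_logged /usr/bin/text2image --fonts_dir="$FONTS_DIR" ...`, a crash like the one above would be reported with exit status 134, which distinguishes an abort from an ordinary error exit.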

text2image command:
/usr/bin/text2image --fonts_dir=$FONTS_DIR --font=$FONT_NAME --outputbase=$TEXT/text.txt --text=$TEXT/text.txt --fontconfig_tmpdir=/tmp/font_tmp.XYZsss

Oct 30 21:32:25 nobel systemd-coredump[6507]: Process 6436 (text2image) of user 1000 dumped core.

                                          Stack trace of thread 6436:
                                          #0  0x00007fe37e40af25 raise (libc.so.6)
                                          #1  0x00007fe37e3f4897 abort (libc.so.6)
                                          #2  0x00007fe37e44e258 __libc_message (libc.so.6)
                                          #3  0x00007fe37e45577a malloc_printerr (libc.so.6)
                                          #4  0x00007fe37e45746c _int_free (libc.so.6)
                                          #5  0x00007fe37ed635e3 pango_glyph_string_free (libpango-1.0.so.0)
                                          #6  0x00007fe37ed5880c n/a (libpango-1.0.so.0)
                                          #7  0x00007fe37ebeb678 g_slist_foreach (libglib-2.0.so.0)
                                          #8  0x00007fe37ed587b6 pango_layout_line_unref (libpango-1.0.so.0)
                                          #9  0x00007fe37ed588cc n/a (libpango-1.0.so.0)
                                          #10 0x00007fe37ed59d36 n/a (libpango-1.0.so.0)
                                          #11 0x00007fe37ed06491 g_object_unref (libgobject-2.0.so.0)
                                          #12 0x0000556e3754c226 n/a (text2image)
                                          #13 0x0000556e3754ef6b n/a (text2image)
                                          #14 0x0000556e3753f199 n/a (text2image)
                                          #15 0x0000556e3753cdea n/a (text2image)
                                          #16 0x00007fe37e3f6153 __libc_start_main (libc.so.6)
                                          #17 0x0000556e3753de6e n/a (text2image)

Subject: Process 6436 (text2image) dumped core
Defined-By: systemd
Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
Documentation: man:core(5)

Process 6436 (text2image) crashed and dumped core.

Expected Behavior:

text2image should complete and exit normally, but instead it dumps core.

Suggested Fix:

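Debug the internal code. The trace shows glibc detecting heap corruption (free(): invalid next size) in pango_glyph_string_free while text2image releases a Pango object (g_object_unref → pango_layout_line_unref).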

zdenop commented 5 years ago

How did you run the training script?

ghost commented 5 years ago


https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#creating-starter-traineddata

mkdir -p ~/tesstutorial/engoutput
training/lstmtraining --debug_interval 100 \
  --traineddata ~/tesstutorial/engtrain/eng/eng.traineddata \
  --net_spec '[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c111]' \
  --model_output ~/tesstutorial/engoutput/base --learning_rate 20e-4 \
  --train_listfile ~/tesstutorial/engtrain/eng.training_files.txt \
  --eval_listfile ~/tesstutorial/engeval/eng.training_files.txt \
  --max_iterations 5000 &>~/tesstutorial/engoutput/basetrain.log

Note: the same command runs fine on Fedora:

$ uname -a
Linux fedora 5.3.7-200.fc30.x86_64 1 SMP Fri Oct 18 20:13:59 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
$ tesseract --version
tesseract 4.1.0
 leptonica-1.78.0
  libgif 5.1.6 : libjpeg 6b (libjpeg-turbo 2.0.2) : libpng 1.6.36 : libtiff 4.0.10 : zlib 1.2.11 : libwebp 1.0.3
 Found AVX2
 Found AVX
 Found SSE

It seems to be an incompatibility between library versions.
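One way to test the incompatibility theory is to record the versions of the libraries that appear in the crash path (pango, cairo, glib) plus leptonica and libtiff on both machines, then diff the two lists. This is a minimal sketch, assuming `pkg-config` and the libraries' development files are installed on both systems; `lib_versions` is a hypothetical helper name. `lept` and `libtiff-4` are the pkg-config module names leptonica and libtiff install. pkg-config reports the version of the installed development files, which normally, but not necessarily, matches the runtime library text2image loads.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "module version" for each library of interest.
# A module that pkg-config cannot find (or a missing pkg-config) prints "missing".
lib_versions() {
  local m v
  for m in pango cairo glib-2.0 lept libtiff-4; do
    v=$(pkg-config --modversion "$m" 2>/dev/null) || v=missing
    printf '%s %s\n' "$m" "$v"
  done
}
```

Running `lib_versions > arch.txt` on the Arch machine and `lib_versions > fedora.txt` on the Fedora machine, then `diff arch.txt fedora.txt`, lists only the libraries whose versions differ.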