tesseract-ocr / tesseract

Tesseract Open Source OCR Engine (main repository)
https://tesseract-ocr.github.io/
Apache License 2.0

`tesseract -l deu --psm 0` dies with SIGSEGV #1855

Closed: AlexanderP closed this issue 6 years ago

AlexanderP commented 6 years ago

Current Behavior:

Hi.

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=906922

Package: tesseract-ocr
Version: 4.00~git2844-607e8fd8-2
Severity: normal

Hi,

I got a core dump when running tesseract. I can provide the image if it helps, but I can reproduce this behaviour with other images too.

Here are the details of the crash:

           PID: 23207 (tesseract)
           UID: 1000 (joerg)
           GID: 1000 (joerg)
        Signal: 11 (SEGV)
     Timestamp: Wed 2018-08-22 12:08:40 CEST (36min ago)
  Command Line: tesseract -l deu --psm 0 Sync/Handy-Bilder/OpenCamera/IMG_20180822_114332.jpg stdout
    Executable: /usr/bin/tesseract
 Control Group: /user.slice/user-1000.slice/session-1.scope
          Unit: session-1.scope
         Slice: user-1000.slice
       Session: 1
     Owner UID: 1000 (joerg)
       Storage: /var/lib/systemd/coredump/core.tesseract.1000.e836f78a7b7f4173bfe906ade0a85a6e.23207.1534932520000000.lz4
       Message: Process 23207 (tesseract) of user 1000 dumped core.

[New LWP 23207]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `tesseract -l deu --psm 0 Sync/Handy-Bilder/OpenCamera/IMG_20180822_114332.jpg s'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f3b349b40f3 in tesseract::Classify::CharNormClassifier(TBLOB*, tesseract::TrainingSample const&, ADAPT_RESULTS*) () at ../../src/ccstruct/normalis.h:247
247     ../../src/ccstruct/normalis.h: No such file or directory.
#0  0x00007f3b349b40f3 in tesseract::Classify::CharNormClassifier(TBLOB*, tesseract::TrainingSample const&, ADAPT_RESULTS*) () at ../../src/ccstruct/normalis.h:247
        unichar_results = {static kDefaultVectorSize = 4, size_used_ = 0, size_reserved_ = 4, data_ = 0x556b9cfa8d88, clear_cb_ = 0x0, compare_cb_ = 0x0}
#1  0x00007f3b349b4e16 in tesseract::Classify::DoAdaptiveMatch(TBLOB*, ADAPT_RESULTS*) () at adaptmatch.cpp:1546
        Ambiguities = <optimized out>
        fx_info = {Length = 1044, Xmean = 1, Ymean = 126, Rx = 44, Ry = 80, NumBL = 85, NumCN = 70, Width = 256, YBottom = 64 '@', YTop = 192 '\300'}
        bl_features = {static kDefaultVectorSize = 4, size_used_ = 85, size_reserved_ = 128, data_ = 0x556b9afcd0e0, clear_cb_ = 0x0, compare_cb_ = 0x0}
        sample = 0x556b9941c580
#2  0x00007f3b349b66b3 in tesseract::Classify::AdaptiveClassifier(TBLOB*, BLOB_CHOICE_LIST*) () at adaptmatch.cpp:199
        Results = 0x556b9aff2550
#3  0x00007f3b348fe71a in os_detect_blob(BLOBNBOX*, OrientationDetector*, ScriptDetector*, OSResults*, tesseract::Tesseract*) () at osdetect.cpp:358
        scaling = 7.52941179
        x_origin = 2515
        y_origin = 846
        rotated_blob = std::unique_ptr<TBLOB> = {get() = 0x556b99567580}
        i = <optimized out>
        blob = <optimized out>
        tblob = 0x556b9aff3540
        box = {bot_left = {xcoord = 2498, ycoord = <optimized out>}, top_right = {xcoord = <optimized out>, ycoord = 863}}
        current_rotation = {xcoord = 1, ycoord = 0}
        rotation90 = {xcoord = 0, ycoord = 1}
        ratings = {{<ELIST> = {last = 0x0}, <No data fields>}, {<ELIST> = {last = 0x0}, <No data fields>}, {<ELIST> = {last = 0x0}, <No data fields>}, {<ELIST> = {last = 0x0}, <No data fields>}}
        stop = <optimized out>
        orientation = <optimized out>
#4  0x00007f3b348fec71 in os_detect_blobs(GenericVector<int> const*, BLOBNBOX_CLIST*, OSResults*, tesseract::Tesseract*) () at ../../src/ccutil/qrsequence.h:52
        i = 0
        osr_ = {orientations = {0, 0, 0, 0}, scripts_na = {{0 <repeats 120 times>}, {0 <repeats 120 times>}, {0 <repeats 120 times>}, {0 <repeats 120 times>}}, unicharset = 0x0, best_result = {orientation_id = 0, script_id = 0, sconfidence = 0, oconfidence = 0}}
        o = {osr_ = 0x7ffd6f59dc50, allowed_scripts_ = 0x0}
        s = {osr_ = 0x7ffd6f59dc50, static korean_script_ = 0x7f3b34a7cc1d "Korean", static japanese_script_ = 0x7f3b34a7cc14 "Japanese", static fraktur_script_ = 0x7f3b34a8208c "Fraktur", korean_id_ = 8, japanese_id_ = 7, katakana_id_ = 3, hiragana_id_ = 4, han_id_ = 5, hangul_id_ = 6, latin_id_ = 2, fraktur_id_ = 9, tess_ = 0x7f3b32dc3010, allowed_scripts_ = 0x0}
        filtered_it = {<CLIST_ITERATOR> = {list = 0x7ffd6f59da88, prev = 0x556b9d0001e0, current = 0x556b9cfcf9d0, next = 0x556b9afd2b10, ex_current_was_last = false, ex_current_was_cycle_pt = false, cycle_pt = 0x556b9cfcf9d0, started_cycling = true}, <No data fields>}
        real_max = <optimized out>
        blobs = 0x556b9d0002e0
        number_of_blobs = <optimized out>
        sequence = {N_ = <optimized out>, next_num_ = 1, num_bits_ = <optimized out>}
        num_blobs_evaluated = 0
        orientation = <optimized out>
#5  0x00007f3b348feebf in os_detect(TO_BLOCK_LIST*, OSResults*, tesseract::Tesseract*) () at osdetect.cpp:268
        blobs_total = <optimized out>
        block_it = {<ELIST_ITERATOR> = {list = 0x7ffd6f59dba8, prev = 0x556b9cfe2d10, current = 0x556b9cfe2d10, next = 0x556b9cfe2d10, ex_current_was_last = false, ex_current_was_cycle_pt = false, cycle_pt = 0x556b9cfe2d10, started_cycling = true}, <No data fields>}
        filtered_list = {<CLIST> = {last = 0x556b9d0001e0}, <No data fields>}
        filtered_it = {<CLIST_ITERATOR> = {list = 0x7ffd6f59da88, prev = 0x556b9d0001e0, current = 0x0, next = 0x556b9cfcf9d0, ex_current_was_last = <optimized out>, ex_current_was_cycle_pt = <optimized out>, cycle_pt = <optimized out>, started_cycling = <optimized out>}, <No data fields>}
#6  0x00007f3b348ff3dd in orientation_and_script_detection(STRING&, OSResults*, tesseract::Tesseract*) () at osdetect.cpp:229
        name = {data_ = 0x556b9cfd0340}
        lastdot = <optimized out>
        page_box = <optimized out>
        width = 3264
        height = 2448
        blocks = {<ELIST> = {last = 0x556b9d008190}, <No data fields>}
        land_blocks = {<ELIST> = {last = 0x0}, <No data fields>}
        port_blocks = {<ELIST> = {last = 0x556b9cfe2d10}, <No data fields>}
#7  0x00007f3b348a5606 in tesseract::TessBaseAPI::DetectOS(OSResults*) () at baseapi.cpp:2435
No locals.
#8  0x00007f3b348a5735 in tesseract::TessBaseAPI::DetectOrientationScript (this=<optimized out>, orient_deg=orient_deg@entry=0x7ffd6f59e440, orient_conf=orient_conf@entry=0x7ffd6f59e444, script_name=script_name@entry=0x7ffd6f59e450, script_conf=script_conf@entry=0x7ffd6f59e448) at baseapi.cpp:1928
        osr = {orientations = {0, 0, 0, 0}, scripts_na = {{0 <repeats 120 times>}, {0 <repeats 120 times>}, {0 <repeats 120 times>}, {0 <repeats 120 times>}}, unicharset = 0x7f3b32dc3038, best_result = {orientation_id = 0, script_id = 0, sconfidence = 0, oconfidence = 0}}
        osd = <optimized out>
        orient_id = <optimized out>
        script_id = <optimized out>
#9  0x00007f3b348a5833 in tesseract::TessBaseAPI::GetOsdText (this=<optimized out>, page_number=0) at baseapi.cpp:1960
        orient_deg = 0
        orient_conf = 0
        script_name = 0x0
        script_conf = 0
        rotate = <optimized out>
        kOsdBufsize = <optimized out>
        osd_buf = <optimized out>
#10 0x00007f3b348ac434 in tesseract::TessOsdRenderer::AddImageHandler (this=0x556b99594d70, api=<optimized out>) at renderer.h:91
        osd = <optimized out>
#11 0x00007f3b348ac2bd in tesseract::TessResultRenderer::AddImage (this=this@entry=0x556b99594d70, api=api@entry=0x556b986820a0 <main::api>) at renderer.cpp:86
        ok = <optimized out>
#12 0x00007f3b348a6c44 in tesseract::TessBaseAPI::ProcessPage (this=this@entry=0x556b986820a0 <main::api>, pix=0x556b9941c3f0, page_index=page_index@entry=0, filename=filename@entry=0x7ffd6f5a070b "Sync/Handy-Bilder/OpenCamera/IMG_20180822_114332.jpg", retry_config=retry_config@entry=0x0, timeout_millisec=timeout_millisec@entry=0, renderer=0x556b99594d70) at baseapi.cpp:1245
        failed = <optimized out>
#13 0x00007f3b348a9b59 in tesseract::TessBaseAPI::ProcessPagesInternal(char const*, char const*, int, tesseract::TessResultRenderer*) () at baseapi.cpp:1169
        stdInput = <optimized out>
        buf = ""
        data = <optimized out>
        format = 2
        r = <optimized out>
        tiff = <optimized out>
        pix = 0x556b9941c3f0
#14 0x00007f3b348a9c8e in tesseract::TessBaseAPI::ProcessPages (this=this@entry=0x556b986820a0 <main::api>, filename=filename@entry=0x7ffd6f5a070b "Sync/Handy-Bilder/OpenCamera/IMG_20180822_114332.jpg", retry_config=retry_config@entry=0x0, timeout_millisec=timeout_millisec@entry=0, renderer=<optimized out>) at baseapi.cpp:1070
        result = <optimized out>
#15 0x0000556b9867e072 in main () at tesseractmain.cpp:592
        succeed = <optimized out>
        lang = <optimized out>
        image = <optimized out>
        outputbase = 0x7ffd6f5a0740 "stdout"
        datapath = <optimized out>
        list_langs = false
        print_parameters = false
        arg_i = <optimized out>
        pagesegmode = tesseract::PSM_OSD_ONLY
        enginemode = <optimized out>
        vars_vec = {static kDefaultVectorSize = 4, size_used_ = 0, size_reserved_ = 4, data_ = 0x556b991fcef8, clear_cb_ = 0x0, compare_cb_ = 0x0}
        vars_values = {static kDefaultVectorSize = 4, size_used_ = 0, size_reserved_ = 4, data_ = 0x556b991fcf28, clear_cb_ = 0x0, compare_cb_ = 0x0}
        api = {_vptr.TessBaseAPI = 0x7f3b34b27db8 <vtable for tesseract::TessBaseAPI+16>, tesseract_ = 0x7f3b32dc3010, osd_tesseract_ = 0x556b9af96d60, equ_detect_ = 0x0, reader_ = 0x0, thresholder_ = 0x556b9941c440, paragraph_models_ = 0x0, block_list_ = 0x556b99422d70, page_res_ = 0x0, input_file_ = 0x556b99422d50, output_file_ = 0x556b99400130, datapath_ = 0x556b99426320, language_ = 0x556b99422d10, last_oem_requested_ = tesseract::OEM_DEFAULT, recognition_done_ = false, truth_cb_ = 0x0, rect_left_ = 0, rect_top_ = 0, rect_width_ = 3264, rect_height_ = 2448, image_width_ = 3264, image_height_ = 2448, unknown_title_ = 0x7f3b34a919d7 ""}
        init_failed = 0
        b = false
        in_training_mode = <optimized out>
        renderers = {<GenericVector<tesseract::TessResultRenderer*>> = {static kDefaultVectorSize = 4, size_used_ = 1, size_reserved_ = 4, data_ = 0x556b99415380, clear_cb_ = 0x0, compare_cb_ = 0x0}, <No data fields>}
        banner = <optimized out>
#16 0x00007f3b33e4db17 in __libc_start_main (main=0x556b9867d450 <main>, argc=7, argv=0x7ffd6f59ebd8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffd6f59ebc8) at ../csu/libc-start.c:310
        self = <optimized out>
        __self = <optimized out>
        result = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {0, -3099063579296305054, 93920606807600, 140726471617488, 0, 0, -9092762819678343070, -9197268400409186206}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x7ffd6f59ec18, 0x7f3b34e26170}, data = {prev = 0x0, cleanup = 0x0, canceltype = 1868164120}}}
        not_first_call = <optimized out>
#17 0x0000556b9867e65a in _start () at tesseractmain.cpp:602
No symbol table info available.

Bye Jörg

-- System Information:
Debian Release: buster/sid
  APT prefers unstable-debug
  APT policy: (500, 'unstable-debug'), (500, 'unstable'), (1, 'experimental-debug'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 4.18.0-rc3-amd64 (SMP w/8 CPU cores)
Locale: LANG=de_DE.UTF-8, LC_CTYPE=de_DE.UTF-8 (charmap=UTF-8), LANGUAGE=de_DE.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages tesseract-ocr depends on:
ii  libc6                2.27-5
ii  libcairo2            1.15.10-3
ii  libfontconfig1       2.13.0-5
ii  libgcc1              1:8.2.0-4
ii  libglib2.0-0         2.56.1-2
ii  libgomp1             8.2.0-4
ii  libicu60             60.2-6
ii  liblept5             1.76.0-1
ii  libpango-1.0-0       1.42.4-1
ii  libpangocairo-1.0-0  1.42.4-1
ii  libpangoft2-1.0-0    1.42.4-1
ii  libstdc++6           8.2.0-4
ii  libtesseract4        4.00~git2844-607e8fd8-2
ii  tesseract-ocr-eng    1:4.00~git30-7274cfa-1
ii  tesseract-ocr-osd    1:4.00~git30-7274cfa-1

tesseract-ocr recommends no packages.

tesseract-ocr suggests no packages.

Shreeshrii commented 6 years ago

Please test with the latest code. This issue should be fixed by PR https://github.com/tesseract-ocr/tesseract/pull/1818

AlexanderP commented 6 years ago

@Shreeshrii Absolutely right, the latest code does not have this error. I uploaded it to mentors.debian.net.

amitdo commented 6 years ago

For 4.0 betas released before that patch, this command should work around the issue:

tesseract -l osd --psm 0