rmtheis / tess-two

Fork of Tesseract Tools for Android
Apache License 2.0

getUTF8Text() crashes when using with a self created training file #15

Closed luklanis closed 12 years ago

luklanis commented 12 years ago

I've created a training file with only 12 characters (0-9, >, +), following the wiki page at http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3. It works on Ubuntu, but on the phone it crashes with a SIGSEGV. Any ideas?

rmtheis commented 12 years ago

Not really. Maybe check whether it's a v3.01 vs. v3.02 problem.

luklanis commented 12 years ago

Solved!

The problem was that I wasn't calling shapeclustering. This step isn't mentioned in the wiki, but it is in the tesseract-ocr issues (https://code.google.com/p/tesseract-ocr/issues/detail?id=629#c8).

I never would have thought the fix would be that simple when the backtrace looks like this:

0x5c38457a in ERRCODE::error (this=0x5c4bf638, caller=<optimized out>, action=ABORT, format=0x5c475eb4 "in file %s, line %d")
    at jni/com_googlecode_tesseract_android/src/ccutil/errcode.cpp:86
86        if (!*p)
(gdb) bt
#0  0x5c38457a in ERRCODE::error (this=0x5c4bf638, caller=<optimized out>, action=ABORT, format=0x5c475eb4 "in file %s, line %d")
    at jni/com_googlecode_tesseract_android/src/ccutil/errcode.cpp:86
#1  0x5c33ea6c in get (index=8, this=0xfc2a7c) at jni/com_googlecode_tesseract_android/src/ccutil/genericvector.h:512
#2  UnicityTable<tesseract::FontInfo>::get (this=0xfc2a7c, id=8) at jni/com_googlecode_tesseract_android/src/ccutil/unicity_table.h:133
#3  0x5c4204b0 in tesseract::LanguageModel::FillConsistencyInfo (this=0xfd40b8, curr_col=1, word_end=<optimized out>, b=0x1035a10, 
    parent_vse=0x1035b90, parent_b=0x10359d0, chunks_record=0xbeabd188, consistency_info=0xbeabced8)
    at jni/com_googlecode_tesseract_android/src/wordrec/language_model.cpp:1124
#4  0x5c422926 in tesseract::LanguageModel::AddViterbiStateEntry (this=0xfd40b8, top_choice_flags=7 '\a', denom=1, 
    word_end=<optimized out>, curr_col=1, curr_row=1, b=0x1035a10, parent_b=0x10359d0, parent_vse=0x1035b90, pain_points=0x1027fe0, 
    best_path_by_column=0xbeabd100, chunks_record=0xbeabd188, best_choice_bundle=0xbeabd114, blamer_bundle=0x0)
    at jni/com_googlecode_tesseract_android/src/wordrec/language_model.cpp:506
#5  0x5c4232a4 in tesseract::LanguageModel::UpdateState (this=0xfd40b8, changed=<optimized out>, curr_col=1, curr_row=<optimized out>, 
    curr_list=0x1035a00, parent_list=0x10359c0, pain_points=0x1027fe0, best_path_by_column=0xbeabd100, chunks_record=0xbeabd188, 
    best_choice_bundle=0xbeabd114, blamer_bundle=0x0) at jni/com_googlecode_tesseract_android/src/wordrec/language_model.cpp:372
#6  0x5c4256f6 in tesseract::Wordrec::UpdateSegSearchNodes (this=0xfc2418, starting_col=1, pending=0xbeabd10c, 
    best_path_by_column=0xbeabd100, chunks_record=0xbeabd188, pain_points=0x1027fe0, best_choice_bundle=0xbeabd114, blamer_bundle=0x0)
    at jni/com_googlecode_tesseract_android/src/wordrec/segsearch.cpp:215
#7  0x5c425a1e in tesseract::Wordrec::SegSearch (this=0xfc2418, chunks_record=0xbeabd188, best_choice=0xf15f18, 
    best_char_choices=<optimized out>, raw_choice=0x1034708, output_best_state=0xbeabd28c, blamer_bundle=0x0)
    at jni/com_googlecode_tesseract_android/src/wordrec/segsearch.cpp:114
#8  0x5c419ec0 in tesseract::Wordrec::word_associator (this=0xfc2418, only_create_ratings_matrix=<optimized out>, word=0xf1e200, 
    state=0xbeabd28c, best_char_choices=0x1034828, fixpt=0xbeabd294, best_state=0xbeabd28c)
    at jni/com_googlecode_tesseract_android/src/wordrec/chopper.cpp:1030
#9  0x5c41ae96 in tesseract::Wordrec::chop_word_main (this=0xfc2418, word=0xf1e200)
    at jni/com_googlecode_tesseract_android/src/wordrec/chopper.cpp:646
...

rmtheis commented 12 years ago

Glad you got it fixed. I think the shapeclustering piece is new and hasn't made it into the instructions yet.
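
For anyone following the training instructions, the step order with shapeclustering included can be sketched roughly as below. This is only a sketch: the language code "num" and font name "myfont" are hypothetical, the file names assume the lang.fontname.expN naming convention, and the flags are as recalled from the 3.02 training instructions, so check them against your Tesseract version. The function only builds the command lines and does not run anything.

```python
def training_commands(lang, font, exp=0):
    """Return the training commands, in order, as argv lists.

    Nothing is executed here. File names and flags are assumptions
    based on the Tesseract 3.02 training instructions.
    """
    base = f"{lang}.{font}.exp{exp}"
    tr_file = base + ".tr"
    return [
        # Produce the .tr feature file from the image and its box file.
        ["tesseract", base + ".tif", base, "nobatch", "box.train"],
        # Build the unicharset from the box file.
        ["unicharset_extractor", base + ".box"],
        # The step that was missing in this issue: it has to run
        # before mftraining.
        ["shapeclustering", "-F", "font_properties", "-U", "unicharset", tr_file],
        ["mftraining", "-F", "font_properties", "-U", "unicharset",
         "-O", lang + ".unicharset", tr_file],
        ["cntraining", tr_file],
    ]


if __name__ == "__main__":
    # Hypothetical language code and font name, for illustration only.
    for argv in training_commands("num", "myfont"):
        print(" ".join(argv))
```

Running the script prints one command per line, which can be compared against what your own training script does. The later steps (renaming the output files with the language prefix and running combine_tessdata) are left out here.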