karldergrosse / tesseract-ocr

Automatically exported from code.google.com/p/tesseract-ocr

[ 1608107 ] Conditional jump or move depends on uninitialised value(s) #2

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Emmanuel Fleury - efleury(sf)

Hi all,

I ran valgrind on the bokeoa-64bit-branch branch (with a patch submitted by
me) and I found the following problem:

==12801== Conditional jump or move depends on uninitialised value(s)
==12801== at 0x4A4B5E: IntegerMatcher(INT_CLASS_STRUCT*, unsigned long*,
unsigned long*, unsigned short, short, INT_FEATURE_STRUCT*, int, unsigned
char, INT_RESULT_STRUCT*, int) (intmatcher.cpp:1038)
==12801== by 0x49D7F9: AdaptToChar(blobstruct*, LINE_STATS*, unsigned
char, float) (adaptmatch.cpp:1285)
==12801== by 0x49E3DE: AdaptToWord(wordstruct*, textrowstruct*, char
const*, char const*, char const*) (adaptmatch.cpp:725)
==12801== by 0x427AD6: tess_adapter(WERD*, DENORM*, char const*, char
const*, char const*) (tessbox.cpp:349)
==12801== by 0x40DC33: classify_word_pass1(WERD_RES*, ROW*, unsigned
char, CHAR_SAMPLES_LIST*, CHAR_SAMPLE_LIST*) (control.cpp:611)
==12801== by 0x40E06C: recog_all_words(PAGE_RES*, ETEXT_DESC volatile*)
(control.cpp:295)
==12801== by 0x4044D6: recognize_page(STRING&) (tessedit.cpp:159)
==12801== by 0x4034A3: main (tesseractmain.cpp:104)

This bug is also present on 32-bit builds, so it does not depend on the
architecture.
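
For anyone reading this kind of report: Memcheck tracks whether each value was ever initialised, so a variable can print as a perfectly plausible number in a debugger and still be flagged. The report does not say which value was uninitialised. As a minimal sketch, assuming purely for illustration that it was a per-feature field nobody set before the matcher loop, default member initializers rule that case out. FeatureSketch, make_feature and count_used_features are hypothetical names, not Tesseract code; the field names are borrowed from INT_FEATURE_STRUCT as gdb prints it in the comments, and the field types are guesses.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical mirror of INT_FEATURE_STRUCT. Field names come from the gdb
// output in this thread; the types are assumptions.
struct FeatureSketch {
  std::uint8_t X = 0;
  std::uint8_t Y = 0;
  std::uint8_t Theta = 0;
  std::int8_t CP_misses = 0;  // default member initializer: never indeterminate
};

// Hypothetical helper: builds a feature without touching CP_misses,
// which stays at its initialised value of 0.
FeatureSketch make_feature(std::uint8_t x, std::uint8_t y, std::uint8_t theta) {
  FeatureSketch f;
  f.X = x;
  f.Y = y;
  f.Theta = theta;
  return f;
}

// Same shape as the loop in IntegerMatcher, minus the table updates.
// The sketch assumes a non-negative threshold.
int count_used_features(const std::vector<FeatureSketch>& features,
                        int min_misses) {
  assert(min_misses >= 0);
  int used = 0;
  for (const FeatureSketch& f : features) {
    if (f.CP_misses >= min_misses) {  // the comparison valgrind complains about
      ++used;
    }
  }
  return used;
}
```

With every field given a default, the comparison in the loop always reads a defined value, whichever code path built the feature.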

Comments

Date: 2007-02-01 00:48
Sender: efleury
Logged In: YES 
user_id=122014
Originator: YES

Great!

But did you check it in? I cannot get my local CVS checkout to pick up any
update... :-/

Date: 2007-01-31 15:34
Sender: theraysmith (Project Admin)
Logged In: YES 
user_id=1515161
Originator: NO

This is fixed in 1.03. It was causing the adaptive classifier to not get
used enough.

Date: 2006-12-15 13:30
Sender: filipg
Logged In: YES 
user_id=37894
Originator: NO

I think this is a FALSE-ALARM from valgrind (another one below):

Breakpoint 1, IntegerMatcher (ClassTemplate=0x925d8d8,
ProtoMask=0x91de6f0,
ConfigMask=0xbf925a78, BlobLength=47, NumFeatures=53, Features=0xbf925270,
min_misses=0,
NormalizationFactor=0 '\0', Result=0xbf926114, Debug=0) at
intmatcher.cpp:1043
1043        if (Features[Feature].CP_misses >= min_misses) {
(gdb) list
1042      for (Feature = 0, used_features = 0; Feature < NumFeatures;
Feature++) {
1043        if (Features[Feature].CP_misses >= min_misses) {
1044          IMUpdateTablesForFeature (ClassTemplate, ProtoMask,
ConfigMask,
1045            Feature, &(Features[Feature]),
1046            FeatureEvidence, SumOfFeatureEvidence,
1047            ProtoEvidence, Debug);
1048          used_features++;
1049        }
(gdb) print min_misses
$5 = 0
(gdb) print Feature
$6 = 0
(gdb) print Features[Feature]
$7 = {X = 97 'a', Y = 30 '\036', Theta = 192 '\300', CP_misses = 0 '\0'}

Looks OK to me... The same seems to be true for "Source and destination
overlap in strcpy". Take a look:

Breakpoint 1, fix_quotes (string=0x86b5b31 "\"'License'');",
word=0x87dc530, 
    blob_choices=0xbfca8f58) at control.cpp:1034
1034          strcpy (ptr + 1, ptr + 2); //shuffle up
(gdb) list 1029
1029      for (ptr = string;
1030      *ptr != '\0'; ptr++, blob_it.forward (), choice_it.forward ())
{
1031        if ((*ptr == '\'' || *ptr == '`')
1032        && (*(ptr + 1) == '\'' || *(ptr + 1) == '`')) {
1033          *ptr = '"';                //turn to double
1034          strcpy (ptr + 1, ptr + 2); //shuffle up
(gdb) print ptr+1
$1 = 0x86b5b32 "'License'');"
(gdb) print ptr+2
$2 = 0x86b5b33 "License'');"

Looks fine to me. Valgrind flagged the line above twice because both
recognition passes call fix_quotes():

==1993== 3 errors in context 1 of 4:
==1993== Source and destination overlap in strcpy(0x469B2FA, 0x469B2FB)
==1993==    at 0x4006AAD: strcpy (mc_replace_strmem.c:106)
==1993==    by 0x805333E: fix_quotes(char*, WERD*,
BLOB_CHOICE_LIST_CLIST*) (control.cpp:1034)
==1993==    by 0x8054DBD: classify_word_pass1(WERD_RES*, ROW*, unsigned
char, CHAR_SAMPLES_LIST*, CHAR_SAMPLE_LIST*) (control.cpp:592)
==1993==    by 0x80554C2: recog_all_words(PAGE_RES*, ETEXT_DESC volatile*)
(control.cpp:317)
==1993==    by 0x804B9EB: recognize_page(STRING&) (tessedit.cpp:187)
==1993==    by 0x804A869: main (tesseractmain.cpp:454)
==1993== 
==1993== 6 errors in context 2 of 4:
==1993== Source and destination overlap in strcpy(0x46998D2, 0x46998D3)
==1993==    at 0x4006AAD: strcpy (mc_replace_strmem.c:106)
==1993==    by 0x805333E: fix_quotes(char*, WERD*,
BLOB_CHOICE_LIST_CLIST*) (control.cpp:1034)
==1993==    by 0x8053B4C: match_word_pass2(WERD_RES*, ROW*, float)
(control.cpp:913)

Guess tesseract needs its own valgrind_suppressions.sh...
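
A note on the strcpy report: the pointers gdb prints are the expected ones, but the C standard leaves strcpy undefined when source and destination overlap, and that overlap is what valgrind is flagging. memmove is specified to handle overlapping ranges, so an alternative to a suppression is to shift the tail with it. A minimal sketch, assuming only the string handling matters here: collapse_double_quotes is a hypothetical stand-in for fix_quotes() and leaves out the blob and choice iterators of the real function.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for the string part of fix_quotes(): replaces each
// pair of single quotes or backticks with one double quote, in place.
void collapse_double_quotes(char* str) {
  assert(str != nullptr);
  for (char* ptr = str; *ptr != '\0'; ++ptr) {
    const bool first = (*ptr == '\'' || *ptr == '`');
    const bool second = (ptr[1] == '\'' || ptr[1] == '`');
    if (first && second) {
      *ptr = '"';  // turn to double
      // Shuffle the tail up by one character. memmove allows overlap;
      // the + 1 copies the terminating NUL as well.
      std::memmove(ptr + 1, ptr + 2, std::strlen(ptr + 2) + 1);
    }
  }
}
```

Run on a writable copy of the string from the gdb session before line 1033 touched it (`''License'');`), this should give `"License");` without the overlap warning.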

I've been meaning to play with valgrind for an unrelated reason and looked
into this report while installing it. Very nice program. It found my rare
corruption issue in an in-house app, plus 6 other potential problems!

Check it out if you haven't: http://www.valgrind.org/ (painless Linux
install)

Cheers,
Fil

Original issue reported on code.google.com by tmb...@gmail.com on 7 Mar 2007 at 10:24

GoogleCodeExporter commented 9 years ago
This was fixed in 1.03

Original comment by theraysm...@gmail.com on 17 May 2007 at 6:14