karldergrosse / tesseract-ocr

Automatically exported from code.google.com/p/tesseract-ocr

[ 1552781 ] 64-bit building fails #12

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Graeme Humphries - unit3

Building on Ubuntu/dapper AMD64 fails with the
following error:

make[3]: Entering directory `/home/graemehu/Downloads/installers/packages/tesseract-1.0/aspirin'
source='bpsupport.cpp' object='bpsupport.o' libtool=no \
depfile='.deps/bpsupport.Po' tmpdepfile='.deps/bpsupport.TPo' \
depmode=gcc3 /bin/sh ../config/depcomp \
g++ -DHAVE_CONFIG_H -I. -I. -I.. -I../ccutil -I../cutil -DNDEBUG -O3 -Wall -c -o bpsupport.o `test -f 'bpsupport.cpp' || echo './'`bpsupport.cpp
../ccutil/strngs.h: In member function ‘void STRING::de_dump(FILE*)’:
../ccutil/strngs.h:171: error: cast from ‘char*’ to ‘int’ loses precision
make[3]: *** [bpsupport.o] Error 1

It looks like the code is pretty much not 64-bit safe:
patching that particular instance just leads to more
pointer-casting problems cropping up elsewhere
(64-bit pointers and 32-bit ints).

I'd patch this, but I'm not sure what the
ramifications are of just changing the int casts to
(long long) casts. I suspect other parts of the code
will just treat them as int32s and break things anyway,
even if it compiles.
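
For readers hitting the same "cast from 'char*' to 'int' loses precision" error: one common way to store a pointer in an integer without truncation is an integer type that is defined to be pointer-sized. The following is only a minimal sketch, assuming a toolchain that provides C++11 `<cstdint>`; the helper names are hypothetical and not taken from the Tesseract sources. It does not address the concern raised above, that other code may still treat the stored value as a 32-bit quantity.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers, not part of Tesseract.
// std::uintptr_t is an unsigned integer type able to hold a pointer value
// (formally optional, but provided by mainstream 32- and 64-bit toolchains).
static_assert(sizeof(std::uintptr_t) >= sizeof(char*),
              "uintptr_t must be wide enough for a pointer");

// Converts a pointer to an integer without losing bits.
inline std::uintptr_t pointer_to_integer(const char* p) {
  return reinterpret_cast<std::uintptr_t>(p);
}

// Converts the integer back to the original pointer type.
inline const char* integer_to_pointer(std::uintptr_t value) {
  return reinterpret_cast<const char*>(value);
}

// Returns true when a pointer survives the round trip through the integer.
inline bool round_trip_preserves(const char* p) {
  return integer_to_pointer(pointer_to_integer(p)) == p;
}
```

Replacing `int` with `long long` would also compile on common 64-bit targets, but a pointer-sized type states the intent directly and stays correct on 32-bit builds too.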

Comments

Date: 2006-12-01 16:31
Sender: efleury
Logged In: YES 
user_id=122014
Originator: NO

Doh... 

Yes, you wrote "branch" and I read "patch"... Sorry for that.

I just compiled it and it went through the phototest.tif example with no
harm. I'll take a look at the way you solved this 'cos I'm interested in it
! :)

Thanks a lot.

Date: 2006-12-01 16:18
Sender: bokeoa
Logged In: YES 
user_id=1340826
Originator: NO

It's not really a patch as much as a branch in CVS:

http://tesseract-ocr.cvs.sourceforge.net/tesseract-ocr/tesseract/?pathrev=bokeoa-64bit-branch

Ray actually provided the solution:

http://sourceforge.net/forum/forum.php?thread_id=1609671&forum_id=534361

If you want to check out my branch, try these commands:

cvs -d:pserver:anonymous@tesseract-ocr.cvs.sourceforge.net:/cvsroot/tesseract-ocr login
[enter for password]
cvs -z3 -d:pserver:anonymous@tesseract-ocr.cvs.sourceforge.net:/cvsroot/tesseract-ocr co -P -r bokeoa-64bit-branch tesseract

Let me know how it goes.
Bryan

Date: 2006-12-01 16:10
Sender: efleury
Logged In: YES 
user_id=122014
Originator: NO

I didn't find where your patch lies. :-/

Anyway, I did solve the bug. It comes from the fact that a pointer written
through a (void **) does not have the same size on 32-bit and 64-bit
architectures.

So the segfault occurs in a strcpy call where the variable demodir
points to an out-of-bounds address. Tracking the value of demodir with a
watchpoint in gdb showed me that a write to acts_ocr (declared as
an int in cutil/globals.cpp), which occurs at cutil/variables.cpp:83, was
performed through the following line:

  *((void **) this_var->address) = default_value.ptr_part;

Of course, on 32-bit architectures a pointer and an int have the same
size... but not on 64-bit architectures. So when the address is cast to
(void **), the write overwrites not only acts_ocr but also the contents of
demodir.
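
To make the size mismatch concrete, here is a small sketch that counts how many bytes a pointer-sized store touches and how many of them land past an int-sized slot. It is an illustration only: the function names are hypothetical, and the store is modelled with memcpy into a byte buffer so that the sketch itself stays well defined, unlike the real write through (void **).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Room for an int-sized slot followed by whatever variable sits after it.
constexpr std::size_t kSlotBytes = sizeof(int) + sizeof(void*);

// Counts the bytes modified when a pointer-sized value is stored at the
// start of the buffer. Two buffers with different fill patterns are used:
// after the same store, a byte is equal in both only if it was written.
inline std::size_t bytes_written_by_pointer_store() {
  unsigned char first[kSlotBytes];
  unsigned char second[kSlotBytes];
  std::memset(first, 0xAA, kSlotBytes);
  std::memset(second, 0x55, kSlotBytes);

  void* value = nullptr;  // stands in for default_value.ptr_part
  std::memcpy(first, &value, sizeof(value));
  std::memcpy(second, &value, sizeof(value));

  std::size_t written = 0;
  for (std::size_t i = 0; i < kSlotBytes; ++i) {
    if (first[i] == second[i]) ++written;
  }
  return written;
}

// Bytes that land beyond an int-sized variable such as acts_ocr.
inline std::size_t bytes_spilled_past_int() {
  const std::size_t written = bytes_written_by_pointer_store();
  return written > sizeof(int) ? written - sizeof(int) : 0;
}
```

On a typical 32-bit build the spill is zero, which is why the bug stayed hidden; on a typical LP64 build four bytes of the neighbouring variable get overwritten.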

The only (ugly) fix I found was to introduce a "void *padding" variable
to absorb the write when it goes out of bounds.

Second, the fix I proposed for the first bug is outrageous... :-/
The real problem comes from the fact that INT32 is typedef'ed as a
long, which is insane if you want to port code to 64-bit platforms.
So the best solution would be to replace every INT32 with the int32_t type
and fix the resulting compilation problems...
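
A sketch of why a long-based INT32 misbehaves, and why zero-initialising the variable can appear to help: if a 4-byte field is copied into a wider variable, the remaining bytes keep whatever they held before. Whether Tesseract's reader actually performed a 4-byte read at that point is an assumption on my part; the names below are hypothetical, and the 4-byte copy merely stands in for a read of a 4-byte field from a file.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// A fixed-width type is 4 bytes on every platform that provides it,
// whereas long is 8 bytes on LP64 systems such as Linux/amd64.
static_assert(sizeof(std::int32_t) == 4, "int32_t is always 32 bits wide");

// Stores a 32-bit value as the 4 raw bytes a file would contain
// (native byte order; endianness is not handled in this sketch).
inline void store_field(std::int32_t value, unsigned char (&field)[4]) {
  std::memcpy(field, &value, 4);
}

// Models reading a 4-byte field into a variable of type T that already
// holds `initial`. Bytes of T beyond the first four are left untouched.
template <typename T>
T load_four_bytes(const unsigned char (&field)[4], T initial) {
  static_assert(sizeof(T) >= 4, "destination must hold at least 4 bytes");
  T value = initial;
  std::memcpy(&value, field, 4);
  return value;
}
```

With int32_t as the destination the result is the stored value regardless of the initial contents. With long on an LP64 build, stale initial contents leak into the result, which matches the behaviour the `num_edges = 0` workaround papered over.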

Date: 2006-12-01 15:32
Sender: bokeoa
Logged In: YES 
user_id=1340826
Originator: NO

Graeme,

Try the code from the bokeoa-64bit-branch in CVS and see if it
works for you.  I've tested it on both amd64 and ia64 myself.

Bryan

Date: 2006-11-27 01:05
Sender: efleury
Logged In: YES 
user_id=122014
Originator: NO

I (finally) understood it! It's because of the difference in the size of
void* between 32-bit and 64-bit architectures. I'll try to have a patch
tonight (CET).
Gosh, this bug was quite disturbing. ô_ô

And I counted 388 occurrences of void * in the code. It might take a while
to check them all. :-/

Date: 2006-11-24 16:54
Sender: efleury
Logged In: YES 
user_id=122014
Originator: NO

Current segfault is:

(gdb) run
Starting program: /home/fleury/development/projects/tesseract/tesseract-1.02-ef/ccmain/tesseract
/home/fleury/development/projects/tesseract/tesseract-1.02-ef/ccmain/tesseract:Error:Usage:/home/fleury/development/projects/tesseract/tesseract-1.02-ef/ccmain/tesseract
imagename outputbase [configfile [[+|-]varfile]...]

Signal_exit 25 ABORT. LocCode: 3  AbortCode: 0

Program exited with code 031.
(gdb) set args ../phototest.tif test.txt
(gdb) run
Starting program: /home/fleury/development/projects/tesseract/tesseract-1.02-ef/ccmain/tesseract ../phototest.tif test.txt

Program received signal SIGSEGV, Segmentation fault.
0x00002b3273ef5020 in strcpy () from /lib/libc.so.6
(gdb) bt
#0  0x00002b3273ef5020 in strcpy () from /lib/libc.so.6
#1  0x000000000049f3e6 in InitAdaptiveClassifier () at adaptmatch.cpp:814
#2  0x00000000004981e3 in mfeature_init () at mfvars.cpp:50
#3  0x0000000000492728 in program_editup (configfile=0x0) at tface.cpp:92
#4  0x0000000000492749 in start_recog (configfile=0x0, textbase=0x7fff371cc968 "test.txt") at tface.cpp:67
#5  0x00000000004042ab in init_tesseract (arg0=0x7fff371cc908 "/home/fleury/development/projects/tesseract/tesseract-1.02-ef/ccmain/tesseract", textbase=0x7fff371cc968 "test.txt", configfile=0x0, configc=0, configv=0x7fff371cb8a8) at tessedit.cpp:125
#6  0x0000000000403125 in main (argc=3, argv=0x7fff371cb898) at tesseractmain.cpp:70
(gdb) up
#1  0x000000000049f3e6 in InitAdaptiveClassifier () at adaptmatch.cpp:814
814       strcpy(Filename, demodir);
(gdb) list
809       char Filename[1024];
810
811       if (!EnableAdaptiveMatcher)
812         return;
813
814       strcpy(Filename, demodir);
815       strcat(Filename, BuiltInTemplatesFile);
816       #ifndef SECURE_NAMES
817       //      cprintf( "\nReading built-in templates from %s ...",
818       //              Filename);
(gdb)

Apparently "demodir" points to an out-of-bounds address. I don't know why
yet.

Date: 2006-11-24 16:50
Sender: efleury
Logged In: YES 
user_id=122014
Originator: NO

I'm working on it... Already fixed compilation and I'm attacking
segfaults. I'll try to keep you informed (for now nothing impossible, but
keep fingers crossed !).

The compilation fix can be found here:
http://sourceforge.net/tracker/index.php?func=detail&aid=1602051&group_id=158586&atid=808426

And here is one solution for the first segfault:

diff -ruN tesseract-1.02/dict/dawg.cpp tesseract-1.02-ef/dict/dawg.cpp
--- tesseract-1.02/dict/dawg.cpp        2006-06-17 00:17:07.000000000 +0200
+++ tesseract-1.02-ef/dict/dawg.cpp     2006-11-25 01:48:23.000000000 +0100
@@ -270,7 +270,7 @@
 void read_squished_dawg(char *filename, EDGE_ARRAY dawg, INT32 max_num_edges) {
   FILE       *file;
   EDGE_REF   edge;
-  INT32      num_edges;
+  INT32      num_edges = 0;
   INT32      node_count = 0;

   if (debug) print_string ("read_debug");

Date: 2006-09-08 04:07
Sender: glenstewart
Logged In: YES 
user_id=81772

Here's the 1.01 compile result on Ubuntu Dapper 6.06 AMD64...

<pre>
~/tesseract-1.01# ./configure
./configure: line 1329: tesseract: command not found
checking build system type... x86_64-unknown-linux-gnu
checking host system type... x86_64-unknown-linux-gnu
checking for cl.exe... no
checking for g++... g++
checking for C++ compiler default output... a.out
checking whether the C++ compiler works... yes
checking whether we are cross compiling... no
checking for suffix of executables...
checking for suffix of object files... o
checking whether we are using the GNU C++ compiler... yes
checking whether g++ accepts -g... yes
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking for style of include used by make... GNU
checking dependency style of g++... gcc3
checking for gcc... gcc
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ANSI C... none needed
checking dependency style of gcc... gcc3
checking whether gcc and cc understand -c and -o together... yes
checking whether to enable maintainer-specific portions of
Makefiles... no
checking whether byte ordering is bigendian... no
checking for ranlib... ranlib
checking for GnuWin32 directory... not found
checking if g++ accepts -O3... yes
checking if g++ accepts -Wall... yes
checking whether the compiler recognizes bool as a built-in
type... yes
checking whether the compiler recognizes typename... yes
checking whether the compiler comes with standard
includes... yes
checking how to run the C++ preprocessor... g++ -E
checking for egrep... grep -E
checking for ANSI C header files... yes
checking whether time.h and sys/time.h may both be
included... yes
checking for sys/wait.h that is POSIX.1 compatible... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking sys/ipc.h usability... yes
checking sys/ipc.h presence... yes
checking for sys/ipc.h... yes
checking sys/shm.h usability... yes
checking sys/shm.h presence... yes
checking for sys/shm.h... yes
checking limits.h usability... yes
checking limits.h presence... yes
checking for limits.h... yes
checking malloc.h usability... yes
checking malloc.h presence... yes
checking for malloc.h... yes
checking for stdbool.h that conforms to C99... yes
checking for _Bool... no
checking whether #! works in shell scripts... yes
checking for special C compiler options needed for large
files... no
checking for _FILE_OFFSET_BITS value needed for large
files... no
checking for _LARGE_FILES value needed for large files... no
checking for wchar_t... yes
checking for long long int... yes
checking for mbstate_t... yes
checking for size_t... yes
checking for stdlib.h... (cached) yes
checking for unistd.h... (cached) yes
checking for getpagesize... yes
checking for working mmap... no
checking for pid_t... yes
checking for unistd.h... (cached) yes
checking vfork.h usability... no
checking vfork.h presence... no
checking for vfork.h... no
checking for fork... yes
checking for vfork... yes
checking for working fork... no
checking for working vfork... (cached) yes
checking for strerror... yes
checking for vsnprintf... yes
checking for gethostname... yes
checking for strchr... yes
checking for memcpy... yes
checking for acos... yes
checking for asin... yes
checking for Leffler libtiff library... checking linking
with -ltiff... ok
setting LIBTIFF_CFLAGS=
setting LIBTIFF_LIBS=-ltiff
configure: creating ./config.status
config.status: creating Makefile
config.status: creating aspirin/Makefile
config.status: creating ccmain/Makefile
config.status: creating ccstruct/Makefile
config.status: creating ccutil/Makefile
config.status: creating classify/Makefile
config.status: creating cutil/Makefile
config.status: creating dict/Makefile
config.status: creating display/Makefile
config.status: creating image/Makefile
config.status: creating textord/Makefile
config.status: creating viewer/Makefile
config.status: creating wordrec/Makefile
config.status: creating config_auto.h
config.status: executing depfiles commands

Configuration is done.
You can now build Tesseract by running:

% make

Note: 'make install' has not been implemented yet. Avoid using.
root@server:~/tesseract-1.01# make
make  all-recursive
make[1]: Entering directory `/root/tesseract-1.01'
Making all in aspirin
make[2]: Entering directory `/root/tesseract-1.01/aspirin'
make[3]: Entering directory `/root/tesseract-1.01/aspirin'
source='bpsim.cpp' object='bpsim.o' libtool=no \
        depfile='.deps/bpsim.Po' tmpdepfile='.deps/bpsim.TPo' \
        depmode=gcc3 /bin/sh ../config/depcomp \
        g++ -DHAVE_CONFIG_H -I. -I. -I..  -I../ccutil
-I../cutil   -DNDEBUG -O3 -Wall -c -o bpsim.o `test -f
'bpsim.cpp' || echo './'`bpsim.cpp
source='bpsupport.cpp' object='bpsupport.o' libtool=no \
        depfile='.deps/bpsupport.Po'
tmpdepfile='.deps/bpsupport.TPo' \
        depmode=gcc3 /bin/sh ../config/depcomp \
        g++ -DHAVE_CONFIG_H -I. -I. -I..  -I../ccutil
-I../cutil   -DNDEBUG -O3 -Wall -c -o bpsupport.o `test -f
'bpsupport.cpp' || echo './'`bpsupport.cpp
../ccutil/strngs.h: In member function 'void
STRING::de_dump(FILE*)':
../ccutil/strngs.h:171: error: cast from 'char*' to 'int'
loses precision
make[3]: *** [bpsupport.o] Error 1
make[3]: Leaving directory `/root/tesseract-1.01/aspirin'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `/root/tesseract-1.01/aspirin'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/root/tesseract-1.01'
make: *** [all] Error 2
</pre>

Date: 2006-09-07 15:34
Sender: rdbrown0au
Logged In: YES 
user_id=1592436

Specializing versions of add_variable for the base types
works. The program then fails in
intproto.cpp:ReadIntTemplates, which seems to be reading
binary data from the file inttemp (pickling/unpickling, to use
the jargon term). Anyway, the structures being read from the
file include pointer members, so reading such files
created by a 32-bit build into a 64-bit build causes
cascading failures.
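
One usual way around this kind of failure is to keep the on-disk format free of pointers and of platform-dependent types, and to convert to and from the in-memory structures explicitly. The sketch below only illustrates that idea: `ClassRecord` and the helper functions are hypothetical and are not the actual inttemp layout, and byte order is left native for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical in-memory record; the vector owns heap storage internally,
// so its raw bytes must never be dumped to disk as-is.
struct ClassRecord {
  std::int32_t id;
  std::vector<std::int32_t> values;
};

// Appends one 32-bit value as exactly 4 bytes.
inline void append_i32(std::vector<unsigned char>* out, std::int32_t v) {
  unsigned char bytes[4];
  std::memcpy(bytes, &v, 4);
  out->insert(out->end(), bytes, bytes + 4);
}

// Reads one 32-bit value, failing instead of running past the buffer.
inline bool read_i32(const std::vector<unsigned char>& in,
                     std::size_t* offset, std::int32_t* v) {
  if (in.size() < 4 || *offset > in.size() - 4) return false;
  std::memcpy(v, in.data() + *offset, 4);
  *offset += 4;
  return true;
}

// On-disk layout: id, element count, then the elements, 4 bytes each.
inline std::vector<unsigned char> serialize(const ClassRecord& record) {
  std::vector<unsigned char> out;
  append_i32(&out, record.id);
  append_i32(&out, static_cast<std::int32_t>(record.values.size()));
  for (std::int32_t v : record.values) append_i32(&out, v);
  return out;
}

// Rebuilds the record; returns false on truncated or malformed input.
inline bool deserialize(const std::vector<unsigned char>& in,
                        ClassRecord* record) {
  std::size_t offset = 0;
  std::int32_t count = 0;
  if (!read_i32(in, &offset, &record->id)) return false;
  if (!read_i32(in, &offset, &count) || count < 0) return false;
  record->values.clear();
  for (std::int32_t i = 0; i < count; ++i) {
    std::int32_t v = 0;
    if (!read_i32(in, &offset, &v)) return false;
    record->values.push_back(v);
  }
  return true;
}
```

Because every field is written with an explicit width, the byte count of a serialized record is the same on 32-bit and 64-bit builds; files that must also move between machines of different endianness would additionally need a fixed byte order.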

Date: 2006-09-06 07:42
Sender: nobody
Logged In: NO 

Patch submitted to link on x86_64 Linux, but it crashes when run
because add_variable assumes it can initialize a variable of any type
by assigning a pointer.

I think specializing versions of add_variable for the various
base types could make this work.
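
A rough sketch of what such specialization could look like, using one overload per base type so that each default value is stored with the width of its own type. The signatures and the `Variable` struct here are hypothetical and do not reproduce the real add_variable interface in cutil/variables.cpp.

```cpp
#include <cassert>

// Hypothetical registry entry that records the type of the variable.
enum VariableKind { kIntVar, kFloatVar, kStringVar };

struct Variable {
  VariableKind kind;
  void* address;
};

// Each overload writes through a correctly typed pointer, so an int
// variable only ever receives sizeof(int) bytes.
inline Variable add_variable(int* address, int default_value) {
  *address = default_value;
  Variable v = {kIntVar, address};
  return v;
}

inline Variable add_variable(float* address, float default_value) {
  *address = default_value;
  Variable v = {kFloatVar, address};
  return v;
}

// Only string variables are initialized by assigning a pointer.
inline Variable add_variable(const char** address, const char* default_value) {
  *address = default_value;
  Variable v = {kStringVar, address};
  return v;
}
```

With overloads like these, registering an int such as acts_ocr can no longer overwrite a neighbouring pointer such as demodir, because the compiler picks the store width from the variable's type.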

Date: 2006-09-05 15:32
Sender: aramm
Logged In: YES 
user_id=1105490

The code looks pretty unstable for that part.

However you can still get it to compile 
using "CXXFLAGS=-m32 ./configure && make"

Date: 2006-09-05 14:57
Sender: theraysmith (Project Admin)
Logged In: YES 
user_id=1515161

64-bit compatibility is unlikely to be fixed any time soon.

Original issue reported on code.google.com by tmb...@gmail.com on 7 Mar 2007 at 10:31

GoogleCodeExporter commented 9 years ago

Original comment by tmb...@gmail.com on 7 Mar 2007 at 10:37