Closed GoogleCodeExporter closed 9 years ago
1. Version 2.04 is very old. There is a 3.02 alpha version in svn, so there will
be no further improvement in the 2.0x versions.
2. If you look at the error message, it is clear that VC++ 2008 is not
linking tesseract to the libtiff library. There could be several reasons for this,
but my guess is that the problem is that you are using the "GnuWin32" libtiff
(probably built with MinGW). Try searching for "Linking to libraries from different compilers".
Original comment by zde...@gmail.com
on 27 Mar 2012 at 2:59
Thank you very much, zde... (http://code.google.com/u/117377429268285189819/).
The real issue is that I am not a C++ programmer or Visual Studio user. I
followed instructions given elsewhere in this group; unfortunately, those
instructions simply mentioned that libtiff.lib is added as an additional
dependency. In my ignorance, I did this in the custom build.
The real solution is given by Antonio Rubby (see
http://social.msdn.microsoft.com/Forums/en/Vsexpressvc/thread/cee0448f-5435-4fc1-85f0-ae18fb71944d).
I am presenting the revised steps below:
============
This write-up (on rebuilding tesseract 2.04 with libtiff
support for Windows) is by jwaddell, from
http://superuser.com/questions/149568/command-line-ocr-in-windows-7
and has been slightly annotated for clarity.
Step 1a
Download tesseract 2.04. Two downloads are available: the Windows
executable and the source. The Windows executable is in
tesseract-2.04.exe.tar.gz, from
http://tesseract-ocr.googlecode.com/files/tesseract-2.04.exe.tar.gz. This
contains stand-alone Windows executables (no installation required), but it
does not include the language files and is not linked against libtiff. For users
with single-page uncompressed TIFFs this will work very well, but remember that
you need language files; the English language files are in
http://tesseract-ocr.googlecode.com/files/tesseract-2.00.eng.tar.gz
Unpack tesseract-2.04.exe.tar.gz to, say, C:\tesseract204. It will contain two
folders, "java" and "training", and four files: tessdll.dll, tessdll.lib,
tesseract.exe and dlltest.exe. Now create a subdirectory "tessdata" and
unpack the eight files from tesseract-2.00.eng.tar.gz into it. The result should look like this:
C:\tesseract204\java\
C:\tesseract204\training\
C:\tesseract204\tessdll.dll
C:\tesseract204\tessdll.lib
C:\tesseract204\tesseract.exe
C:\tesseract204\dlltest.exe
C:\tesseract204\tessdata\eng.freq-dawg
C:\tesseract204\tessdata\eng.word-dawg
C:\tesseract204\tessdata\eng.user-words
C:\tesseract204\tessdata\eng.inttemp
C:\tesseract204\tessdata\eng.normproto
C:\tesseract204\tessdata\eng.pffmtable
C:\tesseract204\tessdata\eng.unicharset
C:\tesseract204\tessdata\eng.DangAmbigs
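A quick way to confirm that the unpacked layout is complete is a small script. The following is only a sketch in Python (not part of the original write-up); the function name missing_tessdata is made up for illustration, and C:\tesseract204 is the example install directory used above.

```python
from pathlib import Path

# The eight English data files listed above for tesseract 2.04.
EXPECTED_TESSDATA = [
    "eng.freq-dawg", "eng.word-dawg", "eng.user-words", "eng.inttemp",
    "eng.normproto", "eng.pffmtable", "eng.unicharset", "eng.DangAmbigs",
]


def missing_tessdata(install_dir):
    """Return the expected tessdata files that are absent under install_dir."""
    tessdata = Path(install_dir) / "tessdata"
    return [name for name in EXPECTED_TESSDATA if not (tessdata / name).is_file()]


if __name__ == "__main__":
    # Example location from the write-up; adjust to your own unpack directory.
    missing = missing_tessdata(r"C:\tesseract204")
    print("missing files:", missing if missing else "none")
```

If the script reports any missing files, unpack tesseract-2.00.eng.tar.gz into the tessdata subdirectory again.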
Step 1b
Windows users intending to do training need additional files in
two folders, "configs" and "tessconfigs". Both are available in the source archive
http://tesseract-ocr.googlecode.com/files/tesseract-2.04.tar.gz
Using a suitable unpacker (7-Zip or PeaZip), unpack the two folders from
the tessdata folder inside the tesseract-2.04 folder in the source tarball.
Step 2
More versatile users who want to use libtiff need to
recompile and build the executables. This requires the source
(tesseract-2.04.tar.gz), libtiff (see
http://gnuwin32.sourceforge.net/packages/tiff-win32.htm for details and get the
full download from http://gnuwin32.sourceforge.net/downlinks/tiff.php) and VC++
Express 2008 (web install from
http://msdn.microsoft.com/en-us/express/future/bb421473). Remember that this is a
web install and will take a good while to download the total of ~92 MB.
The offline ISO image from Microsoft is much bigger,
as it contains the complete Visual Studio; you can get individual offline
images as described in this blog post:
http://vicker313.wordpress.com/2008/11/26/how-to-offline-install-visual-studio-express-without-download-the-whole-image-file/
1) Install libtiff. On a 64-bit Windows 7 system the suggested install directory is
C:\Program Files (x86)\GnuWin32; on 32-bit Windows 7 or XP it is C:\Program
Files\GnuWin32. Underneath this directory are a bunch of subdirectories
containing files we'll need to compile tesseract with TIFF support, namely
include, bin and lib. Add C:\Program Files (x86)\GnuWin32\bin to your PATH
environment variable so that the resulting tesseract.exe can find the libtiff DLL.
(This is done from Control Panel / System / Advanced by selecting Environment Variables.)
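To verify that the bin directory really ended up on PATH, a check along these lines can help. This is a hedged Python sketch rather than anything from the original instructions; dir_on_path is a hypothetical helper name, and the directory shown is the 64-bit example path from this step. It uses the standard ntpath module so that comparison follows Windows rules (case-insensitive, trailing backslash ignored).

```python
import ntpath
import os


def dir_on_path(directory, path_value=None):
    """Return True if directory is an entry in a Windows-style PATH string."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")

    def norm(p):
        # Strip whitespace and quotes, then normalise case and separators.
        return ntpath.normcase(ntpath.normpath(p.strip().strip('"')))

    target = norm(directory)
    entries = [e for e in path_value.split(";") if e.strip()]
    return any(norm(e) == target for e in entries)


if __name__ == "__main__":
    # Example directory from the write-up; use the 32-bit path if that applies.
    libtiff_bin = r"C:\Program Files (x86)\GnuWin32\bin"
    print("libtiff bin on PATH:", dir_on_path(libtiff_bin))
```

Run it in a new command prompt (or after the restart in step 5), since already-open windows keep the old PATH.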
2) If you have not used the web install for VC++ 2008, do the offline install of VC++ now.
3) Unpack the source (tesseract-2.04.tar.gz). In this example I've unpacked to
C:\projects\tesseract-2.04. (Windows 7 and Windows XP do not understand .tar.gz out
of the box. My recommendation is to get a copy of 7-Zip.)
4) Download your required language files. (Note: since we are dealing with
tesseract 2.04, do not use language packs meant for version 3.00 and up.) Unpack
these to the tessdata subdirectory, C:\projects\tesseract-2.04\tessdata.
5) Restart the machine (so that the changed environment variables and PATH take
effect).
6) Open the VC++ solution (tesseract.sln) by double-clicking
C:\projects\tesseract-2.04\tesseract.sln.
7) Visual Studio now opens with the Solution Explorer in the left
panel (if not, press CTRL+ALT+L or select "Solution Explorer" from the "View" menu).
The Solution Explorer should show Solution 'tesseract' (7 projects).
In the toolbar you will see a drop-down list with "Debug" as the default option. This is the "solution configuration". Change the solution configuration to "Release" from the drop-down list. Note that if you later change back to Debug mode, you'll need to set up all of the following again.
8) In the Solution Explorer, right-click the solution node (Solution 'tesseract')
and click "Properties". This opens a pop-up window titled "solution
tesseract property page", whose left panel shows
"Common Properties" and "Configuration Properties". Switch to "Configuration
Properties" and select or confirm the "Release" configuration in the drop-down at
the top of the window. Press OK to close the property window.
9) Navigate to Tools -> Options. This opens a pop-up window titled
"Options"; in the left panel select Projects and Solutions -> VC++
Directories. Here we'll add the full paths of the lib and
include subdirectories of the libtiff install so that VC++ can find the required
header (.h) and library (.lib) files. In this example they are
$(ProgramFiles)\GnuWin32\include and $(ProgramFiles)\GnuWin32\lib, as I'm using an
environment variable. I could, however, just have written them out in full, e.g.
C:\Program Files (x86)\GnuWin32\include.
Change the "Show Directories For" drop-down to "Include files" and add
$(ProgramFiles)\GnuWin32\include. Now change the "Show Directories
For" drop-down to "Library files" and add
$(ProgramFiles)\GnuWin32\lib.
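Before building, it may be worth checking that the two directories actually contain what the compiler and linker will look for. The sketch below is an illustration in Python, not part of the original steps. It assumes the header is tiffio.h (libtiff's main public header) and the library is libtiff.lib (the name used in the additional-dependency step); libtiff_files_present is a hypothetical helper name.

```python
from pathlib import Path


def libtiff_files_present(gnuwin32_root):
    """Report whether the libtiff header and .lib exist under gnuwin32_root."""
    root = Path(gnuwin32_root)
    checks = {
        "header": root / "include" / "tiffio.h",    # found via "Include files"
        "library": root / "lib" / "libtiff.lib",    # found via "Library files"
    }
    return {label: path.is_file() for label, path in checks.items()}


if __name__ == "__main__":
    # Example install directory from step 1; adjust for 32-bit systems.
    result = libtiff_files_present(r"C:\Program Files (x86)\GnuWin32")
    for label, found in result.items():
        print(label, "found" if found else "MISSING")
```

If either file is reported missing, the paths entered under VC++ Directories will not help and the libtiff install should be checked first.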
10) Now open the project properties window for the tesseract project. (Note:
seven projects are listed in the Solution Explorer: cntraining, dlltest,
mftraining, tessdll, tesseract, unicharset_Extractor and wordlist2dawg. Select the
tesseract project and right-click it to open the properties
page; this opens a pop-up titled "tesseract Property Page".) Navigate the horrendous
list of options to Configuration Properties -> C/C++ -> Preprocessor. In the
right panel you will see Preprocessor Definitions and a list of values; click on it
to open an editable list and add HAVE_LIBTIFF to the Preprocessor
Definitions. This causes a bunch of #includes to be enabled in the code.
11) You also need to add an "Additional Dependency". In the project
properties of the tesseract project, open the property page, go to
"Configuration Properties > Linker > Input > Additional Dependencies"
and add libtiff.lib. Click "Apply" and close the property window.
(This was clarified by Antonio Rubby; see
http://social.msdn.microsoft.com/Forums/en/Vsexpressvc/thread/cee0448f-5435-4fc1-85f0-ae18fb71944d)
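Since these two settings are easy to put in the wrong place (for example under the custom build step, or in the Debug configuration), they can be double-checked by reading the project file. The following Python sketch assumes the VC++ 2008 .vcproj XML layout, where each Configuration element holds Tool elements named VCCLCompilerTool and VCLinkerTool; the SAMPLE text and the function name release_settings are made up for illustration and are not taken from the tesseract sources.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for a VC++ 2008 project file.
SAMPLE = """<VisualStudioProject Name="tesseract">
  <Configurations>
    <Configuration Name="Release|Win32">
      <Tool Name="VCCLCompilerTool"
            PreprocessorDefinitions="WIN32;NDEBUG;HAVE_LIBTIFF"/>
      <Tool Name="VCLinkerTool" AdditionalDependencies="libtiff.lib"/>
    </Configuration>
  </Configurations>
</VisualStudioProject>"""


def release_settings(vcproj_xml, config_prefix="Release"):
    """Return (defines, dependencies) of the first matching configuration."""
    root = ET.fromstring(vcproj_xml)
    for config in root.iter("Configuration"):
        if not config.get("Name", "").startswith(config_prefix):
            continue
        defines, deps = [], []
        for tool in config.iter("Tool"):
            if tool.get("Name") == "VCCLCompilerTool":
                # Preprocessor definitions are separated by semicolons.
                raw = tool.get("PreprocessorDefinitions", "")
                defines = [d for d in raw.split(";") if d]
            elif tool.get("Name") == "VCLinkerTool":
                # Additional dependencies are separated by whitespace.
                deps = tool.get("AdditionalDependencies", "").split()
        return defines, deps
    return [], []


if __name__ == "__main__":
    defines, deps = release_settings(SAMPLE)
    print("HAVE_LIBTIFF defined:", "HAVE_LIBTIFF" in defines)
    print("libtiff.lib linked:", "libtiff.lib" in deps)
```

To check a real project, read the tesseract project's .vcproj file as text and pass its contents in place of SAMPLE.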
12) Build the solution and watch the error list. If you get a bunch of LNK2019
errors, that means the linker can't find something tesseract references: you're
missing a reference to one of the paths from libtiff. If you get an error
mentioning mt.exe, you've possibly encountered a bug in the SDK. Just try
building again. See
http://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=106634
for more info.
If/when the solution builds successfully, you'll have a tesseract.exe file in
the same directory as the tesseract solution file. Drag your multipage
compressed TIFF there and try running tesseract.
Hopefully (fingers crossed, heh) you've now got an OCR'd out.txt file sitting
in C:\projects\tesseract-2.04.
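For reference, the final run can also be scripted. This is a minimal Python sketch, assuming the usual tesseract 2.x command line of the form "tesseract imagename outputbase [-l lang]", where tesseract itself appends .txt to the output base. The file name scan.tif and the helper names are hypothetical.

```python
import subprocess


def tesseract_command(exe, image, output_base, lang="eng"):
    """Build the argument list: <exe> <image> <output_base> -l <lang>."""
    return [exe, image, output_base, "-l", lang]


def run_ocr(exe, image, output_base, lang="eng"):
    """Run tesseract and return the name of the text file it should write."""
    subprocess.run(tesseract_command(exe, image, output_base, lang), check=True)
    return output_base + ".txt"


if __name__ == "__main__":
    # Only print the command here; call run_ocr(...) once the build succeeded.
    cmd = tesseract_command(r"C:\projects\tesseract-2.04\tesseract.exe",
                            "scan.tif", "out")
    print(" ".join(cmd))
```

With output base "out", the recognised text is expected in out.txt in the current directory.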
Original comment by rnkan...@gmail.com
on 27 Mar 2012 at 4:22
2.04 was released in June 2009. It is now March 2012, and the process of releasing
version 3.02 has already started (see the forums).
2.04 supported 5 languages; 3.02 will support 68. The API has changed since
then, and so on.
I hope you have a very good reason to spend time on an unsupported version.
Original comment by zde...@gmail.com
on 27 Mar 2012 at 6:56
Original issue reported on code.google.com by
rnkan...@gmail.com
on 27 Mar 2012 at 1:13