sgzwiz / tesseractdotnet

Automatically exported from code.google.com/p/tesseractdotnet

Svn Source is out of date #12

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
The source in svn appears to be out of date. For instance the latest downloads, 
tesseractdotnetwrapper_r590.zip and IPoVn_Release_x86.zip at time of writing, 
have additional methods and functionality compared to what is in the svn 
repository. 

There also appear to be two different versions of the 
'tesseractenginewrapper.h' and 'tesseractenginewrapper.cpp' files: one under 
'.\dotnetwrapper\TesseractEngineWrapper' and another under 
'.\dotnetwrapper\Source\api', where the former appears to be an older version.

Assuming I haven't made some mistake, would you be able to update the svn 
repository so that we can build tesseractdotnetwrapper_r590 ourselves?

Original issue reported on code.google.com by Charles....@gmail.com on 21 Jul 2011 at 11:34

GoogleCodeExporter commented 9 years ago
I built the wrapper under VS2010 targeting .NET Framework 4. 
I confirm that the version is different from the one in IPoVn_Release_x86.zip.
It seems that the one in '.\dotnetwrapper\TesseractEngineWrapper' is from R552.

With the wrapper I built, 'RetrieveResultDetail' is always NULL.

Many thanks for the great work.

Original comment by cesare.e...@gmail.com on 23 Jul 2011 at 5:21

GoogleCodeExporter commented 9 years ago
1. The svn source is not the current version. In the current version I managed 
to integrate libleptxxx.lib into the tesseract project and also added some 
functions to create a Pix directly from a buffer. I will update svn soon.

2. The current source is under /trunk/dotnetwrapper/source/*.*
This version requires libleptxxx.dll to load the tesseract.dll assembly.
You should all be able to compile it successfully in both debug and release mode.

3. The engine wrapper for tesseract-ocr v3.01 r552 is different from the one for 
v3.01 r590.
With r552: I changed some .h/.cpp files in some projects, so it will not be 
compatible with further changes to tesseract-ocr. I therefore had to make a 
wrapper for r590, which I hope will adapt to the official tesseract-ocr v3.01 
release. 

With r590: I changed only a few lines of code in the ccmain project to load 
tessdata from any location; please have a look at line #106 in tessedit.cpp and 
add the _BUILDASDLL preprocessor definition to the project configuration.
- With the wrapper: everything is in source\api\

@all: I want to make it clear that I developed on VS2008 and build/reference 
everything for the x86 platform. I have not tested anything else yet.
You should all use the x86 assemblies (from the downloads tab) for x86 applications.

Sorry about any confusion! I will try to make things better in the next version soon!
Thanks!

Original comment by congnguy...@gmail.com on 24 Jul 2011 at 3:17

GoogleCodeExporter commented 9 years ago
Thanks for the quick reply. I look forward to checking out the source for the 
latest version of the wrapper when available.

On a related note, you may want to build the wrapper (VS2008) in release 
configuration and distribute those DLLs instead of the DLLs generated by the 
debug configuration, since one cannot redistribute the debug version of the 
Visual C++ 2008 Runtime (I had to install Visual Studio 2008 to use the 
distributed version of the wrapper).

Original comment by Charles....@gmail.com on 24 Jul 2011 at 6:30

GoogleCodeExporter commented 9 years ago
Thanks for the reply.
Like Charles, I also ran into problems distributing the DLLs generated in 
debug: first with .NET Framework 2 (minor), and second with the side-by-side 
error (event id 33) on Microsoft.VC90.DebugCRT..., which I don't know how to 
resolve even after installing the C++ runtime for VS2008.
While waiting for the official R590, a release build of the current version 
would help a lot.
Thanks again. 

Original comment by cesare.e...@gmail.com on 24 Jul 2011 at 7:05

GoogleCodeExporter commented 9 years ago
You can resolve the side-by-side error by compiling the solution in release 
configuration after making the required project setting changes as described in 
the wiki. This way the resulting DLL only depends on the release version of the 
Visual C++ 2008 runtime, which can be redistributed with your setup program(s). 
Alternatively, if you want to use the latest API for ROI or other nice stuff, 
just make sure you've got Visual Studio 2008 installed (if you have it), which 
should do for development purposes until we have the updated source and/or a 
release build available.

Original comment by Charles....@gmail.com on 24 Jul 2011 at 9:37

GoogleCodeExporter commented 9 years ago
The IPoVn_Release_x86.zip package (built for the x86 platform, release mode) is 
the tesseract-ocr v3.01 r590 .NET wrapper. Link here: 
http://code.google.com/p/tesseractdotnet/downloads/list

@cesare...: as I mentioned, I developed on VS2008 Team, so please test in the 
VS2008 IDE. It should not be tested in VS2010 without recompiling. 
For some examples of how to use them, please refer to the 
tesseractdotnetwrapper_r590.zip package.

Original comment by congnguy...@gmail.com on 24 Jul 2011 at 12:29

GoogleCodeExporter commented 9 years ago
Unfortunately the tesseract.dll in the IPoVn_Release_x86.zip package still 
depends on the Debug version of the Visual C++ runtime. Perhaps the settings on 
the tesseract project aren't correct or it depends on the debug version of a 
library?

Here are the logs I get when I try to run my demo app:

Application Exception:
System.IO.FileLoadException: Could not load file or assembly 'tesseract.dll' or 
one of its dependencies. The application has failed to start because its 
side-by-side configuration is incorrect. Please see the application event log 
or use the command-line sxstrace.exe tool for more detail. (Exception from 
HRESULT: 0x800736B1)
   at TesseractOCRDemo.Program.Main(String[] args)

Event Log:
Activation context generation failed for "C:\...\tesseract.dll". Dependent Assembly 
Microsoft.VC90.DebugCRT,processorArchitecture="x86",publicKeyToken="1fc8b3b9a1e18e3b",type="win32",version="9.0.21022.8" 
could not be found. Please use sxstrace.exe for detailed diagnosis.

Anyway I think this issue is covered by Issue #1 or if not should be filed as a 
separate issue. Thoughts?

Original comment by Charles....@gmail.com on 24 Jul 2011 at 10:54

GoogleCodeExporter commented 9 years ago
I don't know whether this link 
(http://simple-pc-help.com/How-To-Safely-Fix-Error-0x800736b1/) is helpful or 
not. Please give it a quick test :).

Original comment by congnguy...@gmail.com on 25 Jul 2011 at 12:26

GoogleCodeExporter commented 9 years ago
If you have not installed vcredist_x86 SP1, you have to install it first.

Try downloading and installing vcredist_x86 from here:
http://www.microsoft.com/download/en/details.aspx?id=5582

The Microsoft Visual C++ 2008 SP1 Redistributable Package (x86)
installs runtime components of Visual C++ Libraries required to run
applications developed with Visual C++ SP1 on a computer that does not
have Visual C++ 2008 SP1 installed.

Original comment by congnguy...@gmail.com on 25 Jul 2011 at 12:29

GoogleCodeExporter commented 9 years ago
I have installed "The Microsoft Visual C++ 2008 SP1 Redistributable Package 
(x86)"; however, I believe the issue is that tesseract.dll in the 
IPoVn_Release_x86.zip package is dependent on the DEBUG version of the 
Microsoft Visual C++ 2008 Runtime which is not included in the Redistributable 
Package. This is made apparent in the event log or by running sxstrace.exe. 
Another telling sign is that my demo program works fine on a machine with 
Visual Studio 2008 installed; presumably because the debug version of the 
runtime is installed by Visual Studio 2008.

Original comment by Charles....@gmail.com on 25 Jul 2011 at 2:36
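
One quick way to check a binary for the dependency described above, without installing anything, is to search its bytes for the debug CRT assembly name. The sketch below is only a heuristic and assumes the side-by-side manifest is embedded in the DLL as plain text, which is how Visual C++ 2008 normally embeds it. The function names are hypothetical and not part of tesseract or the wrapper.

```cpp
#include <algorithm>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Returns true if the byte buffer contains the given ASCII marker.
bool ContainsMarker(const std::vector<char>& bytes, const std::string& marker) {
    return std::search(bytes.begin(), bytes.end(),
                       marker.begin(), marker.end()) != bytes.end();
}

// Reads a whole file into memory; yields an empty buffer if it cannot be opened.
std::vector<char> ReadAllBytes(const std::string& path) {
    std::ifstream in(path.c_str(), std::ios::binary);
    return std::vector<char>((std::istreambuf_iterator<char>(in)),
                             std::istreambuf_iterator<char>());
}

// Hypothetical helper: true if the file mentions the debug CRT assembly name
// that appears in the event log entry (assumes a plain-text embedded manifest).
bool ReferencesDebugCrt(const std::string& path) {
    return ContainsMarker(ReadAllBytes(path), "Microsoft.VC90.DebugCRT");
}
```

Calling `ReferencesDebugCrt("tesseract.dll")` on a true release build would be expected to return false. A true result suggests that the DLL, or something linked into it, was built against the debug runtime.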

GoogleCodeExporter commented 9 years ago
@congnguy: 
The http://simple-pc-help.com/How-To-Safely-Fix-Error-0x800736b1/ page you 
suggested is a cleaner tool... and, like all others of its kind, for 'only 
€29,97' it promises every possible miracle on your machine. No comment.

I built the wrapper with the VS2008 IDE (see your comment 6) without problems. 
It contains the new methods as expected. Now I have found a problem with 
recognition: the Recognize method is applied to the image and returns an 
unformatted string. 
In my application I need recognition per line, so I used the AnalyseLayout 
method with ePageSegMode.PSM_SINGLE_LINE. For each of the Blocks, Paragraphs, 
Lines and Words I finally found the property TEXT: this property is always 
EMPTY, and the CharList always contains zero, although the CharList.Count 
property is correct.
I'm sure I missed something, and in the samples I didn't find a way to do what 
I need. Would you help me, please? 

Original comment by cesare.e...@gmail.com on 25 Jul 2011 at 6:17

GoogleCodeExporter commented 9 years ago
Sorry, I found that link after searching around on the internet :)

AnalyseLayout:
=> returns the document-layout structure as document >> blocks >> paragraphs >> ...
This means they are the coordinates of the found blobs; they do not contain any 
recognition items!
You can give it some parameters as you usually do with the original 
tesseract-ocr engine, e.g. the ePageSegMode mode, some related variables...

Recognize:
=> returns the recognized text
=> You can recognize on the whole image or only on a region of interest; use 
the UseROI and ROI properties as shown in the samples in the example solution.

Example: Recognize on each line

0. Do some image preprocessing if possible...

1. Analyse the layout

TesseractProcessor processor = ???
Set parameters/variables if needed???

// get document-layout
Document doc = processor.AnalyseLayout(...);

2. Do some image enhancement for each blob level if possible

3. Recognize on each line as below:

// set the use-ROI flag here
processor.UseROI = true;

-foreach Block in document
--foreach Paragraph in block
---foreach TextLine in paragraph
------// set the processor ROI to the bounds of the current TextLine (l, t, r, b)
------processor.ROI = new Rectangle(l, t, r-l+1, b-t+1);
------recognized_text = processor.Recognize(...);

4. Do some post-processing here if possible

Original comment by congnguy...@gmail.com on 25 Jul 2011 at 8:06
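
The ROI arithmetic in step 3 is easy to get wrong, so here is a minimal C++ sketch of just that conversion. It assumes the TextLine bounds (l, t, r, b) are inclusive pixel coordinates, which is what the `+1` in the pseudocode implies. The `Bounds` and `Roi` types are hypothetical stand-ins, not the wrapper's own classes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Inclusive pixel bounds of a text line: left, top, right, bottom.
struct Bounds { int left, top, right, bottom; };

// Rectangle in (x, y, width, height) form, as expected by an ROI property.
struct Roi { int x, y, width, height; };

// Converts inclusive bounds to an ROI. Width uses right/left and height uses
// bottom/top; mixing the two axes is the mistake to avoid.
Roi ToRoi(const Bounds& b) {
    assert(b.right >= b.left && b.bottom >= b.top);
    Roi r = { b.left, b.top, b.right - b.left + 1, b.bottom - b.top + 1 };
    return r;
}

// Converts every line's bounds, preserving reading order.
std::vector<Roi> ToRois(const std::vector<Bounds>& lines) {
    std::vector<Roi> rois;
    rois.reserve(lines.size());
    for (std::size_t i = 0; i < lines.size(); ++i) {
        rois.push_back(ToRoi(lines[i]));
    }
    return rois;
}
```

Each resulting rectangle would then be assigned to the processor's ROI before calling Recognize, one line at a time.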

GoogleCodeExporter commented 9 years ago
OK, so the tesseract.dll in the IPoVn x86 release requires libleptxxx.dll! 

Original comment by osiri...@gmail.com on 25 Jul 2011 at 8:14

GoogleCodeExporter commented 9 years ago
No, I believe the latest release (IPoVn_Release_x86.zip) integrates the 
libleptxxx dependency. The only deployment issue that I'm aware of at the 
moment is that tesseract.dll has a dependency on the debug version of the 
Visual C++ 2008 runtime, effectively meaning that you must have Visual Studio 
2008 installed on the machine to use the library. 

Original comment by Charles....@gmail.com on 25 Jul 2011 at 8:55

GoogleCodeExporter commented 9 years ago
@congnguy...: about comment 12.
I have this error: 'useroi' is not a member of 
'OCR.TesseractWrapper.TesseractProcessor' ... 
Sorry, something more is missing in my build, I guess.

I worked around it by submitting to Recognize an image cropped to the bounds of 
each TextLine. 
The process is very slow... I hope it will be better with ROI.
Many thanks.

Original comment by cesare.e...@gmail.com on 25 Jul 2011 at 9:18

GoogleCodeExporter commented 9 years ago
@cesare...: UseROI only exists in the IPoVn_Release_x86.zip and 
tesseractdotnetwrapper_r590.zip packages.

Original comment by congnguy...@gmail.com on 25 Jul 2011 at 10:07

GoogleCodeExporter commented 9 years ago
@cesare...: you can modify a few lines of code as below to make tesseract 
recognize only on the ROI:

api->SetImage(pix);

// please help me check SetRectangle() again, thanks!
api->SetRectangle(left, top, width, height);

bool succeeded = api->Recognize(NULL) >= 0;

Original comment by congnguy...@gmail.com on 25 Jul 2011 at 10:14
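
Before handing a rectangle to SetRectangle(), it may be worth clipping it to the image so an out-of-range line box cannot produce an invalid region. The sketch below shows only that clipping step. `ClampRoi` and `Rect` are hypothetical helpers, and the tesseract calls appear only in a comment because they need the tesseract headers. As far as I recall, the comments in tesseract's baseapi.h say SetRectangle() should be called after SetImage(), but please verify that against the version you build.

```cpp
#include <algorithm>

// Rectangle in the (left, top, width, height) form used by SetRectangle().
struct Rect { int left, top, width, height; };

// Intersects the requested ROI with the image area
// [0, imageWidth) x [0, imageHeight). Returns false if nothing is left.
bool ClampRoi(int imageWidth, int imageHeight, Rect* roi) {
    const int x0 = std::max(roi->left, 0);
    const int y0 = std::max(roi->top, 0);
    const int x1 = std::min(roi->left + roi->width, imageWidth);
    const int y1 = std::min(roi->top + roi->height, imageHeight);
    if (x1 <= x0 || y1 <= y0) {
        return false;  // ROI lies entirely outside the image
    }
    roi->left = x0;
    roi->top = y0;
    roi->width = x1 - x0;
    roi->height = y1 - y0;
    return true;
}

// Intended call order (not compiled here; needs the tesseract headers):
//   api->SetImage(pix);
//   if (ClampRoi(imageWidth, imageHeight, &roi))
//       api->SetRectangle(roi.left, roi.top, roi.width, roi.height);
//   bool succeeded = api->Recognize(NULL) >= 0;
```

If `ClampRoi` returns false, the line box does not overlap the image at all and recognition for that line can simply be skipped.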

GoogleCodeExporter commented 9 years ago
Hope I modified the correct module...
There are errors, see the attachment.

Original comment by cesare.e...@gmail.com on 25 Jul 2011 at 11:01

Attachments:

GoogleCodeExporter commented 9 years ago
You have to pass the ROI (left, top, width, height) to the function! The 
compilation errors are clear!

Original comment by congnguy...@gmail.com on 25 Jul 2011 at 11:56

GoogleCodeExporter commented 9 years ago
Maybe you should call SetRectangle() before SetImage() for speed. Please help 
me check this against the comments in the original tesseract-ocr engine :D.

Original comment by congnguy...@gmail.com on 25 Jul 2011 at 12:37

GoogleCodeExporter commented 9 years ago
You are right, I know what it means, but my knowledge of C++ is exactly a big 
zero.
I already have a workaround for my needs, and if you have no time to give me 
all the information, I can understand. 
Thank you, I'll wait for the official release when it is ready.

Original comment by cesare.e...@gmail.com on 25 Jul 2011 at 1:24

GoogleCodeExporter commented 9 years ago
I confirm there is a dependency on something VS 2008 installs. 
My project was working fine on my other computer, and I get the 
FileLoadException again now that I'm on the computer without VS 2008. 

Original comment by osiri...@gmail.com on 25 Jul 2011 at 5:44

GoogleCodeExporter commented 9 years ago
Here is the error I found in the event logs: 

\bin\Debug\tesseract.dll". Dependent Assembly 
Microsoft.VC90.DebugCRT,processorArchitecture="x86",publicKeyToken="1fc8b3b9a1e18e3b",type="win32",version="9.0.21022.8" 
could not be found. Please use sxstrace.exe for detailed diagnosis.

Original comment by osiri...@gmail.com on 25 Jul 2011 at 6:41

GoogleCodeExporter commented 9 years ago
Any ETA on the sources for the latest binaries (r590)? Perhaps you could just 
add the wrapper files as a new directory under source control until you have 
time to properly update the project(s)?

Original comment by Charles....@gmail.com on 8 Aug 2011 at 12:42

GoogleCodeExporter commented 9 years ago
@all: the new version is here: 
http://tesseractdotnet.googlecode.com/files/tesseractdotnet_v301_r590.zip

The package includes:
- x86 release assemblies
- all source code

Original comment by congnguy...@gmail.com on 10 Aug 2011 at 5:16

GoogleCodeExporter commented 9 years ago
Thanks for the update, mate.

Original comment by Charles....@gmail.com on 10 Aug 2011 at 5:41

GoogleCodeExporter commented 9 years ago
Tesseract 3.01 (r639) has been released. Please update SVN or provide a wrapper 
for that release. Thanks.

Original comment by nguyen...@gmail.com on 29 Oct 2011 at 4:39

GoogleCodeExporter commented 9 years ago
Here is a patch created with TortoiseSVN from my working version of 
tesseract+wrapper. The svn version of tesseract at this moment is r639.

Original comment by tanelte...@gmail.com on 6 Nov 2011 at 9:15

Attachments:

GoogleCodeExporter commented 9 years ago
I forked tesseract-ocr and merged in tesseractdotnet on GitHub. Note that 
this fork currently only targets .NET 4.0, not .NET 3.5 like the current 
version, and does not have the extra thresholders or other changes (it's just a 
wrapper around tesseract-ocr). https://github.com/charlesw/tesseract-ocr-dotnet

Original comment by Charles....@gmail.com on 6 Nov 2011 at 10:31

GoogleCodeExporter commented 9 years ago
@tanel & Charles: Yours is vs2010, which produces a .NET 4.0 DLL. I need one 
for .NET 2.0. Can you make a patch for vs2008? Thanks.

Original comment by nguyen...@gmail.com on 7 Nov 2011 at 12:44

GoogleCodeExporter commented 9 years ago
Yes I'm aware of that and yes I can update it to also produce a .NET 2.0 DLL, 
but not sure when I'll be able to do so. I'll post an update when done.

Original comment by Charles....@gmail.com on 7 Nov 2011 at 12:59

GoogleCodeExporter commented 9 years ago
Thanks. I'm looking forward to it.

Original comment by nguyen...@gmail.com on 13 Nov 2011 at 3:27

GoogleCodeExporter commented 9 years ago
Just letting you know that I've uploaded a version for .NET 2.0. If you have 
any issues please let me know (just file an issue, on github, and I'll see what 
I can do).

Original comment by Charles....@gmail.com on 16 Nov 2011 at 10:58

GoogleCodeExporter commented 9 years ago
It looks good. Thank you. I hope it will be incorporated into the trunk soon.

Original comment by nguyen...@gmail.com on 26 Nov 2011 at 6:24