jcallinan / tesseractdotnet

Automatically exported from code.google.com/p/tesseractdotnet

_ocrProcessor.Apply(System.Drawing.Image img) -- chokes on corrupted memory #5

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. _ocrProcessor.Apply(image_object)

What is the expected output? What do you see instead?
I expect it to accept any image of type System.Drawing.Image. Instead, I'm
getting a memory-corruption error.

What version of the product are you using? On what operating system?
1.0 on Windows Vista 32-bit

Please provide any additional information below.
Whenever I save the image as a TIFF file using FreeImage.NET, the wrapper has no
problem loading the TIFF file from the path and OCR'ing it, but when I pass a
TIFF image object into the Apply method, the wrapper complains about corrupt
memory. I've also tried bitmaps, with the same results. It would be nice if the
wrapper accepted any System.Drawing.Image object and converted it into a
format that tesseract will not choke on.

One more thing: I'm also not receiving the IList<Word> results when calling
RecieveResults. Other than that, I want to thank the author for the time and
effort put into this library. I really appreciate it.

Original issue reported on code.google.com by pwizzl...@gmail.com on 10 May 2011 at 11:57

GoogleCodeExporter commented 8 years ago
Sorry for not updating the project for such a long time!

I don't have time to work on the project now,

so I have released a version that includes the workspace I had been working on.

I hope it resolves the existing issues!

Cong.

Original comment by congnguy...@gmail.com on 16 May 2011 at 3:27

GoogleCodeExporter commented 8 years ago
I used tesseract3.dll in my web service application. When I publish the web site, an
error appears: "The specified module could not be found. (Exception from HRESULT:
0x8007007E)"
Please help me resolve this problem.

Original comment by phamphih...@gmail.com on 2 Jun 2011 at 3:16

GoogleCodeExporter commented 8 years ago
Fixed the issue. It was in PixFromImage. Here's the fix:

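Pix* TesseractProcessor::PixFromImage(System::Drawing::Image* image)
{
    Pix* pix = NULL;

    MemoryStream* mmsTmp = NULL;
    unsigned char srcTmp __gc[] = NULL;
    unsigned char* dstTmp = NULL;

    System::Drawing::Bitmap __gc *bmp = NULL;
    System::Drawing::Graphics __gc *graphics = NULL;

    try
    {
        // Redraw the input into a 24bpp RGB bitmap, then encode it as TIFF in memory.
        mmsTmp = new MemoryStream();
        bmp = new System::Drawing::Bitmap(image->Width, image->Height, System::Drawing::Imaging::PixelFormat::Format24bppRgb);
        graphics = System::Drawing::Graphics::FromImage(bmp);
        graphics->DrawImage(image, 0, 0, image->Width, image->Height);
        bmp->Save(mmsTmp, System::Drawing::Imaging::ImageFormat::Tiff);

        // Stream::Length is 64-bit; Marshal::Copy takes an int count.
        int length = static_cast<int>(mmsTmp->Length);

        // Copy the managed TIFF bytes into an unmanaged buffer for Leptonica.
        srcTmp = mmsTmp->GetBuffer();
        dstTmp = new unsigned char[length];
        System::Runtime::InteropServices::Marshal::Copy(srcTmp, 0, System::IntPtr(dstTmp), length);

        // pixReadMem decodes the buffer into a new Pix.
        pix = pixReadMem(dstTmp, length);
    }
    __finally
    {
        // Release the temporary buffer and GDI+ objects on every path.
        delete [] dstTmp;
        if (graphics != NULL) graphics->Dispose();
        if (bmp != NULL) bmp->Dispose();
        if (mmsTmp != NULL) mmsTmp->Close();
    }

    return pix;
}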

Original comment by pwizzl...@gmail.com on 7 Jun 2011 at 2:37


GoogleCodeExporter commented 8 years ago
I tried the PixFromImage fix, but I'm still getting an error:

"Attempted to read or write protected memory. This is often an indication that 
other memory is corrupt."

Stack Trace:
   at tesseract.Tesseract.recog_all_words(Tesseract* , PAGE_RES* , ETEXT_DESC* , TBOX* , SByte* , Int32 )
   at tesseract.TessBaseAPI.Recognize(TessBaseAPI* , ETEXT_DESC* monitor)
   at tesseract.TesseractProcessor.Process(TessBaseAPI* api, Pix* pix)
   at tesseract.TesseractProcessor.Apply(Image image)
<other info cut here>

I'm using Visual Studio Express 2010. Maybe that has something to do with it, 
or perhaps this is a different issue? I can't even process images via 
Apply(filename)

Original comment by mrflip...@gmail.com on 10 Jun 2011 at 7:33

GoogleCodeExporter commented 8 years ago
I also get the same error with any image, even the two official samples that come
with Tesseract. The patch didn't help.

By the way, I think that with this patch the bmp should keep the same resolution as the
original image:

bmp->SetResolution(image->HorizontalResolution, image->VerticalResolution);

Original comment by nguyen...@gmail.com on 10 Jun 2011 at 10:49

GoogleCodeExporter commented 8 years ago
Why don't you all try debugging to find out what is going wrong?

If you cannot debug into unmanaged code from a C# project, change the
project settings as follows:

C# project >> Project Properties >> Debug tab >> check the "Enable unmanaged code
debugging" option.

Here are some things to check to find out why the problems occur:
- in the application entry point:
       . load the image file into a System.Drawing.Image
       . save the image to a TIFF file: Save(mmsTmp, System::Drawing::Imaging::ImageFormat::Tiff);
- process that TIFF file with the original tesseract-ocr
- does it succeed or fail?

@pwizzl...: thanks! Please check the PixelFormat of the input image before creating
a new temporary bitmap; in some cases the conversion is not needed.

If anyone can pass the image data to tesseract directly, that is the best solution.

In general, you should implement your own page-layout step if you want better
accuracy. Extracting the layout information yourself and then calling tesseract's
Adaptive Classifier works well in practice.

Original comment by congnguy...@gmail.com on 11 Jun 2011 at 4:33

GoogleCodeExporter commented 8 years ago
I think I have this working now. There were two main issues: I was initializing
with an ocrEngineMode other than Default (3), and I was leaving the trailing
slash off the datapath. This was really confusing because the comments in
the code specifically say to leave the trailing slash off, and I had
seen other posts saying the same.

I was also ignoring the return value of Init, which might have pointed me in
the right direction sooner had I been paying attention.

Original comment by mrflip...@gmail.com on 11 Jun 2011 at 12:01

GoogleCodeExporter commented 8 years ago
Ditto the last post; this was exactly my issue ("Attempted to read or write
protected memory").
Adding the trailing slash fixed it. Yay! I have it working now (despite some
other quirks with the word collection).

Original comment by hbeanl...@gmail.com on 1 Jul 2011 at 11:21

GoogleCodeExporter commented 8 years ago
The slash did it! After so many hours of frustration for developers.

Original comment by nguyen...@gmail.com on 3 Jul 2011 at 2:07

GoogleCodeExporter commented 8 years ago
Attempted to read or write protected memory. This is often an indication that 
other memory is corrupt.

This is the problem I ran into.

It comes from the DLL, so I guess it can only be fixed by the author...

I just ran the demo project, tesseractconsole, without any
modification except commenting out AnalyseLayout() in Main() and uncommenting
Recognize() to see how accurate it could be.

Original comment by franva...@gmail.com on 1 Sep 2011 at 12:57

GoogleCodeExporter commented 8 years ago
Where the hell do I need to add the slash??????????

Original comment by Infinity...@gmail.com on 1 Jun 2012 at 10:18

GoogleCodeExporter commented 8 years ago
@Infinity...@gmail.com
When calling Recognize(), set the path with a trailing backslash:
string tessdata = @"C:\tesseractPath\tessdata\";   // <-- the trailing slash goes here

Original comment by johan.he...@gmail.com on 4 Aug 2012 at 11:25

GoogleCodeExporter commented 8 years ago
I also get the same error, "Attempted to read or write protected memory. This is
often an indication that other memory is corrupt.", and I do have a slash at the end
of the path. The error occurs when I change the language data from eng.traineddata
(v3.01) to eng.traineddata (v3.02) (my tesseract.dll version is 3.01).
It also occurred when I downloaded the newest tesseractengine3.dll and
eng.traineddata (v3.02). Do you have any idea what I am doing wrong?

Original comment by cesb...@gmail.com on 7 Dec 2012 at 10:02

GoogleCodeExporter commented 8 years ago
Ha... When I added leptonlibd.dll to the project, it started working. I think someone
should write a tutorial whenever a new version is released: step by step, what to do
so that everything works.
Where is the Recognize function of TesseractProcessor? In the previous version, 3.01, it
was:

        protected Rectangle _roi;
        protected bool _useROI;

        public TesseractProcessor();

        public Rectangle ROI { get; set; }
        public bool UseROI { get; set; }

        public DocumentLayout AnalyseLayout(Image image);
        public DocumentLayout AnalyseLayoutBinaryImage(byte* binData, int width, int height);
        public DocumentLayout AnalyseLayoutGreyImage(byte* greyData, int width, int height);
        public DocumentLayout AnalyseLayoutGreyImage(ushort* greyData, int width, int height);
        public void Clear();
        public void ClearAdaptiveClassifier();
        public void ClearResults();
        public void DisableThresholder();
        public override void Dispose();
        protected virtual void Dispose(bool disposing);
        public void End();
        public bool GetBoolVariable(string name, ref bool value);
        public bool GetDoubleVariable(string name, ref double value);
        public bool GetIntVariable(string name, ref int value);
        protected void GetROI(int imageWidth, int imageHeight, int* left, int* top, int* width, int* height);
        public string GetStringVariable(string name);
        public string GetTesseractEngineVersion();
        public bool Init();
        public bool Init(string dataPath, string lang, int ocrEngineMode);
        public bool InitForAnalysePage();
        public string Recognize(Image image);
        public string Recognize(string filePath);
        public string RecognizeBinaryImage(byte* binData, int width, int height);
        public string RecognizeGreyImage(byte* greyData, int width, int height);
        public string RecognizeGreyImage(ushort* greyData, int width, int height);
        public bool SetPageSegMode(ePageSegMode psm);
        public bool SetVariable(string nam, string value);
        public void UseThresholder();

----------------------------------------------
In v3.02, only these methods are available:

        public bool DoMonitor { get; set; }

        public string Apply(Image image);
        public string Apply(string filePath);
        public string Apply(Image image, int l, int t, int w, int h);
        public void Clear();
        public void ClearAdaptiveClassifier();
        public void ClearResults();
        public BlockList DetectBlocks(Image image);
        public void End();
        public bool GetBoolVariable(string name, ref bool value);
        public bool GetDoubleVariable(string name, ref double value);
        public bool GetIntVariable(string name, ref int value);
        public string GetStringVariable(string name);
        public string GetTesseractEngineVersion();
        public bool Init();
        public bool Init(string dataPath, string lang, int ocrEngineMode);
        public List<Word> RetriveResultDetail();
        public bool SetVariable(string nam, string value);

Where can I find a wrapper file with this method?

Original comment by cesb...@gmail.com on 7 Dec 2012 at 12:13