sonurakpinar / tesseract-ocr

Automatically exported from code.google.com/p/tesseract-ocr

Unable to resolve tessdata path in r765 #764

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Run tesseract.exe from a working directory other than the one containing the tesseract 
executable and the tessdata folder.

What is the expected output? What do you see instead?

The program crashed with the following error:

Error opening data file ./tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent 
directory of your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
Could not initialize tesseract.

What version of the product are you using? On what operating system?
r765; Win7.

Please provide any additional information below.

It worked fine as recently as r737. Recent fixes related to TESSDATA_PREFIX have 
probably broken this functionality on Windows. Ubuntu does not seem to have the 
same issue.

Original issue reported on code.google.com by nguyen...@gmail.com on 26 Sep 2012 at 2:43

GoogleCodeExporter commented 9 years ago
There was only one change regarding TESSDATA_PREFIX (see [1]): it checks whether 
the path ends with "/" and appends one if it is missing (see issue 702). All other 
logic is the same.
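
The trailing-separator check described above can be sketched as follows. This is a minimal illustration, not the actual mainblk.cpp code: `with_trailing_separator` is a hypothetical name, and treating "\\" as an existing separator (in addition to "/") is an assumption made because Windows paths may use either.

```cpp
#include <string>

// Hypothetical helper (not tesseract's code): append "/" to a
// TESSDATA_PREFIX-style directory unless it already ends with a
// separator. Both '/' and '\\' count as existing separators here,
// which is an assumption for Windows-style paths.
std::string with_trailing_separator(const std::string& prefix) {
    // Leave empty input alone; the caller decides what "no prefix" means.
    if (prefix.empty()) {
        return prefix;
    }
    const char last = prefix.back();
    if (last == '/' || last == '\\') {
        return prefix;  // already terminated, nothing to add
    }
    return prefix + "/";
}
```

For example, a prefix of `C:/Tesseract-OCR` becomes `C:/Tesseract-OCR/`, so appending `tessdata/eng.traineddata` yields a well-formed path.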

There is a difference between the Linux build (or any build using autotools) and 
the VC++ build: the autotools build compiles in a default TESSDATA_PREFIX (this 
can be changed).

[1] http://code.google.com/p/tesseract-ocr/source/diff?path=/trunk/ccutil/mainblk.cpp&format=side&r=761
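
The build difference described above can be pictured as a fallback chain for choosing the data directory. The sketch below is hypothetical: the order (explicit argument, then the TESSDATA_PREFIX environment variable, then a compiled-in default, then "./") is an assumption for illustration, not a transcription of mainblk.cpp, and `choose_datadir` is an invented name. The compiled-in default is passed as a parameter so the example does not depend on any particular build.

```cpp
#include <string>

// Hypothetical fallback chain for choosing the data directory.
// explicit_dir:     datapath passed by the caller (nullptr if none)
// env_prefix:       value of TESSDATA_PREFIX (nullptr if unset)
// compiled_default: default baked in at build time, e.g. by an
//                   autotools build (nullptr for a build without one,
//                   which is the assumption made here for VC++)
std::string choose_datadir(const char* explicit_dir,
                           const char* env_prefix,
                           const char* compiled_default) {
    if (explicit_dir != nullptr) return explicit_dir;
    if (env_prefix != nullptr) return env_prefix;
    if (compiled_default != nullptr) return compiled_default;
    return "./";  // last resort: relative to the current working directory
}
```

In real use `env_prefix` would come from `std::getenv("TESSDATA_PREFIX")`. With neither a prefix nor a compiled-in default, the last branch returns "./", which is consistent with the `./tessdata/eng.traineddata` path in the original error message.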

Original comment by zde...@gmail.com on 26 Sep 2012 at 6:35

GoogleCodeExporter commented 9 years ago
OK, I reverted to several previous revisions and was able to pin down the one 
that started the problem: r760.

Original comment by nguyen...@gmail.com on 27 Sep 2012 at 12:22

GoogleCodeExporter commented 9 years ago
I can't look into this fully right now, but it's probably the "int rc = 
api.Init(NULL, NULL)" line in tesseractmain.cpp that triggers this.

Original comment by JerseyChewi@gmail.com on 27 Sep 2012 at 7:56

GoogleCodeExporter commented 9 years ago
Thanks for testing.

I think I found the reason: it looks like the second api.Init() [1] has no effect 
if there is no api.End() for the first api.Init() [2]. Please test r768.

[1] http://code.google.com/p/tesseract-ocr/source/browse/trunk/api/tesseractmain.cpp?spec=svn760&r=760#139
[2] http://code.google.com/p/tesseract-ocr/source/browse/trunk/api/tesseractmain.cpp?spec=svn760&r=760#73
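
The suspected behavior can be illustrated with a toy model. `ToyApi` below is entirely hypothetical and does not reproduce TessBaseAPI's internals; it only encodes the observation above, that a second Init() is ignored until End() is called.

```cpp
#include <string>

// Toy model (hypothetical, not TessBaseAPI): Init() only takes effect
// when the object is not already initialized; End() clears that state.
class ToyApi {
public:
    int Init(const std::string& datapath) {
        if (initialized_) {
            return 0;  // silently ignored: the first datapath stays in force
        }
        datapath_ = datapath;
        initialized_ = true;
        return 0;
    }
    void End() {
        initialized_ = false;
        datapath_.clear();
    }
    const std::string& datapath() const { return datapath_; }

private:
    bool initialized_ = false;
    std::string datapath_;
};
```

In this model, calling End() between the two Init() calls is what lets the second datapath take effect.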

Original comment by zde...@gmail.com on 27 Sep 2012 at 8:59

GoogleCodeExporter commented 9 years ago
That's fixed it! Thanks so much.

Original comment by nguyen...@gmail.com on 27 Sep 2012 at 11:23

GoogleCodeExporter commented 9 years ago

Original comment by zde...@gmail.com on 28 Sep 2012 at 6:14

GoogleCodeExporter commented 9 years ago
It's broken again by the r774 changes.

Original comment by nguyen...@gmail.com on 10 Oct 2012 at 12:48

GoogleCodeExporter commented 9 years ago
Please check r777.

Original comment by zde...@gmail.com on 11 Oct 2012 at 8:00

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
r777 has fixed the issue for tesseract.exe; however, the DLL is now broken with 
relative path for tessdata. DLL still works fine with absolute path.

Original comment by nguyen...@gmail.com on 11 Oct 2012 at 11:55

GoogleCodeExporter commented 9 years ago
I am not sure I understand. Can you post an example of what you mean by 
"DLL is now broken with relative path for tessdata. DLL still works fine with 
absolute path"?

If you are using the tesseract API (DLL), you should specify the datadir (tessdata) 
in the Init function.
I tried both an absolute path ("C:\\Program Files\\Tesseract-OCR") and a 
relative path ("..\\..\\..\\tesseract-ocr"), and both work for me.
If you do not specify a path (NULL), datadir is set to "./".
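
How the datadir and language combine into the file name seen in the error message can be sketched as below. This assumes, as the TESSDATA_PREFIX message in the original report states, that the datadir names the parent of the tessdata directory; `traineddata_path` is a hypothetical helper, not a tesseract function, and treating an empty string like NULL is an assumption.

```cpp
#include <string>

// Hypothetical helper: build "<datadir>tessdata/<lang>.traineddata".
// A null datadir falls back to "./", as described in the comment above;
// treating an empty string the same way is an assumption.
std::string traineddata_path(const char* datadir, const std::string& lang) {
    std::string dir = (datadir != nullptr && *datadir != '\0') ? datadir : "./";
    // Add a separator if the caller's path lacks one.
    if (dir.back() != '/' && dir.back() != '\\') {
        dir += '/';
    }
    return dir + "tessdata/" + lang + ".traineddata";
}
```

With a null datadir the result is `./tessdata/eng.traineddata`, the exact path in the original error, and it is resolved against the process's working directory rather than the executable's directory.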

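If you need the library's directory for Init, you can use something like this:
    // Locate the loaded tesseract DLL and get its full path.
    // The ANSI (A) variants match the char buffer even in UNICODE builds.
    char tessdata[2048];
    HMODULE pDll = LoadLibraryA("libtesseract302.dll");
    GetModuleFileNameA(pDll, tessdata, sizeof(tessdata));
    // Strip the file name, keeping only the directory.
    STRING tessdata_dir;
    truncate_path(tessdata, &tessdata_dir);
    // Use that directory as the datapath.
    int rc = api->Init(tessdata_dir.string(), "eng");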
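
The `truncate_path` step keeps only the directory part of the DLL's full path. A portable sketch of that idea follows; `directory_of` is a hypothetical name, and whether tesseract's `truncate_path` keeps the trailing separator is not stated in this thread, so this version simply keeps it.

```cpp
#include <string>

// Hypothetical portable stand-in for the truncate_path step: return
// everything up to and including the last '/' or '\\' of a module path.
// If the path has no separator, return "./" (the current directory).
std::string directory_of(const std::string& module_path) {
    const std::string::size_type pos = module_path.find_last_of("/\\");
    if (pos == std::string::npos) {
        return "./";
    }
    return module_path.substr(0, pos + 1);
}
```

For a DLL loaded from `C:\Program Files\Tesseract-OCR\`, this returns that directory with its trailing backslash, ready to be passed as a datapath.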

Original comment by zde...@gmail.com on 12 Oct 2012 at 1:31

GoogleCodeExporter commented 9 years ago
The program crashes when I simply pass "tessdata" as the datapath parameter to the 
Init method. If I use ".\\tessdata", then it is OK.

Original comment by nguyen...@gmail.com on 12 Oct 2012 at 1:54

GoogleCodeExporter commented 9 years ago
I am sorry, but I cannot confirm your experience (comment #12): for example, 
"tesseract-ocr" works for me. 
I do not need to use ".\\tesseract-ocr"; "tesseract-ocr" is enough. See the 
attached VC++ API example solution.

Original comment by zde...@gmail.com on 15 Oct 2012 at 6:53


GoogleCodeExporter commented 9 years ago
Sorry, I failed to mention that I encountered the problem when using the C API; 
for instance:

TessBaseAPIInit3(handle, "tessdata", lang); // crash
TessBaseAPIInit3(handle, "./tessdata", lang); // OK
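
Until a fixed build is available, one workaround consistent with the observation above is to give the C API an explicitly relative path. The helper below is a hypothetical sketch, not part of the tesseract C API: it prepends "./" to a relative datapath that does not already start with ".", "./", ".\\", "../" or "..\\", and leaves absolute paths (a leading "/" or "\\", or a drive letter such as "C:") untouched. Mapping an empty path to "./" is an assumption.

```cpp
#include <cctype>
#include <string>

// Hypothetical workaround helper (not part of the tesseract C API):
// turn a bare relative datapath such as "tessdata" into "./tessdata",
// the form reported to work with TessBaseAPIInit3 before r782.
std::string explicit_relative(const std::string& path) {
    if (path.empty()) {
        return "./";  // assumption: empty means the current directory
    }
    auto starts_with = [&path](const char* p) { return path.rfind(p, 0) == 0; };
    // Already explicitly relative.
    if (path == "." || path == ".." || starts_with("./") || starts_with(".\\") ||
        starts_with("../") || starts_with("..\\")) {
        return path;
    }
    // Absolute POSIX path or UNC/root-relative Windows path.
    if (path[0] == '/' || path[0] == '\\') {
        return path;
    }
    // Windows drive-letter path such as "C:\\...".
    if (path.size() >= 2 && std::isalpha(static_cast<unsigned char>(path[0])) &&
        path[1] == ':') {
        return path;
    }
    return "./" + path;
}
```

A caller would then pass `explicit_relative("tessdata").c_str()` as the datapath argument of `TessBaseAPIInit3`.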

Original comment by nguyen...@gmail.com on 17 Oct 2012 at 12:24

GoogleCodeExporter commented 9 years ago
This issue was closed by revision r782.

Original comment by theraysm...@gmail.com on 22 Oct 2012 at 11:41

GoogleCodeExporter commented 9 years ago
I had just successfully applied the same change from r777 to capi.cpp, but r782 
fixes it too and is much cleaner. Thanks.

Original comment by nguyen...@gmail.com on 22 Oct 2012 at 11:54

GoogleCodeExporter commented 9 years ago
Hello Sir, I am currently working on a project in Java for extracting text from a 
specified region of an image, but I am getting a lot of issues like:

java.lang.Error: Invalid memory access
    com.sun.jna.Native.invokePointer(Native Method)
    com.sun.jna.Function.invokePointer(Function.java:470)
    com.sun.jna.Function.invoke(Function.java:404)
    com.sun.jna.Function.invoke(Function.java:315)
    com.sun.jna.Library$Handler.invoke(Library.java:212)
    $Proxy5.TessBaseAPIGetUTF8Text(Unknown Source)
    net.sourceforge.tess4j.Tesseract.getOCRText(Unknown Source)
    net.sourceforge.tess4j.Tesseract.doOCR(Unknown Source)
    net.sourceforge.tess4j.Tesseract.doOCR(Unknown Source)
    net.sourceforge.tess4j.Tesseract.doOCR(Unknown Source)
    com.imgToText.ImageManipulation.doPost(ImageManipulation.java:53)
    javax.servlet.http.HttpServlet.service(HttpServlet.java:641)
    javax.servlet.http.HttpServlet.service(HttpServlet.java:722)

Please give me some guidance on how to resolve this issue.

Original comment by satyam2...@gmail.com on 28 Jun 2014 at 9:26