nguyenq / VietOCR3

Java GUI frontend for Tesseract OCR engine
How to manually configure the language file if I don't have write access to the C:\Program Files\Tesseract-OCR\tessdata folder? #26

Closed: leolle closed this issue 2 years ago

leolle commented 2 years ago

I don't have administrator privileges, but I can run the .jar file.

nguyenq commented 2 years ago

You can set the environment variable TESSDATA_PREFIX to the path to tessdata.

http://vietocr.sourceforge.net/usage.html

leolle commented 2 years ago

Thank you for your response. What if I can't modify environment variables and I can't copy any files to c:\Program Files (x86)\Tesseract-OCR\tessdata\ either? Are there any other methods?

nguyenq commented 2 years ago

You should be able to set the environment variable in a command line and run the program from there. Or you can put it in a .bat file to launch the program with it.

SET TESSDATA_PREFIX=C:\Temp\tessdata
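
For readers who prefer not to use a .bat file, the same per-process idea can be sketched in Java. This is only an illustration, not part of VietOCR: the class name `VietOcrLauncher` and the method `buildLauncher` are hypothetical, and the JVM options and jar name are placeholders to adjust. The variable is set only for the child process, so no administrator rights or system-wide settings are involved.

```java
import java.io.IOException;

public class VietOcrLauncher {

    // Builds a launcher for the jar with TESSDATA_PREFIX set only for the child process.
    // Hypothetical helper for illustration; the JVM options are placeholders.
    static ProcessBuilder buildLauncher(String tessdataPrefix, String jarPath) {
        ProcessBuilder pb = new ProcessBuilder(
                "javaw", "-Xms128m", "-Xmx2048m", "-jar", jarPath);
        // environment() starts as a copy of the current process environment;
        // changes here affect only the process started from this builder.
        pb.environment().put("TESSDATA_PREFIX", tessdataPrefix);
        return pb;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: VietOcrLauncher <tessdata-prefix> <path-to-jar>");
            return;
        }
        buildLauncher(args[0], args[1]).start();
    }
}
```

It could be run as `java VietOcrLauncher C:\Temp\tessdata VietOCR.jar`, assuming `javaw` is on the PATH.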

leolle commented 2 years ago

Hi, I set up my .bat file like this:

set TESSDATA_PREFIX=c:\Users\Z\Downloads\VietOCR3\
start javaw -Xms128m -Xmx2048m -jar VietOCR.jar

My Tesseract.exe version is 3.2.0.0, and I downloaded the version 3 language data. It can successfully OCR Vietnamese, but not Chinese (the result is rectangles). Do you know the reason? Thank you.

nguyenq commented 2 years ago

Rectangles indicate that the selected font does not support the characters being displayed. You need to select a Unicode font that contains Chinese glyphs -- Arial Unicode MS, for example.
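
To find out which installed fonts could render the text, a small check along these lines may help. It is a sketch, not VietOCR code: `FontCheck` and `fontsSupporting` are hypothetical names, and the two-character sample string is only an example. It relies on the standard `java.awt.Font.canDisplayUpTo(String)` method, which returns -1 when the font has a glyph for every character in the string.

```java
import java.awt.Font;
import java.awt.GraphicsEnvironment;
import java.util.ArrayList;
import java.util.List;

public class FontCheck {

    // Returns the names of the candidate fonts that have a glyph
    // for every character in the sample text.
    static List<String> fontsSupporting(String sample, Font[] candidates) {
        List<String> names = new ArrayList<>();
        for (Font font : candidates) {
            // canDisplayUpTo returns -1 if the whole string can be displayed,
            // otherwise the index of the first character that cannot.
            if (font.canDisplayUpTo(sample) == -1) {
                names.add(font.getFontName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Two Chinese characters, written as escapes to avoid source-encoding issues.
        String sample = "\u4e2d\u6587";
        Font[] installed =
                GraphicsEnvironment.getLocalGraphicsEnvironment().getAllFonts();
        for (String name : fontsSupporting(sample, installed)) {
            System.out.println(name);
        }
    }
}
```

Any font name it prints should be a reasonable candidate to select as the display font; an empty result suggests that no installed font covers the sample.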

leolle commented 2 years ago

Thank you, the problem is solved by changing VietOCR's font to one for my local language.