nguyenq / tess4j

Java JNA wrapper for Tesseract OCR API
Apache License 2.0

Getting OSD information (rotate angle) #180

Closed mkczyk closed 4 years ago

mkczyk commented 4 years ago

I am trying to get the rotation information (orientation angle) from Tess4J:

ITesseract instance = new Tesseract();
instance.setDatapath("/path/to/tessdata");
instance.setLanguage("osd");
instance.setPageSegMode(ITessAPI.TessPageSegMode.PSM_OSD_ONLY);
instance.setOcrEngineMode(ITessAPI.TessOcrEngineMode.OEM_LSTM_ONLY);
String result = instance.doOCR(new File("/path/to/image.png"));
System.out.println(result);

But it fails with the message "Error: LSTM requested, but not present!! Loading tesseract." and returns an empty string.

When I change the engine mode from OEM_LSTM_ONLY to OEM_DEFAULT, the error disappears, but doOCR still returns an empty string.

Maybe the doOCR method isn't meant for getting OSD information?


My environment and Tesseract are configured properly, because I can run OCR through Tess4J (it returns the recognized text):

ITesseract instance = new Tesseract();
instance.setDatapath("/path/to/tessdata");
instance.setLanguage("pol");
instance.setPageSegMode(ITessAPI.TessPageSegMode.PSM_AUTO_ONLY);
instance.setOcrEngineMode(ITessAPI.TessOcrEngineMode.OEM_LSTM_ONLY);
String result = instance.doOCR(new File("/path/to/image.png"));
System.out.println(result);

(I have downloaded two traineddata files: pol.traineddata, and osd.traineddata for the OSD attempts.)

And I can get the OSD information without Tess4J, by calling Tesseract directly from the console:

$ tesseract --psm 0 -l osd image.png stdout
Page number: 0
Orientation in degrees: 270
Rotate: 90
Orientation confidence: 0.35
Script: Latin
Script confidence: 7.78
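
As a stopgap, the console output above can be read back into Java. The following is only a minimal sketch: the class name OsdOutputParser is hypothetical, and it assumes the output keeps the "key: value" layout shown above. The text could come from running the tesseract command through ProcessBuilder; here it is hard-coded from the sample.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Tess4J: parses the "key: value" lines
// that `tesseract --psm 0 -l osd image.png stdout` printed above.
final class OsdOutputParser {

    // Returns the fields in the order they appear; lines without a colon are skipped.
    static Map<String, String> parse(String output) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String line : output.split("\\R")) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue;
            }
            String key = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            fields.put(key, value);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Sample copied from the console session in this issue.
        String sample = String.join("\n",
                "Page number: 0",
                "Orientation in degrees: 270",
                "Rotate: 90",
                "Orientation confidence: 0.35",
                "Script: Latin",
                "Script confidence: 7.78");

        Map<String, String> osd = parse(sample);
        int rotate = Integer.parseInt(osd.get("Rotate"));
        System.out.println("Rotate by " + rotate + " degrees"); // prints "Rotate by 90 degrees"
    }
}
```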

nguyenq commented 4 years ago

That's not the correct way to call it. Look at the testTessBaseAPIDetectOrientationScript test case in the unit tests.
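
For readers who do not have the unit tests at hand, the call pattern might look roughly like the sketch below. It is an illustration, not a copy of the test case: it assumes the low-level TessAPI binding exposes TessBaseAPIDetectOrientationScript with IntBuffer, FloatBuffer and PointerByReference out-parameters and that the image is loaded through Lept4J, so check the signatures against your Tess4J version. The class name OsdSketch and the helper rotateToUpright are hypothetical; the helper only mirrors the console output in this issue, where an orientation of 270 is reported together with "Rotate: 90".

```java
import com.sun.jna.ptr.PointerByReference;
import java.nio.FloatBuffer;
import java.nio.IntBuffer;
import net.sourceforge.lept4j.Leptonica;
import net.sourceforge.lept4j.Pix;
import net.sourceforge.tess4j.ITessAPI;
import net.sourceforge.tess4j.TessAPI;

// Hypothetical sketch; paths are the placeholders used elsewhere in this issue.
final class OsdSketch {

    // Assumption: the rotation needed to bring the page upright is the
    // complement of the detected orientation (270 -> 90, as in the console output).
    static int rotateToUpright(int orientationDegrees) {
        return (360 - orientationDegrees) % 360;
    }

    public static void main(String[] args) {
        TessAPI api = TessAPI.INSTANCE;
        ITessAPI.TessBaseAPI handle = api.TessBaseAPICreate();
        Pix pix = Leptonica.INSTANCE.pixRead("/path/to/image.png");
        try {
            if (pix == null) {
                System.err.println("Could not read the image");
                return;
            }
            // Load only the OSD data; no recognition language is needed here.
            api.TessBaseAPIInit3(handle, "/path/to/tessdata", "osd");
            api.TessBaseAPISetImage2(handle, pix);

            // Out-parameters filled in by the native call.
            IntBuffer orientDeg = IntBuffer.allocate(1);
            FloatBuffer orientConf = FloatBuffer.allocate(1);
            PointerByReference scriptName = new PointerByReference();
            FloatBuffer scriptConf = FloatBuffer.allocate(1);

            int ok = api.TessBaseAPIDetectOrientationScript(
                    handle, orientDeg, orientConf, scriptName, scriptConf);
            if (ok == ITessAPI.TRUE) {
                int degrees = orientDeg.get(0);
                System.out.println("Orientation in degrees: " + degrees);
                System.out.println("Rotate: " + rotateToUpright(degrees));
                System.out.println("Orientation confidence: " + orientConf.get(0));
                System.out.println("Script: " + scriptName.getValue().getString(0));
                System.out.println("Script confidence: " + scriptConf.get(0));
            } else {
                System.err.println("Orientation and script detection failed");
            }
        } finally {
            // Release the native image and the Tesseract handle.
            if (pix != null) {
                PointerByReference ref = new PointerByReference();
                ref.setValue(pix.getPointer());
                Leptonica.INSTANCE.pixDestroy(ref);
            }
            api.TessBaseAPIDelete(handle);
        }
    }
}
```

Going through the low-level binding rather than doOCR matches the advice above: OSD is a detection call with its own output values, not recognized text.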

nguyenq commented 4 years ago

Can the ticket be closed?