nguyenq / tess4j

Java JNA wrapper for Tesseract OCR API
Apache License 2.0

Docker Image with Java 11 + tess4j:5.2.0 + Spring Boot 2.6.6 not working #231

Closed by vipulpatel2103 2 years ago

vipulpatel2103 commented 2 years ago

Dockerfile:

FROM openjdk:11
ARG JAR_FILE=build/libs/ocr-0.0.1-SNAPSHOT.jar
COPY ${JAR_FILE} document-ocr.jar
ENTRYPOINT ["java","-jar","/document-ocr.jar"]

Spring Boot version: 2.6.6, Tess4J version: 5.2.0

The application fails with an UnsatisfiedLinkError when JNA tries to load the native library. It runs fine locally, but not in the Docker image.

2022-04-26 04:28:30.458 ERROR 5556 --- [nio-8080-exec-1] o.a.c.c.C.[.[.[/].[dispatcherServlet]    : Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Handler dispatch failed; nested exception is java.lang.UnsatisfiedLinkError: The specified module could not be found.
] with root cause

java.lang.UnsatisfiedLinkError: The specified module could not be found.

        at com.sun.jna.Native.open(Native Method) ~[jna-5.10.0.jar!/:5.10.0 (b0)]
        at com.sun.jna.NativeLibrary.loadLibrary(NativeLibrary.java:277) ~[jna-5.10.0.jar!/:5.10.0 (b0)]
        at com.sun.jna.NativeLibrary.getInstance(NativeLibrary.java:461) ~[jna-5.10.0.jar!/:5.10.0 (b0)]
        at com.sun.jna.Library$Handler.<init>(Library.java:192) ~[jna-5.10.0.jar!/:5.10.0 (b0)]
        at com.sun.jna.Native.loadLibrary(Native.java:672) ~[jna-5.10.0.jar!/:5.10.0 (b0)]
        at com.sun.jna.Native.loadLibrary(Native.java:656) ~[jna-5.10.0.jar!/:5.10.0 (b0)]
        at net.sourceforge.tess4j.util.LoadLibs.getTessAPIInstance(LoadLibs.java:85) ~[tess4j-5.2.0.jar!/:5.2.0]
        at net.sourceforge.tess4j.TessAPI.<clinit>(TessAPI.java:42) ~[tess4j-5.2.0.jar!/:5.2.0]
        at net.sourceforge.tess4j.Tesseract.init(Tesseract.java:441) ~[tess4j-5.2.0.jar!/:5.2.0]
        at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:325) ~[tess4j-5.2.0.jar!/:5.2.0]
        at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:308) ~[tess4j-5.2.0.jar!/:5.2.0]
        at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:289) ~[tess4j-5.2.0.jar!/:5.2.0]
        at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:273) ~[tess4j-5.2.0.jar!/:5.2.0]
        at com.doc.ocr.processor.image.ImageProcessor.getSegments(ImageProcessor.java:35) ~[classes!/:na]
        at com.doc.ocr.processor.DocProcessorService.processDocument(DocProcessorService.java:30) ~[classes!/:na]
        at com.doc.ocr.rest.DocOCRController.doOCR(DocOCRController.java:29) ~[classes!/:na]
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:na]
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:na]
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:na]
        at java.base/java.lang.reflect.Method.invoke(Method.java:566) ~[na:na]

nguyenq commented 2 years ago

Which OS are you running on? Do you have the prerequisites installed (the VC++ runtime on Windows, or Tesseract on other platforms)?

vipulpatel2103 commented 2 years ago

Hello @nguyenq, thank you for the response.

OS: Windows Server 2019 Datacenter
Docker: version 20.10.9, build 591094d

It works fine when I run it directly on the target OS, which has the VC++ runtime installed. But when I build and run it as a Docker image, it fails. I am not sure whether the Docker base image (i.e. openjdk:11) includes the VC++ runtime.
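
One way to narrow this down is to print, from inside the container, the settings that affect native library loading. The sketch below is a hypothetical helper (the class name NativeEnvReport is not part of Tess4J or JNA) and uses only the standard library. It reports the OS seen by the JVM, the library search paths (including the jna.library.path system property, which may simply be unset) and a few related environment variables. It does not prove whether the VC++ runtime is present; it only shows which platform and search paths the JVM is working with.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class NativeEnvReport {

    /**
     * Collects JVM properties and environment variables that influence
     * how native libraries are located. Unset values are reported as "null".
     */
    static Map<String, String> describe() {
        Map<String, String> report = new LinkedHashMap<>();
        String[] properties = {
            "os.name", "os.arch", "java.version",
            "java.library.path", "jna.library.path"
        };
        for (String key : properties) {
            report.put(key, String.valueOf(System.getProperty(key)));
        }
        String[] variables = {"PATH", "LD_LIBRARY_PATH", "LANG", "LC_ALL"};
        for (String key : variables) {
            report.put("env." + key, String.valueOf(System.getenv(key)));
        }
        return report;
    }

    public static void main(String[] args) {
        // Print one "key = value" line per entry, in insertion order.
        describe().forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Running this once on the host and once inside the container, and comparing the two outputs, should show whether the container is a Windows or a Linux image and whether the search paths differ.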

vipulpatel2103 commented 2 years ago

I also tried an OpenJDK base image with tesseract-ocr installed. It fails with the error below.

FROM openjdk:8-jdk-alpine
RUN apk update
RUN apk add \
    tesseract-ocr
ARG JAR_FILE=build/libs/ocr-0.0.1-SNAPSHOT.jar
COPY ${JAR_FILE} document-ocr.jar
ENTRYPOINT ["java","-jar","/document-ocr.jar"]

ERROR

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007ffaaa1fe2b6, pid=1, tid=0x00007ffaab26eb10
#
# JRE version: OpenJDK Runtime Environment (8.0_212-b04) (build 1.8.0_212-b04)
# Java VM: OpenJDK 64-Bit Server VM (25.212-b04 mixed mode linux-amd64 compressed oops)
# Derivative: IcedTea 3.12.0
# Distribution: Custom build (Sat May  4 17:33:35 UTC 2019)
# Problematic frame:
# C  [libtesseract.so.4.0.0+0x1d62b6]  ERRCODE::error(char const*, TessErrorLogCode, char const*, ...) const+0x164
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# //hs_err_pid1.log
!strcmp(locale, "C"):Error:Assert failed:in file baseapi.cpp, line 209
#
# If you would like to submit a bug report, please include
# instructions on how to reproduce the bug and visit:
#   https://icedtea.classpath.org/bugzilla
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

bromine0x23 commented 2 years ago

I hit a similar error on Java 17.0.2 + tess4j 5.2.0 + Spring Boot 2.6.6 + CentOS 8.4.2105 + tesseract-4.1.1-2.el8.x86_64.rpm

# 
# A fatal error has been detected by the Java Runtime Environment: 
# 
#  SIGSEGV (0xb) at pc=0x00007f86b16e2920, pid=1, tid=25 
# 
# JRE version: OpenJDK Runtime Environment Zulu17.32+13-CA (17.0.2+8) (build 17.0.2+8-LTS) 
# Java VM: OpenJDK 64-Bit Server VM Zulu17.32+13-CA (17.0.2+8-LTS, mixed mode, emulated-client, sharing, tiered, compressed oops, compressed class ptrs, serial gc, linux-amd64) 
# Problematic frame: 
# C  [libtesseract.so.4.0.1+0x268920]  PAGE_RES_IT::ReplaceCurrentWord(tesseract::PointerVector<WERD_RES>*)+0xd0 
# 
# Core dump will be written. Default location: Core dumps may be processed with "/usr/libexec/abrt-hook-ccpp %s %c %p %u %g %t e %P %I %h" (or dumping to //core.1) 
# 
# An error report file with more information is saved as: 
# //hs_err_pid1.log 
# 
# If you would like to submit a bug report, please visit: 
#   http://www.azul.com/support/ 
# The crash happened outside the Java Virtual Machine in native code. 
# See problematic frame for where to report the bug. 
# 

vipulpatel2103 commented 2 years ago

@bromine0x23, I am not sure, but it looks like your Tess4J version and your Tesseract version do not match. Please check whether that is the issue.

bromine0x23 commented 2 years ago

@vipulpatel2103 I also tested with tess4j 4.1.1 and 4.6.1 and still got the error. The error did not appear at the beginning; it appeared suddenly after the service had been running successfully for several days.

bromine0x23 commented 2 years ago

I just tested again, and no errors occurred on any of these versions: 4.6.1, 5.2.0 and 5.2.1. Could this be a time-related problem?

nguyenq commented 2 years ago

As for the !strcmp(locale, "C") assertion failure, it looks like you are using a pretty old Tesseract version. The locale issue was fixed a while back, so you definitely want to update your Tesseract version.

https://github.com/tesseract-ocr/tesseract/issues/1670
https://github.com/nguyenq/tess4j/issues/105
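
For anyone who cannot upgrade right away, it may help to check whether the process environment selects the "C" locale at all, since the assertion in the log compares the current locale with "C". The sketch below is a hypothetical, standard-library-only helper (LocaleEnvCheck is not a Tess4J or Tesseract API). It assumes the usual POSIX precedence, where a non-empty LC_ALL overrides the LC_* category variables and LANG, and it is deliberately conservative: without LC_ALL it reports true only if every locale variable that is set has the value "C". It does not query the native locale itself.

```java
import java.util.Map;

public class LocaleEnvCheck {

    /**
     * Returns true only when the given environment clearly selects the "C"
     * locale: either LC_ALL is "C", or LC_ALL is unset/empty and every other
     * locale variable (LANG, LC_*) is unset, empty, or "C".
     */
    static boolean selectsCLocale(Map<String, String> env) {
        String lcAll = env.get("LC_ALL");
        if (lcAll != null && !lcAll.isEmpty()) {
            return lcAll.equals("C"); // a non-empty LC_ALL overrides everything else
        }
        for (Map.Entry<String, String> entry : env.entrySet()) {
            String name = entry.getKey();
            String value = entry.getValue();
            boolean localeVariable = name.equals("LANG") || name.startsWith("LC_");
            if (localeVariable && value != null && !value.isEmpty() && !value.equals("C")) {
                return false; // some category would resolve to a non-"C" locale
            }
        }
        return true; // nothing set, or everything set to "C"
    }

    public static void main(String[] args) {
        System.out.println("C locale selected: " + selectsCLocale(System.getenv()));
    }
}
```

If this prints false inside the container (for example because the base image sets LANG to a UTF-8 locale), a commonly cited workaround for older Tesseract builds is to start the JVM with LC_ALL=C. Whether that is sufficient for a given image is an assumption to verify; updating Tesseract remains the actual fix.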

vipulpatel2103 commented 2 years ago

Issue resolved with Java 8 + openjdk:8-jdk-alpine + tess4j:4.5.4. It seems the issue is not with Tess4J itself but with the Tesseract build in the Docker base image. Closing the ticket.