keensoft / alfresco-simple-ocr

Simple OCR action for Alfresco

ocrmypdf doesn't work using alfresco-simple-ocr #42

Closed: alicedoe closed this issue 6 years ago

alicedoe commented 7 years ago

Hi,

I have an issue with alfresco-simple-ocr: I get this error when I try to OCR a PDF:

    Exception in thread "defaultAsyncAction1" java.lang.RuntimeException: java.lang.RuntimeException: org.alfresco.service.cmr.repository.ContentIOException: 10220020 Failed to perform OCR transformation: Execution result:
        os: Linux
        command: /usr/bin/ocrmypdf --verbose 1 --force-ocr -l eng /alfresco/tomcat/temp/Alfresco/OCRTransformWorker_source_6969309335739725478.pdf /alfresco/tomcat/temp/Alfresco/OCRTransformWorker_source_6969309335739725478_ocr.pdf
        succeeded: false
        exit code: 1
        out:
        err: Traceback (most recent call last):
          File "/usr/bin/ocrmypdf", line 7, in <module>
            from ocrmypdf.__main__ import run_pipeline
          File "/usr/lib/python3.5/site-packages/ocrmypdf/__main__.py", line 53, in <module>
            _unicodefun._verify_python3_env
        at es.keensoft.alfresco.ocr.OCRExtractAction.executeImplInternal(OCRExtractAction.java:183)
        at es.keensoft.alfresco.ocr.OCRExtractAction.access$200(OCRExtractAction.java:38)
        at es.keensoft.alfresco.ocr.OCRExtractAction$1.execute(OCRExtractAction.java:164)
        at es.keensoft.alfresco.ocr.OCRExtractAction$1.execute(OCRExtractAction.java:161)
        at org.alfresco.repo.transaction.RetryingTransactionHelper.doInTransaction(RetryingTransactionHelper.java:464)
        at es.keensoft.alfresco.ocr.OCRExtractAction.executeInNewTransaction(OCRExtractAction.java:169)
        at es.keensoft.alfresco.ocr.OCRExtractAction.access$100(OCRExtractAction.java:38)
        at es.keensoft.alfresco.ocr.OCRExtractAction$ExtractOCRTask.run(OCRExtractAction.java:151)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:748)
    Caused by: java.lang.RuntimeException: org.alfresco.service.cmr.repository.ContentIOException: 10220020 Failed to perform OCR transformation: Execution result:
        os: Linux
        command: /usr/bin/ocrmypdf --verbose 1 --force-ocr -l eng /alfresco/tomcat/temp/Alfresco/OCRTransformWorker_source_6969309335739725478.pdf /alfresco/tomcat/temp/Alfresco/OCRTransformWorker_source_6969309335739725478_ocr.pdf
        succeeded: false
        exit code: 1
        out:
        err: Traceback (most recent call last):
          File "/usr/bin/ocrmypdf", line 7, in <module>
            from ocrmypdf.__main__ import run_pipeline
          File "/usr/lib/python3.5/site-packages/ocrmypdf/__main__.py", line 53, in <module>
            _unicodefun._verify_python3_env
        at es.keensoft.alfresco.ocr.OCRTransformWorker.transform(OCRTransformWorker.java:86)
        at es.keensoft.alfresco.ocr.OCRExtractAction.executeImplInternal(OCRExtractAction.java:181)
        ... 10 more
    Caused by: org.alfresco.service.cmr.repository.ContentIOException: 10220020 Failed to perform OCR transformation: Execution result:
        os: Linux
        command: /usr/bin/ocrmypdf --verbose 1 --force-ocr -l eng /alfresco/tomcat/temp/Alfresco/OCRTransformWorker_source_6969309335739725478.pdf /alfresco/tomcat/temp/Alfresco/OCRTransformWorker_source_6969309335739725478_ocr.pdf
        succeeded: false
        exit code: 1
        out:
        err: Traceback (most recent call last):
          File "/usr/bin/ocrmypdf", line 7, in <module>
            from ocrmypdf.__main__ import run_pipeline
          File "/usr/lib/python3.5/site-packages/ocrmypdf/__main__.py", line 53, in <module>
            _unicodefun._verify_python3_env
        at es.keensoft.alfresco.ocr.OCRTransformWorker.transform(OCRTransformWorker.java:79)
        ... 11 more

My config file:

    ### OCR config ###
    ocr.command=/usr/bin/ocrmypdf
    ocr.output.verbose=true
    ocr.output.file.prefix.command=
    ocr.extra.commands=--verbose 1 --force-ocr -l eng
    ocr.server.os=linux
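Comparing this config with the command in the log above, the plugin appears to run `ocr.command`, then the options from `ocr.extra.commands`, then the source and target paths. The sketch below rebuilds that command line so it can be re-run by hand. `build_ocr_argv` is a hypothetical helper, and the argument order is inferred from the logged command, not taken from the plugin source; `ocr.output.verbose`, `ocr.server.os` and the empty `ocr.output.file.prefix.command` do not show up in the logged command and are ignored here.

```python
import shlex


def build_ocr_argv(props, source, target):
    """Rebuild the OCR command line from the OCR config properties.

    Hypothetical helper: the order (command, extra options, source, target)
    is inferred from the logged command, not from the plugin's code.
    """
    argv = [props["ocr.command"]]
    # ocr.extra.commands holds several options in one string; split it
    # with POSIX shell rules so quoted values would stay together.
    argv += shlex.split(props.get("ocr.extra.commands", ""))
    argv += [source, target]
    return argv


if __name__ == "__main__":
    props = {
        "ocr.command": "/usr/bin/ocrmypdf",
        "ocr.extra.commands": "--verbose 1 --force-ocr -l eng",
    }
    src = "/alfresco/tomcat/temp/Alfresco/OCRTransformWorker_source_6969309335739725478.pdf"
    dst = src[: -len(".pdf")] + "_ocr.pdf"
    # Print a copy-pasteable shell command (shlex.join needs Python 3.8+).
    print(shlex.join(build_ocr_argv(props, src, dst)))
```

Splitting with `shlex` rather than `str.split` keeps a quoted option value (for example a language list containing spaces) as one argument.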

If I run the command directly, it works and the document is created:

    ocrmypdf --verbose 1 --force-ocr -l eng /alfresco/tomcat/temp/Alfresco/OCRTransformWorker_source_5341287848260715795.pdf /alfresco/tomcat/temp/Alfresco/OCRTransformWorker_source_5341287848260715795_ocr.pdf

I'm running CentOS, and Alfresco with Docker.

Thanks :)
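A command that works in an interactive shell can still fail when Tomcat launches it, because the child process inherits Tomcat's environment rather than your login shell's. One way to test that assumption is to re-run the same command with only a few variables copied over and see whether the error comes back. `run_with_reduced_env` and its `keep` default are hypothetical; which variables the Alfresco process really has depends on how Tomcat is started in your setup.

```python
import os
import subprocess
import sys


def run_with_reduced_env(argv, keep=("PATH",), extra=None):
    """Run argv with a stripped-down environment.

    Only the variables named in `keep` are copied from the current
    environment; `extra` adds or overrides variables (e.g. LANG/LC_ALL).
    Hypothetical diagnostic helper, not part of alfresco-simple-ocr.
    """
    env = {k: os.environ[k] for k in keep if k in os.environ}
    env.update(extra or {})
    return subprocess.run(argv, env=env, capture_output=True, text=True)


if __name__ == "__main__":
    # Replace this probe with the ocrmypdf command copied from the Alfresco log.
    probe = [sys.executable, "-c", "import os; print(os.environ.get('LANG', '<unset>'))"]
    print(run_with_reduced_env(probe).stdout.strip())
    print(run_with_reduced_env(probe, extra={"LANG": "en_US.utf8"}).stdout.strip())
```

If the real ocrmypdf command fails with the reduced environment but succeeds once `extra` supplies a UTF-8 `LANG`/`LC_ALL`, the difference between the two environments is the likely cause.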

angelborroy-ks commented 7 years ago

Try these workarounds: https://github.com/keensoft/alfresco-simple-ocr/wiki/FAQ

alicedoe commented 7 years ago

Thanks for the answer, it really helped me find the problem. Running the command through /bin/su, I get an error about my locale variables:

    Traceback (most recent call last):
      File "/bin/ocrmypdf", line 7, in <module>
        from ocrmypdf.__main__ import run_pipeline
      File "/usr/lib/python3.5/site-packages/ocrmypdf/__main__.py", line 53, in <module>
        _unicodefun._verify_python3_env()
      File "/usr/lib/python3.5/site-packages/ocrmypdf/_unicodefun.py", line 108, in _verify_python3_env
        'environment.' + extra)
    RuntimeError: ocrmypdf will abort further execution because Python 3 was configured to use ASCII as encoding for the environment.

    This system lists a couple of UTF-8 supporting locales that you can pick from. The following suitable locales were discovered: en_AG.utf8, en_AU.utf8, en_BW.utf8, en_CA.utf8, en_DK.utf8, en_GB.utf8, en_HK.utf8, en_IE.utf8, en_IN.utf8, en_NG.utf8, en_NZ.utf8, en_PH.utf8, en_SG.utf8, en_US.utf8, en_ZA.utf8, en_ZM.utf8, en_ZW.utf8

It's weird, because I set the locale in my Dockerfile.
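The traceback shows ocrmypdf aborting inside `_unicodefun._verify_python3_env` because Python 3 reported ASCII as the environment's encoding. The sketch below mirrors the idea of that check, assuming (as in the Click helper of the same name that this module appears to be based on) that it resolves `locale.getpreferredencoding()` through `codecs.lookup` and rejects `ascii`. glibc's plain C/POSIX locale reports the codeset `ANSI_X3.4-1968`, which Python treats as an alias of ASCII. So if `LANG`/`LC_ALL` from the Dockerfile do not reach the process that runs ocrmypdf (for example, `su -` starts a login shell with a fresh environment), the check can fail even though the image defines a UTF-8 locale. This is an illustration, not ocrmypdf's code.

```python
import codecs
import locale


def is_ascii_encoding(name):
    """Return True if the codec `name` resolves to Python's ASCII codec.

    Names Python does not know are also treated as ASCII; this conservative
    fallback is an assumption modelled on Click's version of the check.
    """
    try:
        return codecs.lookup(name).name == "ascii"
    except LookupError:
        return True


def environment_is_ascii():
    """Approximate the condition under which the old check aborts."""
    return is_ascii_encoding(locale.getpreferredencoding())


if __name__ == "__main__":
    # Run this as the user and with the environment Alfresco uses, not only in your shell.
    print("preferred encoding:", locale.getpreferredencoding())
    print("would be rejected:", environment_is_ascii())
```

On Python 3.7 and later, a C locale triggers locale coercion or UTF-8 mode (PEP 538/540), so running this probe with a newer interpreter may not reproduce what the Python 3.5 in the log saw.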

drmedrme commented 6 years ago

Is there any solution for this? I am having a similar problem. I had simple-ocr with ocrmypdf set up on another system with no problem, but on a new system I am getting the error below. The script runs fine when run outside Alfresco. I also tried the alternative scripts mentioned here.

Thanks GH

    Execution result:
        os: Linux
        command: /opt/alfresco/scripts/ocrmypdf.sh /opt/alfresco/tomcat/temp/Alfresco/OCRTransformWorker_source_4220864260626106298.pdf /opt/alfresco/tomcat/temp/Alfresco/OCRTransformWorker_source_4220864260626106298_ocr.pdf
        succeeded: false
        exit code: 1
        out:
        err: Traceback (most recent call last):
          File "/usr/local/bin/ocrmypdf", line 7, in <module>
            from ocrmypdf.__main__ import run_pipeline
          File "/usr/local/lib/python3.5/dist-packages/ocrmypdf/__main__.py", line 76, in <module>
            verify_python3_env(
        at es.keensoft.alfresco.ocr.OCRTransformWorker.transform(OCRTransformWorker.java:79)
        ... 79 more
    2018-05-16 12:53:51,627 ERROR [org.springframework.extensions.webscripts.AbstractRuntime] [http-nio-8080-exec-42] Exception from executeScript: Server error (04160022). Details can be found in the server logs.
    java.lang.RuntimeException: Server error (04160022). Details can be found in the server logs.
        at org.alfresco.repo.web.scripts.RepositoryContainer.executeScript(RepositoryContainer.java:328)
        at org.springframework.extensions.webscripts.AbstractRuntime.executeScript(AbstractRuntime.java:399)
        at org.springframework.extensions.webscripts.AbstractRuntime.executeScript(AbstractRuntime.java:210)
        at org.springframework.extensions.webscripts.servlet.WebScriptServlet.service(WebScriptServlet.java:132)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:729)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:292)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:207)
        at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:240)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:207)
        at org.alfresco.module.aosmodule.service.ContextRootFilter.doFilter(ContextRootFilter.java:93)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:240)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:207)
        at org.alfresco.web.app.servlet.GlobalLocalizationFilter.doFilter(GlobalLocalizationFilter.java:68)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:240)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:207)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:212)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:94)
        at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:504)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:141)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:502)
        at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1132)
        at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:684)
        at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1539)
        at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1495)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
        at java.lang.Thread.run(Thread.java:748)

angelborroy-ks commented 6 years ago

So if you run the script

    $ /opt/alfresco/scripts/ocrmypdf.sh /opt/alfresco/tomcat/temp/Alfresco/OCRTransformWorker_source_4220864260626106298.pdf /opt/alfresco/tomcat/temp/Alfresco/OCRTransformWorker_source_4220864260626106298_ocr.pdf

from the command line, it works fine, right?

drmedrme commented 6 years ago

Yes. Also, thanks for a great product.

drmedrme commented 6 years ago

Hello. I came back to see if I could get it up and running again, and I was able to solve the problem mentioned above by editing __main__.py and removing the call to verify_python3_env.
The error I had was:

File "/usr/local/lib/python3.5/dist-packages/ocrmypdf/__main__.py", line 69, in <module>
    verify_python3_env(
         at es.keensoft.alfresco.ocr.OCRTransformWorker.transform(OCRTransformWorker.java:79)
        ... 78 more

Thanks.
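Editing ocrmypdf's `__main__.py` works, but the change is lost whenever ocrmypdf is upgraded. An alternative, sketched here as an untested assumption, is to point `ocr.command` at a small wrapper that forces a UTF-8 locale and then replaces itself with the real ocrmypdf. The wrapper's install path, the `OCRMYPDF` location and the `en_US.utf8` default (one of the locales listed in the error earlier in this thread) are placeholders; use a UTF-8 locale that `locale -a` lists on your system.

```python
#!/usr/bin/env python3
"""Hypothetical wrapper: run ocrmypdf with a UTF-8 locale.

Install it under a path of your choice (placeholder), make it executable,
and set ocr.command to that path.
"""
import os
import sys

OCRMYPDF = "/usr/local/bin/ocrmypdf"  # assumption: adjust to the output of `which ocrmypdf`
UTF8_LOCALE = "en_US.utf8"            # assumption: must appear in `locale -a`


def utf8_env(base, locale_name=UTF8_LOCALE):
    """Return a copy of `base` with LANG and LC_ALL set to a UTF-8 locale."""
    env = dict(base)
    env["LANG"] = locale_name
    env["LC_ALL"] = locale_name  # LC_ALL overrides every other LC_* variable
    return env


def main(argv):
    # Replace this process with ocrmypdf; the plugin's options and the
    # source/target paths are passed through unchanged.
    os.execve(OCRMYPDF, [OCRMYPDF, *argv], utf8_env(os.environ))


if __name__ == "__main__":
    # ocrmypdf always needs at least an input and an output path.
    if len(sys.argv) >= 3:
        main(sys.argv[1:])
    else:
        print("usage: ocrmypdf-wrapper [ocrmypdf options] INPUT OUTPUT", file=sys.stderr)
```

Because `os.execve` replaces the wrapper process, the exit code and stderr that Alfresco logs come straight from ocrmypdf, so failures stay as visible as before.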

Ikkache27 commented 4 years ago

Quoting @angelborroy-ks: "Try these workarounds: https://github.com/keensoft/alfresco-simple-ocr/wiki/FAQ"

Hello @angelborroy-ks

I have this issue. I'm using ocrmypdf with Alfresco. ocrmypdf works well when I run the command manually, but when I use it through the Alfresco 'OCR action' it doesn't work. This is the log:

    Caused by: java.lang.RuntimeException: org.alfresco.service.cmr.repository.ContentIOException: 000817996 Failed to perform OCR transformation: Execution result:
        os: Linux
        command: /usr/local/bin/ocrmypdf --verbose 1 --force-ocr -l eng /opt/alfresco-community/tomcat/temp/Alfresco/OCRTransformWorker_source_4887267237326407155.pdf /opt/alfresco-community/tomcat/temp/Alfresco/OCRTransformWorker_source_4887267237326407155_ocr.pdf
        succeeded: false
        exit code: 1
        out:
        err: Traceback (most recent call last):
          File "/usr/local/bin/ocrmypdf", line 5, in <module>
            from ocrmypdf.__main__ import run
          File "/root/.local/lib/python3.6/site-packages/ocrmypdf/__init__.py", line 20, in <module>
            from .api import Verbosity
        at es.keensoft.alfresco.ocr.OCRExtractAction.executeImplInternal(OCRExtractAction.java:183)
        at es.keensoft.alfresco.ocr.OCRExtractAction.access$200(OCRExtractAction.java:38)
        at es.keensoft.alfresco.ocr.OCRExtractAction$1.execute(OCRExtractAction.java:164)
        at es.keensoft.alfresco.ocr.OCRExtractAction$1.execute(OCRExtractAction.java:161)
        at org.alfresco.repo.transaction.RetryingTransactionHelper.doInTransaction(RetryingTransactionHelper.java:464)
        at es.keensoft.alfresco.ocr.OCRExtractAction.executeInNewTransaction(OCRExtractAction.java:169)
        at es.keensoft.alfresco.ocr.OCRExtractAction.access$100(OCRExtractAction.java:38)
        at es.keensoft.alfresco.ocr.OCRExtractAction$ExtractOCRTask.run(OCRExtractAction.java:151)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:748)

Thanks for your help.
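This traceback stops at a different place (`from .api import Verbosity`, with the actual exception message cut off in the log), and ocrmypdf is loaded from `/root/.local/lib/python3.6/site-packages`, which is a per-user install. When a command works in your shell but not from Alfresco, one hedged way to narrow it down is to run a small report both in your shell and as the same OS user and environment that Tomcat uses, then compare the two outputs. `runtime_report` is a hypothetical helper; it only reports what Python itself sees and does not import ocrmypdf.

```python
import importlib.util
import locale
import os
import site
import sys


def runtime_report(module="ocrmypdf"):
    """Collect facts that often differ between a login shell and a service."""
    # find_spec locates a top-level package without importing (running) it.
    spec = importlib.util.find_spec(module)
    return {
        "python": sys.executable,
        "version": sys.version.split()[0],
        "preferred_encoding": locale.getpreferredencoding(),
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        "HOME": os.environ.get("HOME"),
        # Per-user site-packages (~/.local/...) depend on the user and HOME.
        "user_site": site.getusersitepackages(),
        # Where the package would be imported from, or None if not found.
        "module_origin": spec.origin if spec else None,
    }


if __name__ == "__main__":
    for key, value in runtime_report().items():
        print(f"{key}: {value}")
```

A different `python`, `user_site` or `module_origin` between the two runs would suggest that Alfresco is not running the same ocrmypdf installation you test by hand.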