The Alfresco OCR plugin based on pdfsandwich returns an error when an OCR transformation is performed from Alfresco Share with additional options (for example "-rgb" or "-resolution") set in "ocr.extra.commands" in alfresco-global.properties.
Expected behavior
When extra parameters are added to the OCR configuration ("ocr.extra.commands") in alfresco-global.properties, the PDF file must be transformed correctly using all of them, whether the transformation is launched manually or triggered by a transformation rule (for example, with "-rgb" the transformed file should also keep its colors).
Actual behavior
Extra parameters (for example "-rgb" or "-resolution") were added to "ocr.extra.commands" in alfresco-global.properties.
When the transformation is launched from Alfresco Share (either by uploading a file to a folder that has the transformation rule set, or by executing the transformation manually), the PDF file is not transformed and an error appears in catalina.out.
Without the parameters in "ocr.extra.commands", the transformation works correctly, both from the command line and from Alfresco Share.
When the transformation is run from the command line (or with a bash script) as the alfresco user, however, the OCR transformation is performed correctly and the additional parameters set in "ocr.extra.commands" in alfresco-global.properties are passed correctly.
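The command-line run that works can be sketched roughly as follows. The flags are copied from the failing command recorded in catalina.out; "input.pdf" and "output_ocr.pdf" are placeholder file names, not files from this report, and the script is meant to be run as the alfresco user. It only invokes pdfsandwich when the binary and the input file are present, and otherwise just prints the command.

```shell
#!/bin/sh
# Placeholder file names (assumptions, not taken from the report).
INPUT="input.pdf"
OUTPUT="output_ocr.pdf"

# Same binary path and flags as the command shown in catalina.out.
CMD="/usr/local/bin/pdfsandwich -rgb -verbose -lang spa+eng+fra $INPUT -o $OUTPUT"
echo "command: $CMD"

if [ -x /usr/local/bin/pdfsandwich ] && [ -f "$INPUT" ]; then
    # Run the command and report its exit code (the failing run reports 2).
    $CMD
    echo "exit code: $?"
else
    echo "pdfsandwich or $INPUT not found; nothing was run"
fi
```

Comparing the exit code and output of this run with the "Execution result" block in catalina.out may help narrow down whether the difference comes from the parameters themselves or from the environment in which Alfresco launches the process.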
Steps to reproduce the behavior
Modify alfresco-global.properties and add some extra parameters (for example "-rgb") to "ocr.extra.commands", as described in the pdfsandwich documentation (http://www.tobias-elze.de/pdfsandwich/)
After restarting Alfresco, upload a PDF file for transformation
Wait for the job to run, or launch the transformation manually
Verify in Alfresco Share that the file has not been transformed (it remains at the same version)
The log (catalina.out) shows the error reported below ([...] Exception in thread "defaultAsyncAction1" java.lang.RuntimeException: java.lang.RuntimeException:[...])
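As a rough aid for the last step, the failure can be located in the log with a small script like the one below. The log path is an assumption derived from the /opt/alfresco/tomcat prefix that appears in the failing command, and "find_ocr_errors" is a hypothetical helper name; adjust both as needed.

```shell
#!/bin/sh
# Message text taken from the exception reported in catalina.out.
PATTERN="Failed to perform OCR transformation"

# Hypothetical helper: print every line of the given file that contains
# the OCR failure message, prefixed with its line number.
find_ocr_errors() {
    grep -n "$PATTERN" "$1"
}

# Assumed log location; override by setting LOG before running the script.
LOG="${LOG:-/opt/alfresco/tomcat/logs/catalina.out}"
if [ -r "$LOG" ]; then
    find_ocr_errors "$LOG"
else
    echo "log file not readable: $LOG"
fi
```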
Additional details (analysis so far, log statements, references, etc.)
Alfresco Community - 5.2.0
leptonica-1.74.4
pdfsandwich-0.1.6
tessdata-3.04.00
tesseract-3.05.00
unpaper-0.3-4
Exception in thread "defaultAsyncAction1" java.lang.RuntimeException: java.lang.RuntimeException: org.alfresco.service.cmr.repository.ContentIOException: 08270022 Failed to perform OCR transformation: Execution result:
   os: Linux
   command: /usr/local/bin/pdfsandwich -rgb -verbose -lang spa+eng+fra /opt/alfresco/tomcat/temp/Alfresco/OCRTransformWorker_source_5058787576520103913.pdf -o /opt/alfresco/tomcat/temp/Alfresco/OCRTransformWorker_source_5058787576520103913_ocr.pdf
   succeeded: false
   exit code: 2
   out: pdfsandwich version 0.1.6
Checking for convert:
convert -version
Version: ImageMagick 7.0.5-2 Q16 x86_64 2017-04-04 http://www.imagemagick.org
Copyright: © 1999-2017 ImageMagick Studio LLC
License: http://www.imagemagick.org/script/license.php
Featur
   err: pdfinfo version 0.26.5
Copyright 2005-2014 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2011 Glyph & Cog, LLC
pdfunite version 0.26.5
Copyright 2005-2014 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996
    at es.keensoft.alfresco.ocr.OCRExtractAction.executeImplInternal(OCRExtractAction.java:183)
    at es.keensoft.alfresco.ocr.OCRExtractAction.access$200(OCRExtractAction.java:38)
    at es.keensoft.alfresco.ocr.OCRExtractAction$1.execute(OCRExtractAction.java:164)
    at es.keensoft.alfresco.ocr.OCRExtractAction$1.execute(OCRExtractAction.java:161)
    at org.alfresco.repo.transaction.RetryingTransactionHelper.doInTransaction(RetryingTransactionHelper.java:464)
    at es.keensoft.alfresco.ocr.OCRExtractAction.executeInNewTransaction(OCRExtractAction.java:169)
    at es.keensoft.alfresco.ocr.OCRExtractAction.access$100(OCRExtractAction.java:38)
    at es.keensoft.alfresco.ocr.OCRExtractAction$ExtractOCRTask.run(OCRExtractAction.java:151)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: org.alfresco.service.cmr.repository.ContentIOException: 08270022 Failed to perform OCR transformation: Execution result:
   os: Linux
   command: /usr/local/bin/pdfsandwich -rgb -verbose -lang spa+eng+fra /opt/alfresco/tomcat/temp/Alfresco/OCRTransformWorker_source_5058787576520103913.pdf -o /opt/alfresco/tomcat/temp/Alfresco/OCRTransformWorker_source_5058787576520103913_ocr.pdf
   succeeded: false
   exit code: 2
   out: pdfsandwich version 0.1.6
Checking for convert:
convert -version
Version: ImageMagick 7.0.5-2 Q16 x86_64 2017-04-04 http://www.imagemagick.org
Copyright: © 1999-2017 ImageMagick Studio LLC
License: http://www.imagemagick.org/script/license.php
Featur
   err: pdfinfo version 0.26.5
Copyright 2005-2014 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2011 Glyph & Cog, LLC
pdfunite version 0.26.5