tap90 closed 7 years ago
Maybe you can run the command from the command line to debug the problem, since the input file is still stored on your filesystem.
Sorry for my ignorance. Which command should I try?
pdfsandwich /opt/alfresco-community/tomcat/temp/Alfresco/OCRTransformWorker_source_6425226260248108572.pdf -o /opt/alfresco-community/tomcat/temp/Alfresco/OCRTransformWorker_source_6425226260248108572_ocr.pdf -verbose -lang spa+eng+fra
This is the output:
pdfsandwich version 0.1.4
Checking for convert:
convert -version
Version: ImageMagick 6.7.8-9 2016-06-16 Q16 http://www.imagemagick.org
Copyright: Copyright (C) 1999-2012 ImageMagick Studio LLC
Features: OpenMP
Checking for unpaper:
unpaper -version
*** error: Unknown parameter '-version'. Try 'unpaper --help' for options.
Checking for tesseract:
tesseract -v
tesseract 3.04.01
leptonica-1.72
libjpeg 6b (libjpeg-turbo 1.2.90) : libpng 1.5.13 : libtiff 4.0.3 : zlib 1.2.7
Checking for gs:
gs -v
gs: symbol lookup error: /lib64/libgs.so.9: undefined symbol: cmsCreateContext
Input file: "/opt/alfresco-community/tomcat/temp/Alfresco/OCRTransformWorker_source_6425226260248108572.pdf"
Output file: "/opt/alfresco-community/tomcat/temp/Alfresco/OCRTransformWorker_source_6425226260248108572_ocr.pdf"
gs: symbol lookup error: /lib64/libgs.so.9: undefined symbol: cmsCreateContext
Fatal error: exception Failure("Error: Could not determine number of pages of file /opt/alfresco-community/tomcat/temp/Alfresco/OCRTransformWorker_source_6425226260248108572.pdf")
I think the problem starts at this line, where pdfsandwich tries to run the gs -v command:
gs: symbol lookup error: /lib64/libgs.so.9: undefined symbol: cmsCreateContext
I tried running this command directly and the output is the same.
Ok, so you have to fix your local gs (GhostScript) installation before trying to run this Alfresco addon.
Yes, the problem is that Ghostscript is installed as a dependency of ImageMagick, so I don't understand why it doesn't work.
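One way to see why a Ghostscript pulled in as a dependency can still fail is to check which shared libraries gs actually resolves at run time. cmsCreateContext is a Little CMS 2 (liblcms2) function, so one possible cause, and this is only an assumption, is that an older liblcms2 is being found ahead of the system one, for example through LD_LIBRARY_PATH. A minimal sketch; show_libs is a hypothetical helper name, not something from pdfsandwich or the addon:

```shell
#!/bin/sh
# show_libs TOOL PATTERN
# Print the shared libraries TOOL resolves at run time whose lines match
# PATTERN (extended regex, case-insensitive). Hypothetical helper.
show_libs() {
    bin=$(command -v "$1") || { echo "$1: not found"; return 1; }
    ldd "$bin" 2>/dev/null | grep -iE "$2" || echo "$1: no library matching '$2'"
}

# Which libgs and liblcms2 does gs load, and from which directory?
show_libs gs 'libgs|lcms' || true

# A non-empty value here can make the loader prefer bundled libraries.
echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH:-<unset>}"
```

If the liblcms2 line points somewhere other than the system library directory, that copy is a reasonable first suspect.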
Several different problems could be the cause. However, running gs -v on its own should give you exactly the same error.
Maybe isolating the problem will help to solve it.
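To isolate the problem as suggested, each tool can be started on its own with the same version flag that pdfsandwich prints in its verbose output. The sketch below is only an illustration; check_tool is a hypothetical helper name, and the flags are the ones shown in the log above:

```shell
#!/bin/sh
# check_tool TOOL FLAG
# Report whether TOOL is on PATH and whether "TOOL FLAG" exits cleanly.
# Hypothetical helper for illustration.
check_tool() {
    tool=$1
    flag=$2
    if ! command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: not found"
        return 1
    fi
    if "$tool" "$flag" >/dev/null 2>&1; then
        echo "$tool: ok"
    else
        echo "$tool: found but '$tool $flag' failed"
        return 2
    fi
}

# The same checks pdfsandwich lists in its verbose output.
check_tool convert -version || true
check_tool tesseract -v || true
check_tool gs -v || true
```

A tool reported as "found but ... failed" is the one to fix before running pdfsandwich again. unpaper is left out on purpose, since the log shows that older versions reject -version without that being the fatal error.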
I removed and reinstalled Ghostscript. Now the command gs -v works correctly, but when I run pdfsandwich, Ghostscript generates a new error:
pdfsandwich version 0.1.4
Checking for convert:
convert -version
Version: ImageMagick 6.7.8-9 2016-06-16 Q16 http://www.imagemagick.org
Copyright: Copyright (C) 1999-2012 ImageMagick Studio LLC
Features: OpenMP
Checking for unpaper:
unpaper -version
*** error: Unknown parameter '-version'. Try 'unpaper --help' for options.
Checking for tesseract:
tesseract -v
tesseract 3.04.01
leptonica-1.72
libjpeg 6b (libjpeg-turbo 1.2.90) : libpng 1.5.13 : libtiff 4.0.3 : zlib 1.2.7
Checking for gs:
gs -v
GPL Ghostscript 9.07 (2013-02-14)
Copyright (C) 2012 Artifex Software, Inc. All rights reserved.
Input file: "/home/ocrserver/prova.pdf"
Output file: "/home/ocrserver/prova_ocr.pdf"
GPL Ghostscript 9.07: Unrecoverable error, exit code 1
Fatal error: exception Failure("Error: Could not determine number of pages of file /home/ocrserver/prova.pdf")
This is the line:
GPL Ghostscript 9.07: Unrecoverable error, exit code 1
Fatal error: exception Failure("Error: Could not determine number of pages of file /home/ocrserver/prova.pdf")
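When the page count fails like this, it can help to check whether the PDF itself is readable, independently of pdfsandwich and Ghostscript. A possible sketch uses pdfinfo from poppler-utils, which is not mentioned in this thread and may need to be installed; parse_pages is a hypothetical helper name:

```shell
#!/bin/sh
# parse_pages: read pdfinfo output on stdin and print the page count.
# Hypothetical helper for illustration.
parse_pages() {
    awk '/^Pages:/ { print $2 }'
}

pdf=/home/ocrserver/prova.pdf   # path taken from the log above
if command -v pdfinfo >/dev/null 2>&1 && [ -f "$pdf" ]; then
    pdfinfo "$pdf" | parse_pages
else
    echo "pdfinfo or $pdf not available; skipping"
fi
```

If pdfinfo reports a page count, the file is probably fine and the failure is more likely in the Ghostscript step.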
Now I'm trying to install the latest version of pdfsandwich. I'll let you know.
With the latest version of pdfsandwich it works correctly. It uses pdfunite, so you need to install this new dependency, but it works.
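Since the newer pdfsandwich needs pdfunite (shipped in the poppler-utils package on common distributions), a quick check that every external tool is on PATH can save a failed run. A minimal sketch; missing_tools is a hypothetical helper name, and the tool list is assembled from the commands that appear in this thread:

```shell
#!/bin/sh
# missing_tools TOOL...
# Print the names of the given tools that are not on PATH,
# separated by spaces. Hypothetical helper for illustration.
missing_tools() {
    missing=""
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
    done
    echo "${missing# }"
}

m=$(missing_tools pdfsandwich pdfunite gs tesseract convert unpaper)
if [ -n "$m" ]; then
    echo "Missing: $m"
else
    echo "All tools found"
fi
```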
Hi, I'm unable to run the command below directly from the terminal on Ubuntu to convert a .tif (multi-page TIFF file) to a .pdf file.
Can you please help on this?
$ /usr/bin/pdfsandwich -verbose -lang spa+eng+fra Sample_3_Multi_page.tif -o Sample_3_Multi_page.pdf
pdfsandwich version 0.1.4
Checking for convert:
convert -version
Version: ImageMagick 6.8.9-9 Q16 x86_64 2018-07-10 http://www.imagemagick.org
Copyright: Copyright (C) 1999-2014 ImageMagick Studio LLC
Features: DPC Modules OpenMP
Delegates: bzlib cairo djvu fftw fontconfig freetype jbig jng jpeg lcms lqr ltdl lzma openexr pangocairo png rsvg tiff wmf x xml zlib
Checking for unpaper:
unpaper -version
6.1
Checking for tesseract:
tesseract -v
tesseract 3.04.01
leptonica-1.73
libgif 5.1.2 : libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.4 : libopenjp2 2.1.0
Checking for gs:
gs -v
GPL Ghostscript 9.18 (2015-10-05)
Copyright (C) 2015 Artifex Software, Inc. All rights reserved.
Input file: "Sample_3_Multi_page.tif"
Output file: "Sample_3_Multi_page.pdf"
Fatal error: exception Failure("Error: Could not determine number of pages of file Sample_3_Multi_page.tif")
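Assuming pdfsandwich 0.1.4 expects a PDF as input (not confirmed in this thread), one thing to try is converting the multi-page TIFF to PDF first and running pdfsandwich on the result. The sketch below uses tiff2pdf from the libtiff tools, which is not mentioned above and may not be installed; pdf_name is a hypothetical helper name:

```shell
#!/bin/sh
# pdf_name FILE
# Derive an intermediate PDF name from a TIFF file name.
# Hypothetical helper for illustration.
pdf_name() {
    case $1 in
        *.tif)  echo "${1%.tif}_input.pdf" ;;
        *.tiff) echo "${1%.tiff}_input.pdf" ;;
        *)      echo "$1.pdf" ;;
    esac
}

src=Sample_3_Multi_page.tif
tmp=$(pdf_name "$src")
if command -v tiff2pdf >/dev/null 2>&1 && [ -f "$src" ]; then
    # Convert all TIFF pages to one PDF, then OCR that PDF.
    tiff2pdf -o "$tmp" "$src" &&
        pdfsandwich -verbose -lang spa+eng+fra "$tmp" -o Sample_3_Multi_page.pdf
else
    echo "tiff2pdf or $src not available; skipping"
fi
```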
Thanks.
Dear @angelborroy-ks, I'm using Ubuntu 16.04. I have copied the two JARs and installed pdfsandwich 0.1.4. Can you please help me with this:
pdfsandwich version 0.1.4
Checking for convert:
convert -version
Version: ImageMagick 6.8.9-9 Q16 x86_64 2019-11-12 http://www.imagemagick.org
Copyright: Copyright (C) 1999-2014 ImageMagick Studio LLC
Features: DPC Modules OpenMP
Delegates: bzlib cairo djvu fftw fontconfig freetype jbig jng jpeg lcms lqr ltdl lzma openexr pangocairo png rsvg tiff wmf x xml zlib
Checking for unpaper:
unpaper -version
6.1
Checking for tesseract:
tesseract -v
tesseract 3.04.01
leptonica-1.73
libgif 5.1.2 : libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.4 : libopenjp2 2.1.2
Checking for gs:
gs -v
GPL Ghostscript 9.26 (2018-11-20)
Copyright (C) 2018 Artifex Software, Inc. All rights reserved.
Input file: "/opt/alfresco-community/tomcat/temp/Alfresco/alice.pdf"
Output file: "/opt/alfresco-community/tomcat/temp/Alfresco/alice_ocr.pdf"
Fatal error: exception Failure("Error: Could not determine number of pages of file /opt/alfresco-community/tomcat/temp/Alfresco/alice.pdf")
When I put some PDF files in the Alfresco folder configured with the rule (ocr-extraction), Alfresco creates a new version of the file without performing OCR correctly.
When this happens, it writes this in alfresco.log: `Version: ImageMagick 6.9.1-10 Q16 x86_64 2015-08-12 http://www.imagemagick.org
Copyright: Copyright (C) 1999-2015 ImageMagick Studio LLC
License: http://www.imagemagick.org/script/license.php
Features: Cipher DPC Modules
Delegates (built-in): freetype jng jpeg ltdl png tiff wmf
Checking for unpaper:
unpaper -version
*** error: Unknown parameter '-version'. Try 'unpaper --help' for options.
Checking for tesseract:
tesseract -v
Checking for gs:
gs -v
GPL Ghostscript 8.64 (2009-02-03)
Copyright (C) 2009 Artifex Software, Inc. All rights reserved.
Input file: "/opt/alfresco-community/tomcat/temp/Alfresco/OCRTransformWorker_source_6425226260248108572.pdf"
Output file: "/opt/alfresco-community/tomcat/temp/Alfresco/OCRTransformWorker_source_6425226260248108572_ocr.pdf"
Number of pages in inputfile: 1
Processing page 1.
identify -format "%w\n%h\n" "/opt/alfresco-community/tomcat/temp/Alfresco/OCRTransformWorker_source_6425226260248108572.pdf[0]"
convert -type Bilevel -density 300x300 "/opt/alfresco-community/tomcat/temp/Alfresco/OCRTransformWorker_source_6425226260248108572.pdf[0]" /tmp/pdfsandwichf66a6b.pbm
unpaper --overwrite --no-grayfilter --layout none /tmp/pdfsandwichf66a6b.pbm /tmp/pdfsandwich5838df_unpaper.pbm
Processing sheet: /tmp/pdfsandwichf66a6b.pbm -> /tmp/pdfsandwich5838df_unpaper.pbm
tesseract /tmp/pdfsandwich5838df_unpaper.pbm /tmp/pdfsandwich0ca5f3 -l spa+eng+fra pdf
gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dDEVICEWIDTHPOINTS=595 -dDEVICEHEIGHTPOINTS=842 -dPDFFitPage -o /tmp/pdfsandwich5264db.pdf /tmp/pdfsandwich0ca5f3.pdf
OCR done. Writing "/opt/alfresco-community/tomcat/temp/Alfresco/OCRTransformWorker_source_6425226260248108572_ocr.pdf"
gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile="/opt/alfresco-community/tomcat/temp/Alfresco/OCRTransformWorker_source_6425226260248108572_ocr.pdf" /tmp/pdfsandwich5264db.pdf
Done.
2017-01-30 15:01:17,837 INFO [es.keensoft.alfresco.ocr.OCRTransformWorker] [http-apr-8080-exec-9] STDERR: tesseract: /opt/alfresco-community/common/lib/libjpeg.so.62: no version information available (required by /usr/local/lib/liblept.so.4)
tesseract: /opt/alfresco-community/common/lib/libjpeg.so.62: no version information available (required by /lib64/libtiff.so.5)
tesseract 3.04.01
leptonica-1.72
libjpeg 6b (libjpeg-turbo 1.2.90) : libpng 1.5.13 : libtiff 4.0.3 : zlib 1.2.3
tesseract: /opt/alfresco-community/common/lib/libjpeg.so.62: no version information available (required by /usr/local/lib/liblept.so.4)
tesseract: /opt/alfresco-community/common/lib/libjpeg.so.62: no version information available (required by /lib64/libtiff.so.5)
Tesseract Open Source OCR Engine v3.04.01 with Leptonica`
I have noticed this error:
STDERR: tesseract: /opt/alfresco-community/common/lib/libjpeg.so.62: no version information available (required by /usr/local/lib/liblept.so.4)
Could this be causing the problem? How can I fix it?
Thanks in advance
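The "no version information available" message is a dynamic linker warning: tesseract is loading the libjpeg bundled under /opt/alfresco-community/common/lib instead of the one its other libraries were built against. The log above still reaches "OCR done.", so the warning may not be the cause on its own. One way to test this, assuming the bundled directory is reaching tesseract through LD_LIBRARY_PATH (an assumption, not confirmed in this thread), is to run tesseract with that directory removed from the path. strip_dir is a hypothetical helper name:

```shell
#!/bin/sh
# strip_dir DIR PATHLIST
# Print the colon-separated PATHLIST with every exact entry DIR removed.
# Hypothetical helper for illustration.
strip_dir() {
    printf '%s' "$2" | tr ':' '\n' | grep -vxF "$1" | paste -sd: -
}

bundled=/opt/alfresco-community/common/lib   # directory from the warning
clean=$(strip_dir "$bundled" "${LD_LIBRARY_PATH:-}")

if command -v tesseract >/dev/null 2>&1; then
    # Run tesseract without the bundled libraries on the search path.
    LD_LIBRARY_PATH=$clean tesseract -v
else
    echo "tesseract not found; skipping"
fi
```

If the warning disappears with the cleaned path, the bundled libjpeg is what triggers it.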