ayushiagrahari closed this issue 7 years ago.
Please help me with this.
Hi! It looks like a pdfsandwich issue. Which OS do you use, and which software versions?
Hello, I am using Ubuntu 16.04 with pdfsandwich 0.1.6, Tesseract 3.04, Leptonica 1.73 and Unpaper 1.6.
Does pdfsandwich work when you run it directly from the command line, without Alfresco?
Since pdfsandwich accepts only PDF files, I first convert the TIFF and JPEG files into PDFs using the convert command and then run pdfsandwich on the result. But the transform method used in ExtractOCR.java is not able to convert the TIFF and JPEG images into PDFs, so pdfsandwich does not work properly on TIFF and JPEG files.
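A minimal Python sketch of the two-step command-line pipeline described above (image to PDF, then OCR), useful for checking whether pdfsandwich works outside Alfresco. It assumes ImageMagick's `convert` and `pdfsandwich` are on the PATH. The function names are hypothetical, and passing the output file through pdfsandwich's `-o` option is an assumption; check `pdfsandwich -help` for the installed version.

```python
import subprocess
from pathlib import Path


def build_ocr_commands(image_path, work_dir):
    """Return (convert_cmd, ocr_cmd, ocr_pdf) for image -> PDF -> OCR'd PDF.

    Hypothetical helper. Assumes ImageMagick `convert` and `pdfsandwich`
    are installed; the pdfsandwich `-o` output option is an assumption.
    """
    image = Path(image_path)
    work = Path(work_dir)
    pdf = work / (image.stem + ".pdf")          # intermediate, image-only PDF
    ocr_pdf = work / (image.stem + "_ocr.pdf")  # final PDF with a text layer
    convert_cmd = ["convert", str(image), str(pdf)]
    # pdfsandwich takes options before the input file name
    ocr_cmd = ["pdfsandwich", "-o", str(ocr_pdf), str(pdf)]
    return convert_cmd, ocr_cmd, ocr_pdf


def run_ocr(image_path, work_dir):
    """Run both steps in order and return the path of the OCR'd PDF."""
    convert_cmd, ocr_cmd, ocr_pdf = build_ocr_commands(image_path, work_dir)
    for cmd in (convert_cmd, ocr_cmd):
        # check=True raises CalledProcessError if a tool exits non-zero,
        # so a failed conversion never reaches pdfsandwich
        subprocess.run(cmd, check=True)
    return ocr_pdf
```

If `run_ocr("page1.tiff", "/tmp/work")` succeeds from a shell but the same file fails inside Alfresco, the problem is more likely in the transformation step than in pdfsandwich itself.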
Yes, you need to run the Alfresco transformation action (from JPG to PDF) before the OCR action.
I am trying to perform OCR on TIFF and JPEG files, but it fails with "Couldn't find trailer dictionary", "Couldn't read xref table" and exception Failure("Error: pdfinfo could not determine number of pages. Check the pdf input file.\n"), even though the transformation from JPEG or TIFF to PDF works and the PDF file is visible on the Alfresco Share page.
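These messages come from pdfinfo (poppler-utils), which pdfsandwich calls to count pages, and they usually mean the file it received is not a readable PDF. One plausible cause, not confirmed in this thread, is that the file handed to pdfsandwich still contains the original image bytes or is truncated, rather than the transformed PDF shown in Share. The hypothetical helpers below sketch a quick check on the bytes and on pdfinfo's text output. The PDF check is deliberately strict (header at byte 0, `%%EOF` near the end), so treat a failure as a hint rather than proof.

```python
import re


def sniff_format(data: bytes) -> str:
    """Guess the real file type from its leading magic bytes."""
    if data.startswith(b"%PDF-"):
        return "pdf"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data[:4] in (b"II*\x00", b"MM\x00*"):  # little- and big-endian TIFF
        return "tiff"
    return "unknown"


def looks_like_pdf(data: bytes) -> bool:
    """Strict structural check: %PDF- header and an %%EOF marker in the tail."""
    if not data.startswith(b"%PDF-"):
        return False
    # Some writers append whitespace after %%EOF, so search the last 1 KiB.
    return b"%%EOF" in data[-1024:]


def page_count_from_pdfinfo(output: str):
    """Extract the 'Pages:' value from pdfinfo text output, or None if absent."""
    match = re.search(r"^Pages:\s+(\d+)\s*$", output, re.MULTILINE)
    return int(match.group(1)) if match else None
```

Running `sniff_format` on the exact file Alfresco passes to pdfsandwich shows whether it is really a PDF; "jpeg" or "tiff" would point at the transformation step being skipped.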