keensoft / alfresco-simple-ocr

Simple OCR action for Alfresco

Simple OCR integration with Alfresco on Windows #34

Closed: suneethababu closed this issue 7 years ago

suneethababu commented 7 years ago

Hi, I am trying to integrate the Simple OCR add-on with Alfresco on a Windows desktop. I could not find a way to install or configure Windows.Media.OCR as a local service. Please tell me how I can do that. Also, my requirement is to configure the add-on with either Windows 7 Pro or Windows 2012 R2; may I know whether it is supported on them?

angelborroy-ks commented 7 years ago

You should use https://github.com/Microsoft/Windows-universal-samples/tree/master/Samples/OCR as base to build your OCR local service. We are not providing this component.

It is possible to adapt Windows.Media.OCR to run on Windows 8 or Windows 2012 R2, but I guess it will not work with Windows 7.

suneethababu commented 7 years ago

Thanks for the response, AngelBorroy. I will check that. Could you please tell me whether the add-on works with OCRmyPDF on Windows with Alfresco Enterprise?

angelborroy-ks commented 7 years ago

You can probably try OCRmyPDF on Windows by using a VirtualBox machine or a Docker container: http://ocrmypdf.readthedocs.io/en/latest/installation.html#installing-on-windows

It should work with Alfresco Enterprise.
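As a rough illustration of the Docker route, the sketch below wraps an OCRmyPDF container call in Python, streaming the PDF through stdin and stdout so that no Windows folder has to be mounted. The image name `jbarlow83/ocrmypdf` and the helper names `build_docker_command` and `ocr_pdf` are assumptions for illustration, not part of the add-on; check the OCRmyPDF installation page for the image that matches your version.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List

# Assumption: image name as published in the OCRmyPDF Docker instructions;
# verify it against the documentation for the version you install.
OCRMYPDF_IMAGE = "jbarlow83/ocrmypdf"


def build_docker_command(image: str = OCRMYPDF_IMAGE, language: str = "eng") -> List[str]:
    """Build a docker command that reads a PDF on stdin and writes the OCRed PDF to stdout."""
    # "-l" selects the OCR language; "- -" tells ocrmypdf to use stdin and stdout.
    return ["docker", "run", "--rm", "-i", image, "-l", language, "-", "-"]


def ocr_pdf(src: Path, dst: Path, language: str = "eng") -> None:
    """Hypothetical helper: run OCRmyPDF in a container on src and write the result to dst."""
    if shutil.which("docker") is None:
        raise RuntimeError("docker executable not found on PATH")
    with src.open("rb") as fin, dst.open("wb") as fout:
        subprocess.run(
            build_docker_command(language=language),
            stdin=fin,
            stdout=fout,
            check=True,  # raise CalledProcessError if OCRmyPDF fails
        )
```

Calling `ocr_pdf(Path("scan.pdf"), Path("scan_ocr.pdf"))` would then run the container once per file; how the add-on itself invokes the command is configured separately and is not shown here.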

suneethababu commented 7 years ago

I am trying to use the Windows version of Tesseract with the add-on. When I imported a PDF image, I got the error "Error in pixReadStream: Pdf reading is not supported". With a JPG import, the error "No transformation exists between mimetypes image/jpeg and application/pdf" occurred. Could you please advise on how to resolve this?

angelborroy-ks commented 7 years ago

We have not tested this use case (Windows Tesseract).

Alfresco provides the following transformation pipeline:

JPEG > TIFF > PDF

You probably have to tweak your JPEG image before trying to use it with Tesseract.
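The "Pdf reading is not supported" error reported above is raised when Tesseract is handed a PDF instead of a raster image. A minimal sketch of a guard around the Tesseract call, assuming the standard `tesseract <image> <outputbase> -l <lang> pdf` invocation, could look like this. The function name `build_tesseract_command` and the list of accepted suffixes are illustrative assumptions; the formats actually readable depend on how your Tesseract build was compiled.

```python
from pathlib import Path
from typing import List

# Assumption: raster formats that a typical Tesseract/Leptonica build can read.
SUPPORTED_SUFFIXES = {".tif", ".tiff", ".png", ".jpg", ".jpeg", ".bmp"}


def build_tesseract_command(image: Path, output_base: Path, language: str = "eng") -> List[str]:
    """Build a Tesseract command that writes <output_base>.pdf, rejecting PDF input early."""
    suffix = image.suffix.lower()
    if suffix == ".pdf":
        # Tesseract reads images only; a PDF must be rasterized (for example to TIFF) first.
        raise ValueError("Tesseract cannot read PDF input; rasterize the pages to TIFF or PNG first")
    if suffix not in SUPPORTED_SUFFIXES:
        raise ValueError(f"unsupported input format: {suffix}")
    # The trailing "pdf" config asks Tesseract to produce a PDF with a text layer.
    return ["tesseract", str(image), str(output_base), "-l", language, "pdf"]
```

For example, `build_tesseract_command(Path("scan.tif"), Path("scan_ocr"))` yields a command that would write `scan_ocr.pdf`, while passing `scan.pdf` fails immediately with a clear message instead of the Leptonica error.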

suneethababu commented 7 years ago

Thanks for the response. Even with a TIFF image I got the "No transformation" error.

Regarding the PDF image, does PDF image to PDF conversion work with pdfsandwich on Linux?

angelborroy-ks commented 7 years ago

Using pdfsandwich or ocrmypdf on Linux (which is the same scenario as using Docker on Windows), you can use it directly in Alfresco Share.

To OCR JPEG files, a simple rule has to be added to the folder, but they are supported as well.