loftuxab / alfresco-ubuntu-install

Alfresco script based install for Ubuntu
https://loftux.com/products-and-add-ons/alfresco-utilities

Add script to OCR the pages of any PDF that has no text layer #71

Closed chris001 closed 9 years ago

chris001 commented 9 years ago

Add tesseract-ocr, leptonica, ephesoft, pdfkit, pdftk, hocr2pdf, pdfjoin, and a cron job that automatically runs OCR on a schedule over every PDF document that contains a TIFF image and no text layer.

This creates searchable PDFs from scanned documents, screenshots, images, etc.

https://www.appnovation.com/blog/creating-searchable-pdf-alfresco

https://www.surevine.com/a-little-alfresco-tesseract-ocr-integration/

https://tpeelen.wordpress.com/2010/12/17/alfresco-using-tesseract-ocr-on-ubuntu-linux/
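As a rough illustration of the first step in the request above (finding PDFs that have no text layer), here is a minimal Python sketch. It assumes poppler-utils' `pdftotext` is installed, which is not among the tools listed above, and that the PDFs sit in a plain folder rather than inside Alfresco's content store. The function names `has_text_layer`, `extract_text`, and `pdfs_needing_ocr` are hypothetical, and the OCR step itself (tesseract, hocr2pdf, pdftk) is left out.

```python
import subprocess
from pathlib import Path
from typing import Callable, List


def has_text_layer(extracted_text: str, min_chars: int = 1) -> bool:
    """Return True if the text has at least min_chars non-whitespace characters.

    pdftotext separates pages with form feeds, which count as whitespace
    here, so an image-only PDF yields no countable characters.
    """
    return sum(1 for ch in extracted_text if not ch.isspace()) >= min_chars


def extract_text(pdf_path: Path) -> str:
    """Extract text with poppler's pdftotext; "-" sends the output to stdout."""
    result = subprocess.run(
        ["pdftotext", str(pdf_path), "-"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def pdfs_needing_ocr(
    folder: Path, extractor: Callable[[Path], str] = extract_text
) -> List[Path]:
    """List PDFs under folder (recursively) whose extracted text is empty."""
    return sorted(
        pdf for pdf in folder.rglob("*.pdf") if not has_text_layer(extractor(pdf))
    )
```

A scheduled job could call `pdfs_needing_ocr(Path("/some/folder"))` and hand each returned path to the OCR tools. The `extractor` parameter exists only so the check can be exercised without `pdftotext` installed.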

loftux commented 9 years ago

Thanks for the suggestion. However, we try to keep the installer as simple as possible, so that nobody has to step through too many options to complete an install.

Instead of adding it to the core script, you can create an additional "SetupAddons.sh" script that you can run to add additional features. I'll add it when I get a pull request.