keensoft / alfresco-simple-ocr

Simple OCR action for Alfresco

Spaces between characters in OCR'ed PDFs #45

Closed: DavBE closed this issue 7 years ago

DavBE commented 7 years ago

Hi,

Every PDF file I OCR in Alfresco contains spaces between each character.

Example: the word "client" becomes "c l i e n t".

Maybe it's a pdfsandwich issue, but as it is called from alfresco-simple-ocr I thought I would ask here. Is there anything I can do to solve this?

System is Ubuntu 16.06 x64, running Alfresco 5.2.0 (re21f2be5-b22)

Regards,

David

angelborroy-ks commented 7 years ago

If you are using pdfsandwich, switch to OCRmyPDF.

Also try running the transformation from the command line to determine whether the problem is inside or outside Alfresco.
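
When comparing output from a command-line run with output produced through Alfresco, it can help to check the extracted text layer automatically rather than by eye. The following is only a rough sketch of such a check, not part of alfresco-simple-ocr: it assumes you have already extracted the text of the OCR'ed PDF into a string by some other means, and the function names and the 0.5 threshold are hypothetical choices for illustration.

```python
def single_letter_ratio(text: str) -> float:
    """Return the fraction of whitespace-separated tokens that are one letter."""
    tokens = text.split()
    if not tokens:
        return 0.0
    singles = sum(1 for token in tokens if len(token) == 1 and token.isalpha())
    return singles / len(tokens)


def looks_letter_spaced(text: str, threshold: float = 0.5) -> bool:
    """Heuristic: flag text where most tokens are single letters.

    The threshold is an assumed value; tune it for your own documents.
    """
    return single_letter_ratio(text) >= threshold


if __name__ == "__main__":
    # Text as it might come out of a badly OCR'ed PDF versus a good one.
    print(looks_letter_spaced("c l i e n t"))
    print(looks_letter_spaced("the client paid a bill"))
```

Running the same check on the text from both runs shows whether the spacing is already present outside Alfresco. The heuristic is deliberately crude: short texts made up mostly of one-letter words could be flagged by mistake.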

DavBE commented 7 years ago

Hi Angel,

Thanks for your reply.

I ran pdfsandwich on the command line and the issue persisted. I installed OCRmyPDF and no more spaces!

Thank you very much.

David