Closed opensemanticsearch closed 3 years ago
I am closing this issue. Tika seems to convert/optimize the image(s) before OCR, so the input, and therefore the hash, is not the same for the same embedded image file. The plugin is also no longer enabled by default, since our new default settings now use Tika's OCR with our new tesseract-ocr-cache.
Use the same tesseract options as tika-server in the enhance_pdf_ocr plugin so both can use the same OCR cache results
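
To illustrate why a shared cache needs both identical input and identical options, here is a minimal sketch. It assumes the cache key is a hash over the exact image bytes plus the OCR options. The function `ocr_cache_key`, the option name `lang` and the byte strings are hypothetical stand-ins for illustration, not the actual tesseract-ocr-cache or tika-server implementation.

```python
import hashlib
import json


def ocr_cache_key(image_bytes: bytes, options: dict) -> str:
    """Hypothetical cache key: hash of the exact image bytes plus the OCR options."""
    hasher = hashlib.sha256()
    hasher.update(image_bytes)
    # Serialize the options deterministically so equal settings give equal keys.
    hasher.update(json.dumps(options, sort_keys=True).encode("utf-8"))
    return hasher.hexdigest()


# Stand-in byte strings, not real image data (assumption for illustration).
embedded_image = b"original embedded image bytes"
converted_image = b"same picture after conversion/optimization"

options = {"lang": "eng"}  # illustrative option name only

key_plugin = ocr_cache_key(embedded_image, options)
key_same = ocr_cache_key(embedded_image, dict(options))
key_converted = ocr_cache_key(converted_image, options)
key_other_options = ocr_cache_key(embedded_image, {"lang": "deu"})

print(key_plugin == key_same)           # same bytes, same options
print(key_plugin == key_converted)      # bytes changed by conversion
print(key_plugin == key_other_options)  # options changed
```

Under these assumptions, only the first comparison yields a matching key. Aligning the options alone would therefore not be enough for a cache hit if one side hashes the image after conversion and the other hashes the original embedded file.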