I am currently using Tess4J in my project with file-based processing, but I have a requirement to perform all OCR processing in memory, without writing temp files to the hard disk.
Is there a way to have Tesseract render the PDF to a ByteArrayOutputStream, so we can write the result somewhere else afterwards without storing it locally?
I already do some intermediate processing in memory, but I am stuck on doing the final OCR step in memory because TessAPI expects a file path to work with.
It's only possible if the Tesseract native library supports it. You may want to file a feature request with the Tesseract project. Once the feature is implemented and exposed in its C API, Tess4J can then be updated to use it.
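Until then, one possible stopgap is to let the renderer write into a short-lived scratch directory, read the PDF back into a byte array, and delete the files right away. This is only a sketch and not a true in-memory path: `ScratchPdfCapture`, `PdfRenderer`, and `capture` are hypothetical names made up for illustration, and the code assumes the renderer writes `<outputBase>.pdf`, which is how Tesseract's PDF renderer names its output.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Tess4J or Tesseract.
final class ScratchPdfCapture {

    /** Hypothetical callback: runs OCR and writes "<outputBase>.pdf". */
    @FunctionalInterface
    interface PdfRenderer {
        void render(String outputBase) throws Exception;
    }

    /**
     * Runs the renderer against a fresh directory under scratchRoot,
     * returns the PDF bytes, and removes everything it created.
     */
    static byte[] capture(Path scratchRoot, PdfRenderer renderer) throws Exception {
        Path dir = Files.createTempDirectory(scratchRoot, "ocr-");
        Path outputBase = dir.resolve("result");
        try {
            renderer.render(outputBase.toString());
            // Assumption: the renderer appended ".pdf" to the output base.
            return Files.readAllBytes(dir.resolve("result.pdf"));
        } finally {
            deleteDirectory(dir);
        }
    }

    /** Deletes the files directly inside dir, then dir itself. */
    private static void deleteDirectory(Path dir) throws IOException {
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                Files.delete(entry);
            }
        }
        Files.delete(dir);
    }
}
```

With Tess4J, the callback could wrap something like `tesseract.createDocuments(inputPath, base, formats)` with `RenderedFormat.PDF` in the format list; check that call against the Tess4J version you use. If `scratchRoot` points at a RAM-backed filesystem (for example `/dev/shm` on many Linux systems), the bytes should not reach the hard disk, although the files still exist briefly as paths, so this may or may not satisfy your requirement. The returned array can then be copied into a ByteArrayOutputStream or sent wherever it is needed.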