nguyenq / tess4j

Java JNA wrapper for Tesseract OCR API
Apache License 2.0

Do all processing in-memory #227

Closed yshyman closed 2 years ago

yshyman commented 2 years ago

I currently use tess4j in my project for file-based processing, but I have a requirement to perform all OCR processing in memory, without writing temp files to the hard disk.

Is there a way to have Tesseract render the PDF to a ByteArrayOutputStream, so that we can write the result somewhere afterwards without ever storing it locally?

I have already done some intermediate processing in memory, but I am stuck on the final OCR step, because TessAPI expects a file path to work with.

nguyenq commented 2 years ago

It's only possible if the Tesseract native library supports it. You may want to put in a feature request there. Once the feature is implemented and exposed in its C API, Tess4J can then be updated to use it.
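
Until the native library offers such a feature, one possible stopgap is to confine the file to a short-lived directory and hand the caller only the bytes. The sketch below is not part of Tess4J and does not remove the disk write: `TempFileCapture`, `Renderer`, and `capture` are hypothetical names, and the stub in `main` stands in for whatever call writes `<outputBase>.pdf` (with Tess4J that would presumably be `createDocuments`, which takes an output base path). It only approximates the "no files on disk" requirement if `parentDir` sits on a RAM-backed filesystem such as tmpfs, which is an assumption about the deployment environment.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical helper, not a Tess4J class.
final class TempFileCapture {

    /** Hypothetical callback: anything that writes "<outputBase>.<extension>" to disk. */
    @FunctionalInterface
    interface Renderer {
        void render(String outputBase) throws Exception;
    }

    /**
     * Runs the renderer against a fresh directory under parentDir, reads the
     * produced file into memory, and deletes the directory before returning.
     */
    static byte[] capture(Path parentDir, String extension, Renderer renderer) throws Exception {
        Path dir = Files.createTempDirectory(parentDir, "ocr-");
        try {
            Path base = dir.resolve("result");
            renderer.render(base.toString());
            return Files.readAllBytes(dir.resolve("result." + extension));
        } finally {
            // Delete children before the directory itself (deepest paths first).
            try (Stream<Path> paths = Files.walk(dir)) {
                paths.sorted(Comparator.reverseOrder()).forEach(p -> p.toFile().delete());
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumption: replace with a tmpfs mount (e.g. /dev/shm on Linux) to avoid physical disk.
        Path parent = Paths.get(System.getProperty("java.io.tmpdir"));

        // Stub renderer standing in for the real OCR call.
        byte[] pdf = capture(parent, "pdf", base ->
                Files.write(Paths.get(base + ".pdf"),
                        "%PDF-stub".getBytes(StandardCharsets.US_ASCII)));

        System.out.println("Captured " + pdf.length + " bytes in memory");
    }
}
```

The returned `byte[]` can then be wrapped in a `ByteArrayInputStream` or written to any destination. The cleanup runs in a `finally` block, so the directory is removed even if rendering fails.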