janis91 / ocr

Nextcloud OCR (optical character recognition) processing for images with tesseract-js
GNU Affero General Public License v3.0
107 stars 17 forks

[Feature Request] Automatic OCR on a specific folder #233

Closed Schleichmichl closed 4 years ago

Schleichmichl commented 4 years ago

As a user who has external storage mounted where PDF files are added (a scanner's memory card mounted to a Nextcloud user, who shares this folder with everybody who uses the scanner), I would like to set up an automatic OCR job so that whenever a file is added (or at least periodically), the file is OCR'ed and replaced with the processed file.

backamblock commented 4 years ago

Yes please, this is a big +1

janis91 commented 4 years ago

This is not possible from this app's perspective. I know this would be a killer feature for some of you, but the app depends heavily on the browser: the Tesseract process that performs the OCR runs entirely inside the user's browser. That means you cannot simply execute it from Nextcloud's cron job, for example. You could work around this by setting up a script that runs the OCR with the Tesseract CLI and then has Nextcloud re-scan the folder so the new files get indexed, but that is a completely different approach. Unfortunately, I have to close this feature request, as I cannot add it to the approach this app currently takes. The app is not designed for large numbers of scans continuously added to a folder; it is a small tool for running OCR on the fly, on user action.
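The workaround mentioned above (Tesseract CLI plus a Nextcloud re-scan) could look roughly like the following Python sketch. It is not part of this app and rests on several assumptions: the scan folder is reachable on the server's local filesystem, the scans are images (the Tesseract CLI does not read PDF input directly), and the `.ocr.pdf` naming convention, function names, and paths are all hypothetical. It relies on `tesseract <image> <outputbase> pdf` writing a searchable `<outputbase>.pdf`, and on Nextcloud's `occ files:scan --path=...` command to index the new files.

```python
import subprocess
from pathlib import Path

# Hypothetical: which files count as scans to OCR.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def output_base(image: Path) -> Path:
    """Hypothetical naming: scan.png -> scan.ocr (Tesseract appends '.pdf')."""
    return image.with_name(image.stem + ".ocr")


def ocr_pdf(image: Path) -> Path:
    """The searchable PDF Tesseract will produce for `image`."""
    return image.with_name(image.stem + ".ocr.pdf")


def find_unprocessed(folder: Path) -> list[Path]:
    """Images in `folder` that have no OCR'd PDF next to them yet."""
    return [
        path
        for path in sorted(folder.iterdir())
        if path.suffix.lower() in IMAGE_SUFFIXES and not ocr_pdf(path).exists()
    ]


def tesseract_cmd(image: Path) -> list[str]:
    # `tesseract <image> <outputbase> pdf` writes <outputbase>.pdf.
    return ["tesseract", str(image), str(output_base(image)), "pdf"]


def occ_scan_cmd(occ: str, nextcloud_path: str) -> list[str]:
    # Ask Nextcloud to re-index the folder so the new PDFs appear in the UI.
    return ["php", occ, "files:scan", f"--path={nextcloud_path}"]


def run_once(folder: Path, occ: str, nextcloud_path: str, run=subprocess.run) -> int:
    """OCR every pending image, then trigger one re-scan. Returns the count processed."""
    pending = find_unprocessed(folder)
    for image in pending:
        run(tesseract_cmd(image), check=True)
    if pending:
        run(occ_scan_cmd(occ, nextcloud_path), check=True)
    return len(pending)
```

Calling `run_once(Path("/mnt/scanner"), "/var/www/nextcloud/occ", "/scanner/files/Scans")` (hypothetical paths) from a periodic cron job would give the "at least periodically" behavior requested above. Note that `occ` usually has to run as the web server user (e.g. via `sudo -u www-data`), and that this sketch writes the OCR'd PDF next to the original rather than replacing it.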