ocrmypdf / OCRmyPDF-EasyOCR

OCRmyPDF EasyOCR plugin
MIT License

Acceleration of the launch #16

Closed (Friskes closed this issue 2 months ago)

Friskes commented 2 months ago

I noticed that creating easyocr.Reader(languages, use_gpu) in def _ocr_process takes more than a second when recognition starts. It would be good to do this in advance if possible, so that an extra second is not wasted when a recognition request arrives.

jbarlow83 commented 2 months ago

This step is done once per worker process, so each process has to create its own Reader. After creating a Reader, the worker reuses that object for every page it processes, so for high page count documents we get optimal throughput. It's only if you have a lot of small documents that easyocr is less optimal.
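As an illustration of the reuse pattern described above, here is a minimal sketch rather than the plugin's actual code. `SlowReader` is a hypothetical stand-in for `easyocr.Reader`, and `get_reader` and `ocr_page` are made-up names; the point is only that a per-process cache pays the construction cost once and then serves every later page from the same object.

```python
from functools import lru_cache


class SlowReader:
    """Hypothetical stand-in for easyocr.Reader: costly to build, cheap to reuse."""

    instances_created = 0  # counts constructions within this process

    def __init__(self, languages, gpu):
        SlowReader.instances_created += 1
        self.languages = languages
        self.gpu = gpu

    def read(self, page):
        return f"text of {page} ({'+'.join(self.languages)})"


@lru_cache(maxsize=None)
def get_reader(languages, gpu):
    # The cache lives in module state, so it is per process: each worker
    # process builds its own reader the first time it handles a page.
    return SlowReader(languages, gpu)


def ocr_page(page, languages=("en",), gpu=False):
    # tuple() keeps the cache key hashable even if a list is passed in.
    reader = get_reader(tuple(languages), gpu)
    return reader.read(page)


# Three pages handled by the same process share one reader.
results = [ocr_page(f"page-{n}") for n in range(1, 4)]
```

With many pages per worker the one-time construction cost is spread thin; with a one-page document it dominates, which is the case the original report ran into.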

This is the earliest it can be done without some way to persist worker processes across OCR jobs. That would be a major overhaul of core ocrmypdf and would break compatibility with all other plugins, because the current multiprocessing setup creates workers on demand for a specific task and then disposes of them.
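To make the trade-off concrete, a rough back-of-the-envelope model follows. It is only a sketch under stated assumptions: the function name `startup_overhead_seconds` is hypothetical, the 1.0 second construction time is rounded from the "more than a second" reported above, and every job is assumed to start the same number of workers. The "persistent" case describes the hypothetical design discussed here, not something ocrmypdf currently offers.

```python
def startup_overhead_seconds(jobs, workers, init_seconds, persistent):
    """Total seconds spent constructing readers, summed over all workers.

    persistent=False models the current setup: workers are created for
    each job and disposed of afterwards, so every job pays again.
    persistent=True models hypothetical workers that survive across jobs.
    """
    constructions = workers if persistent else jobs * workers
    return constructions * init_seconds


# Assumed workload: 100 small documents, 4 workers, about 1 s per reader.
per_job = startup_overhead_seconds(100, 4, 1.0, persistent=False)
kept_alive = startup_overhead_seconds(100, 4, 1.0, persistent=True)
```

These figures are summed across workers, not wall-clock time, since workers construct their readers in parallel. The gap still grows linearly with the number of jobs, which is why the overhead matters mostly for batches of small documents.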