ocrmypdf / OCRmyPDF-EasyOCR

OCRmyPDF EasyOCR plugin
MIT License

Recursive daemon #9

Open deajan opened 3 months ago

deajan commented 3 months ago

Replace the standard multiprocessing module with Celery's billiard module, which allows OCRmyPDF-EasyOCR to run inside a multiprocessing pool, just like Celery does.

deajan commented 3 months ago

So far, I haven't found any drawbacks to replacing multiprocessing with billiard, since it is supposed to be a drop-in replacement.

Celery developed billiard specifically so that modules that use multiprocessing can run under Celery, which is the case with paperless-ngx.

jbarlow83 commented 3 months ago

I'm not so convinced it's a trivial change especially since ocrmypdf uses multiprocessing on its own, and there's very little documentation in billiard that outlines how it actually differs from multiprocessing.
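One rough way to get a first impression of how the two modules differ is to compare their public names. The sketch below is only a surface check under stated assumptions: the helpers `public_names`, `missing_from`, and `compare_with_billiard` are hypothetical names introduced here, billiard is assumed to be installed for the comparison to return anything, and matching names say nothing about matching behavior.

```python
import importlib
import multiprocessing


def public_names(module):
    """Return the public names of a module (its __all__, else non-underscore attributes)."""
    exported = getattr(module, "__all__", None)
    if exported is not None:
        return set(exported)
    return {name for name in dir(module) if not name.startswith("_")}


def missing_from(reference, candidate):
    """Sorted public names of `reference` that `candidate` does not expose."""
    return sorted(
        name for name in public_names(reference) if not hasattr(candidate, name)
    )


def compare_with_billiard():
    """Names in multiprocessing that billiard lacks, or None if billiard is not installed."""
    try:
        billiard = importlib.import_module("billiard")
    except ImportError:
        return None
    return missing_from(multiprocessing, billiard)
```

An empty result would only show that the top-level names line up; behavioral differences would still need testing against ocrmypdf's own use of multiprocessing.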

deajan commented 2 months ago

AFAIK the only difference in billiard is daemonic child process support, since otherwise EVERY Python package using multiprocessing wouldn't work with Celery. Anyway, the replacement could become an optional parameter, something like:

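import os

# Environment values are strings, so compare against accepted truthy spellings
# rather than the boolean True.
if os.environ.get("_OCRMYPDF_USE_BILLIARD", "").lower() in ("1", "true", "yes"):
    import billiard as multiprocessing
else:
    import multiprocessing.managers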
This way, OCRmyPDF-EasyOCR becomes Celery-friendly without any major drawback. How about this? I'll happily do the PR.
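A minimal sketch of how such an opt-in could be parsed, assuming the variable name `_OCRMYPDF_USE_BILLIARD` from the proposal above. The helpers `env_flag` and `load_pool_backend`, the set of accepted truthy spellings, and the silent fallback when billiard is not installed are all assumptions made for illustration, not part of the plugin. The point it illustrates is that `os.environ.get()` returns a string, so the value has to be interpreted rather than compared to `True`.

```python
import importlib
import os

# Assumed spellings that switch the flag on; anything else counts as off.
TRUTHY = {"1", "true", "yes", "on"}


def env_flag(name, default=False, environ=None):
    """Interpret an environment variable as a boolean (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    value = environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in TRUTHY


def load_pool_backend(environ=None):
    """Return billiard when the flag is set and importable, else multiprocessing."""
    if env_flag("_OCRMYPDF_USE_BILLIARD", environ=environ):
        try:
            return importlib.import_module("billiard")
        except ImportError:
            pass  # assumption: fall back to the standard library silently
    return importlib.import_module("multiprocessing")
```

With this shape, calling code could do `multiprocessing = load_pool_backend()` once at import time and use the returned module everywhere else.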