Add a flag to enable ocr my pdf

sabidib commented 3 years ago

Using ocrmypdf by default when it is found usually makes sense. However, when running remarks on a large collection of PDFs, it can take quite a while.

This PR adds the ability for the user to specify whether they want to use the OCR functionality. It is set to false by default for quicker runs.

It is turned on with --use-ocr / -ocr.

An error is also thrown if the flag is set but the tool is not found.
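For reviewers, the intended behavior can be sketched roughly as below. This is a minimal illustration, not the actual diff: build_parser and resolve_ocr are hypothetical names, and the only things taken from this PR are the --use-ocr / -ocr flag, the false default, and the error when the tool is missing. The lookup assumes ocrmypdf is detected via the PATH.

```python
import argparse
import shutil


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser showing only the new flag."""
    parser = argparse.ArgumentParser(prog="remarks")
    parser.add_argument(
        "--use-ocr",
        "-ocr",
        dest="use_ocr",
        action="store_true",  # defaults to False, so OCR is opt-in
        help="Run ocrmypdf on the generated PDFs (slower, off by default)",
    )
    return parser


def resolve_ocr(use_ocr: bool, which=shutil.which) -> bool:
    """Return True if OCR should run; raise if it was requested but is unavailable.

    `which` is injectable so the lookup can be replaced in tests (an assumption
    of this sketch, not something the PR necessarily does).
    """
    if not use_ocr:
        return False
    if which("ocrmypdf") is None:
        raise RuntimeError("--use-ocr was set, but ocrmypdf was not found on PATH")
    return True


if __name__ == "__main__":
    args = build_parser().parse_args()
    print("OCR enabled:", resolve_ocr(args.use_ocr))
```

Failing loudly when the flag is set but the tool is missing avoids silently producing PDFs without OCR after the user explicitly asked for it.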

@lucasrla for review

As an aside, I tried to make this a bit faster by parallelizing calls to ocrmypdf, but I ended up running into some segfaults. I'll revisit that at some point, but in any case, the sane default is false, to encourage quicker runs (at the expense of some accuracy).

lucasrla commented 1 year ago

You're right. I will resolve the conflicts and merge this PR in the coming days. Thanks!

lucasrla commented 1 year ago

I'd rather preserve the current behavior (OCR runs by default when ocrmypdf is found), so I ended up changing the flag to --avoid_ocr. Thanks again!

lucasrla / remarks

Add a flag to enable ocr my pdf #35