lucasrla / remarks

Extract annotations (highlights and scribbles) from PDF, EPUB, and notebooks marked with reMarkable tablets. Export to Markdown, PDF, PNG, SVG
GNU General Public License v3.0

Add a flag to enable ocr my pdf #35

Closed sabidib closed 1 year ago

sabidib commented 3 years ago

Using ocrmypdf by default if it is found usually makes sense. However, when running remarks on a large collection of PDFs, it can take quite a while.

This PR adds the ability for the user to specify whether they want to use the OCR functionality. It is off by default for quicker runs.

Turned on with --use-ocr/-ocr

An error is also thrown if the flag is set but the tool is not found.
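
A minimal sketch of the behavior described above, not the actual diff in this PR. The function names `build_parser` and `resolve_ocr`, the `prog` name, and the error message are hypothetical; only the `--use-ocr`/`-ocr` flag, the false default, and the error when ocrmypdf is missing come from the description.

```python
import argparse
import shutil


def build_parser():
    # Hypothetical parser; the real remarks CLI has more arguments.
    parser = argparse.ArgumentParser(prog="remarks-sketch")
    parser.add_argument(
        "--use-ocr",
        "-ocr",
        dest="use_ocr",
        action="store_true",
        default=False,  # OCR is off unless the user asks for it
        help="Run ocrmypdf on the PDFs (slower)",
    )
    return parser


def resolve_ocr(use_ocr, which=shutil.which):
    """Return True if OCR should run; raise if requested but unavailable.

    `which` is injectable so the lookup can be stubbed in tests.
    """
    if not use_ocr:
        return False
    if which("ocrmypdf") is None:
        # Assumed error type and wording, for illustration only.
        raise RuntimeError("--use-ocr was set but ocrmypdf was not found on PATH")
    return True
```

With this shape, a run without the flag never looks up ocrmypdf, while a run with the flag fails early instead of silently skipping OCR.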

@lucasrla for review


As an aside, I've tried to make this a bit faster by parallelizing calls to ocrmypdf, but I ended up running into some segfaults. I'll revisit that at some point, but in any case, the sane default should be false to encourage quicker runs (at the expense of some accuracy).

lucasrla commented 1 year ago

You're right. I will resolve the conflicts and merge this PR in the upcoming days. Thanks!

lucasrla commented 1 year ago

I'd rather preserve the current behavior, so I ended up changing the flag to --avoid_ocr. Thanks again!
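
For readers comparing the two designs, here is a rough sketch of how an opt-out flag could keep the existing default (use ocrmypdf whenever it is found). It is not the code that was merged; `build_avoid_parser` and `should_run_ocr` are hypothetical names, and only the `--avoid_ocr` flag itself comes from this thread.

```python
import argparse
import shutil


def build_avoid_parser():
    # Hypothetical parser; the real remarks CLI has more arguments.
    parser = argparse.ArgumentParser(prog="remarks-sketch")
    parser.add_argument(
        "--avoid_ocr",
        dest="avoid_ocr",
        action="store_true",
        default=False,  # by default OCR is still attempted
        help="Skip ocrmypdf even if it is installed",
    )
    return parser


def should_run_ocr(avoid_ocr, which=shutil.which):
    """OCR runs only if the user did not opt out and ocrmypdf is on PATH."""
    return (not avoid_ocr) and which("ocrmypdf") is not None
```

Unlike an opt-in flag, a missing ocrmypdf is not an error here: OCR is simply skipped, as it was before the flag existed.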