Open jlegewie opened 11 years ago
@jlegewie would you accept a pull request that achieves this?
Would be a great feature but let's say I am reluctant. First, it depends a little on the implementation. What are your thoughts about that? Second, I basically have no time for zotfile these days and in my experience significant new features create work and bugs down the line that I won't be able to fix. So it might be a better option to implement this as a separate plugin. But again, it would be good to hear about your thoughts on implementation first.
The separate plugin might be a smart idea. I guess it'd be simplest to add a right-click option ("Scan for readable text" or something) that runs this library and overwrites the file.
I didn’t look in detail but that requires a binary. Zotero and zotfile both have code for downloading and updating binaries (zotfile’s is mostly copied from Zotero). So it would probably be useful to build on that.
OCRmyPDF is very reliable and uses Tesseract. It would also be great to include it as an automatic option when the lookup of PDF metadata fails: "no ocr text found" -> run ocrmypdf automatically and rerun the metadata lookup.
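For what it's worth, here is a rough sketch of that fallback flow in Python. It is not how zotfile or Zotero does anything today. An actual plugin would be written in JavaScript and would need its own way to ship the binary. The names `ocr_in_place`, `lookup_with_ocr_fallback` and the `lookup` callback are made up for illustration, and it assumes the `ocrmypdf` command is installed and on the PATH. The OCR output goes to a temporary file first, so the original is only overwritten if OCR succeeds.

```python
import os
import subprocess
import tempfile
from typing import Callable, Optional


def ocr_in_place(pdf_path: str, run: Callable = subprocess.run) -> None:
    """Run OCRmyPDF on pdf_path and overwrite it only if OCR succeeds.

    `run` is injectable so the function can be exercised without the
    real binary; by default it is subprocess.run.
    """
    directory = os.path.dirname(os.path.abspath(pdf_path))
    # Work in a temp directory next to the PDF so os.replace stays on
    # the same filesystem.
    with tempfile.TemporaryDirectory(dir=directory) as tmp_dir:
        tmp_path = os.path.join(tmp_dir, "ocr-output.pdf")
        # Basic CLI form is `ocrmypdf INPUT OUTPUT`; --skip-text leaves
        # pages that already have a text layer untouched.
        run(["ocrmypdf", "--skip-text", pdf_path, tmp_path], check=True)
        os.replace(tmp_path, pdf_path)


def lookup_with_ocr_fallback(
    pdf_path: str,
    lookup: Callable[[str], Optional[dict]],
    ocr: Callable[[str], None] = ocr_in_place,
) -> Optional[dict]:
    """Try the metadata lookup; if it finds nothing, OCR the file and retry once.

    `lookup` is a hypothetical callback standing in for the metadata
    retrieval step; it should return None when no text/metadata is found.
    """
    result = lookup(pdf_path)
    if result is not None:
        return result
    ocr(pdf_path)
    return lookup(pdf_path)
```

The same `ocr_in_place` function could back the right-click option discussed above, since both cases boil down to "OCR this attachment and replace the file".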
UB-Mannheim/zotero-ocr is a Zotero plugin that OCRs PDFs using Tesseract.
(I have not used it myself, though.)
Add a menu item "OCR PDF File" that uses free online OCR services or a Python library
e.g. http://free-online-ocr.com/ https://github.com/Pankrat/pdf-ocr-overlay