ropensci / tesseract

Bindings to Tesseract OCR engine for R
https://docs.ropensci.org/tesseract

Add ocr'ed text back to image and generate a PDF #51

Open ramiromagno opened 4 years ago

ramiromagno commented 4 years ago

It would be great if this package could add the recognized text back onto the source raster image and save the result as a PDF.

For example, the tesseract command-line tool can do this in a single command:

tesseract --dpi 600 --oem 2 input_01.png output_01 pdf
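
Until the package supports this, the same command could be wrapped for several images. This is only a rough sketch, assuming the tesseract CLI is on the PATH and that every input file has an extension; `output_base` and `ocr_to_pdf` are hypothetical helper names, not part of tesseract or of this package.

```shell
#!/usr/bin/env bash

# Hypothetical helper: strip the file extension so that tesseract
# can append ".pdf" to the output base on its own.
output_base() {
  printf '%s\n' "${1%.*}"
}

# Hypothetical helper: run the command from this thread on one image.
# The first argument is the image; any remaining arguments are passed
# to tesseract unchanged, ahead of the image name (e.g. --dpi 600 --oem 2).
ocr_to_pdf() {
  local image="$1"
  shift
  tesseract "$@" "$image" "$(output_base "$image")" pdf
}

# Example call, left commented out so that sourcing this file runs nothing:
# for f in scans/*.png; do ocr_to_pdf "$f" --dpi 600 --oem 2; done
```

Each call should leave a PDF next to its source image, for example `input_01.pdf` beside `input_01.png`.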

brshallo commented 2 years ago

I second this.

Relatedly, it would be nice if the package could take in a .pdf that contains images and convert them to editable text, returning a new .pdf, the way Adobe does.