ropensci / pdftools

Text Extraction, Rendering and Converting of PDF Documents
https://docs.ropensci.org/pdftools

add ability to pass options to the tesseract function #125

Closed nriemenschneider closed 1 year ago

nriemenschneider commented 1 year ago

I added the ability to pass options to the tesseract function. As discussed in issue #121, this matters because pdf_ocr_text() currently cannot import PDF documents with multiple columns. The new pdf_ocr_text_2() function assumes by default that the PDF has multiple columns, but it works just as well with a single column.

jeroen commented 1 year ago

Thanks. I have committed a simpler version that only passes down the options parameter.
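
For readers following along, here is a minimal sketch of what passing options down might look like in practice. It assumes that pdf_ocr_text() accepts an `options` argument and forwards it to the tesseract engine, as the comment above describes. The wrapper name `ocr_multicolumn`, the example file name, and the choice of `tessedit_pageseg_mode = 3` (Tesseract's fully automatic page segmentation without orientation detection) are illustrative assumptions, not taken from the PR.

```r
# Hypothetical helper (not part of pdftools): OCR a PDF while forwarding
# a page segmentation mode to tesseract through the `options` argument.
ocr_multicolumn <- function(pdf, pages = NULL, psm = 3) {
  pdftools::pdf_ocr_text(
    pdf,
    pages = pages,
    # Assumption: a named list of tesseract parameters is passed down as-is.
    options = list(tessedit_pageseg_mode = psm)
  )
}

# Usage (needs pdftools, tesseract, and a real PDF on disk):
# text <- ocr_multicolumn("two_column_paper.pdf", pages = 1)
# cat(text[1])
```

Whether a given segmentation mode handles a particular multi-column layout well depends on the document, so the `psm` value is worth trying with a few settings.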