Would you mind adding a "related packages" section to the README (and therefore to the pkgdown website), or accepting a PR that adds one?
It could mention:
- `tesseract` by @jeroen, for more direct control of the OCR process.
- `pdftools`, for extracting metadata and text from PDF files (more specific to PDF, and without a Java dependency).
- `tabulizer` by @leeper and @tpaskhalis, bindings for the Tabula PDF table extractor library, for extracting tables (rather than text) from PDF files.
- `epubr` by @leonawicz (about to be onboarded), which is more specific than `rtika` and therefore gives more user-friendly output when parsing EPUB files.
I think it would help people choose the package best suited to their needs. `rtika` is a powerful package for many formats, and I hope you don't think I'm forgetting that. 😺
Thanks in advance!