swaheera closed this issue 3 years ago
You don't need to use pdf_convert(); you can also just pass the PDF file directly to tesseract::ocr(). So maybe you can do:
# The dot is escaped so the pattern matches only names ending in ".pdf"
files <- list.files("C:/Users/me/Documents/mypdfs", pattern = "\\.pdf$", full.names = TRUE)
out <- lapply(files, tesseract::ocr)
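To keep track of which text came from which file, the same idea can be wrapped in a small helper that names each result after its PDF and writes one .txt file per document. This is only a sketch: ocr_folder, out_dir and ocr_fun are hypothetical names introduced here, and it assumes (as the reply above says) that tesseract::ocr() accepts a PDF path and returns a character vector of page texts.

```r
# Sketch: OCR every PDF in `dir` and save one .txt file per PDF in `out_dir`.
# `ocr_fun` defaults to tesseract::ocr; the default is only evaluated when the
# function is called, so the tesseract package must be installed by then.
ocr_folder <- function(dir, out_dir = dir, ocr_fun = tesseract::ocr) {
  files <- list.files(dir, pattern = "\\.pdf$", full.names = TRUE,
                      ignore.case = TRUE)
  # Name each entry after its source file, without the extension
  names(files) <- tools::file_path_sans_ext(basename(files))
  # Join the page texts of each PDF into a single string
  texts <- lapply(files, function(f) paste(ocr_fun(f), collapse = "\n"))
  for (nm in names(texts)) {
    writeLines(texts[[nm]], file.path(out_dir, paste0(nm, ".txt")))
  }
  invisible(texts)
}
```

Calling texts <- ocr_folder("C:/Users/me/Documents/mypdfs") would then return a named list, so the text of a file called report.pdf is available as texts[["report"]].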
Thanks for your question. This tracker is for reporting bugs and issues with the R package. General programming questions on how to write loops are better suited for Stack Overflow.
I am trying to "mass upload" a large number of PDF files (these are scanned PDFs) and convert them into text. So far, I have only figured out how to do this manually.
I copied/pasted the above code 50 times (while changing the "index", i.e. pngfile_i, text_i) and was able to accomplish what I wanted to do.
However, I am looking for a somewhat "automatic" way to import and convert all the PDF files.
At the moment, all my PDF files are in the following folder:
"C:/Users/me/Documents/mypdfs"
I found the following code which can be used to "mass import" PDF files into R:
But I am not sure how to instruct this code to import all PDFs from the correct directory ("C:/Users/me/Documents/mypdfs"). I also don't know how to instruct R to "rename" each imported PDF as "pdf_1, pdf_2, etc."
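For the directory and naming part, one possible approach is to list the files with base R's list.files() and attach the labels as names on a single vector, rather than creating separate variables. The helper name list_pdfs below is hypothetical and only illustrates the idea.

```r
# Sketch: collect the full paths of all PDFs in `dir` and label them
# pdf_1, pdf_2, ... in the (alphabetical) order list.files() returns them.
list_pdfs <- function(dir) {
  files <- list.files(dir, pattern = "\\.pdf$", full.names = TRUE)
  # sprintf() yields character(0) for an empty folder, so this is safe
  names(files) <- sprintf("pdf_%d", seq_along(files))
  files
}
```

With pdfs <- list_pdfs("C:/Users/me/Documents/mypdfs"), the first file path is pdfs[["pdf_1"]], and the whole vector can be handed to lapply() without copying code once per file.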
If all the pdf files were correctly imported and created, I could then write a "loop" and execute the desired commands, e.g.
Can someone please show me how to do this?
Thanks