ropensci / pdftools

Text Extraction, Rendering and Converting of PDF Documents
https://docs.ropensci.org/pdftools

Raw single-page attachments from `pdf_attachments` cannot be converted by `pdf_convert` (no `basename`) #93

Closed by Baurice 3 years ago

Baurice commented 3 years ago

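Because of the code in `pdf_convert` (below), single-page attachments returned as raw vectors by `pdf_attachments` cannot be converted: when fewer than two filenames are supplied, `basename()` is called on `pdf`, and a raw vector is not a file path. Documents with 2+ pages, for which one filename per page is supplied, skip this branch and convert without issue: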

if (length(filenames) < 2) {
    input <- sub(".pdf", "", basename(pdf), fixed = TRUE)
    filenames <- if (length(filenames)) {
        sprintf(filenames, pages, format)
    }
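The failing call can be isolated without any PDF at all. The following is a minimal sketch using base R only; `raw_pdf` is a made-up stand-in for the `data` element of an attachment, not real PDF content.

```r
# Stand-in for x$data from pdf_attachments(): a raw vector, not a file path.
raw_pdf <- charToRaw("%PDF-1.4")

# basename() only accepts character vectors, so a raw vector raises an error.
res <- tryCatch(basename(raw_pdf), error = function(e) e)
is_error <- inherits(res, "error")

# A path string goes through the same expression used in the excerpt.
stem <- sub(".pdf", "", basename("docs/report.pdf"), fixed = TRUE)
```

Here `is_error` records whether the raw input was rejected, and `stem` holds the name derived from an ordinary path.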

Reproducible example:

library(pdftools)
#> Using poppler version 21.02.0
link <- c("http://www.accessdata.fda.gov/cdrh_docs/pdf19/K190072.pdf")
lapply(pdf_attachments(link), function(x) pdf_convert(x$data, 
    filenames=paste0(tools::file_path_sans_ext(x$name), "-", 
                     seq_along(pdf_data(x$data)), ".png")))
#> Converting page 1 to K190072.510kSummary.Final_Sent001-1.png... done!
#> Converting page 2 to K190072.510kSummary.Final_Sent001-2.png... done!
#> Converting page 3 to K190072.510kSummary.Final_Sent001-3.png... done!
#> Converting page 4 to K190072.510kSummary.Final_Sent001-4.png... done!
#> Converting page 5 to K190072.510kSummary.Final_Sent001-5.png... done!
#> Error in basename(pdf): a character vector argument expected

Created on 2021-05-05 by the reprex package (v2.0.0)
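Until the condition is changed, one possible workaround is to write the raw attachment to a temporary file and pass that path on, so that `basename()` receives a character string. This is only a sketch: `raw_to_pdf_file` and its `stem` argument are hypothetical names, not part of pdftools.

```r
# Hypothetical helper (not part of pdftools): write a raw attachment to a
# temporary .pdf file and return its path.
raw_to_pdf_file <- function(data, stem = "attachment") {
  stopifnot(is.raw(data))
  path <- file.path(tempdir(), paste0(stem, ".pdf"))
  writeBin(data, path)
  path
}

# Usage with pdftools (assumes the package is installed; not run here):
# for (x in pdftools::pdf_attachments(link)) {
#   path <- raw_to_pdf_file(x$data, tools::file_path_sans_ext(x$name))
#   pdftools::pdf_convert(path)
# }
```

The cost is an extra file on disk per attachment, but the call then takes the same path-based route as any PDF read from a file.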

For example, changing the condition to `if (length(filenames) < 2 && !is.raw(pdf)) {...}` fixes this issue and allows the code to run successfully.

This should be independent of the R version, based on the above code.
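The guard can be illustrated in isolation. The sketch below is not the pdftools implementation: `resolve_filenames` is a hypothetical standalone function, and both the default `"%s_%d.%s"` template and the `"page"` fallback stem for raw input are assumptions made for illustration. It moves the `is.raw()` check next to the `basename()` call, so raw input without `filenames` would still get names.

```r
# Hypothetical helper, not part of pdftools: resolves output file names
# for either a file path or a raw vector.
resolve_filenames <- function(pdf, pages, format = "png", filenames = NULL) {
  if (length(filenames) < 2) {
    if (length(filenames)) {
      # A single name is treated as an sprintf() template, as in the excerpt.
      filenames <- sprintf(filenames, pages, format)
    } else {
      # Assumption: raw input has no path, so use a generic stem instead.
      input <- if (is.raw(pdf)) {
        "page"
      } else {
        sub(".pdf", "", basename(pdf), fixed = TRUE)
      }
      filenames <- sprintf("%s_%d.%s", input, pages, format)
    }
  }
  if (length(filenames) != length(pages)) {
    stop("Need one file name per page")
  }
  filenames
}

raw_pdf <- charToRaw("%PDF-1.4")  # stand-in for attachment data
from_template <- resolve_filenames(raw_pdf, 1L, "png", "out_%d.%s")
from_default  <- resolve_filenames(raw_pdf, 1:2, "png")
from_path     <- resolve_filenames("docs/report.pdf", 1L, "png")
```

With this arrangement `basename()` is only reached for character input, which is the property the proposed condition is after.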

> Using poppler version 21.02.0

> R Under development (unstable) (2021-05-03 r80259)

> Platform: x86_64-pc-linux-gnu (64-bit)

> Running under: Ubuntu 21.04

pdftools_3.0.0

jeroen commented 3 years ago

Thanks