andresmorago closed this issue 3 years ago.
What type is `data` in your case? I think your best solution is to `writeBin()` the data to a `tempfile(fileext = ".pdf")` and read that.
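A minimal sketch of the temp-file route, assuming `data` is the raw vector returned by `getBinaryURL()`. The helper name `read_pdf_via_tempfile` is hypothetical. Note that `tempfile()` takes the extension through its `fileext` argument.

```r
# Hypothetical helper: round-trip the raw PDF bytes through a temporary file.
read_pdf_via_tempfile <- function(data, density = 250) {
  tmp <- tempfile(fileext = ".pdf")   # tempfile() uses `fileext`, not `ext`
  on.exit(unlink(tmp), add = TRUE)    # delete the temp file when done
  writeBin(data, tmp)                 # write the raw vector to disk
  # Read every page, rendered at `density` dpi
  magick::image_read_pdf(tmp, density = density)
}
```

The image is held in memory once `image_read_pdf()` returns, so the temporary file can be removed straight away.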
I was hoping to avoid the extra write to disk between the SFTP download and the PDF read, since I plan to read multiple files from an external server.
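`data` comes from RCurl functions:

```r
library(RCurl)

protocol <- "sftp"
server <- "172.16.19.9"
port <- "63636"
userpwd <- "aaa:bbb.*"  # "user:password"
tsfrFilename <- "560782-592321-90694_5-5.pdf"

# Build the full URL, e.g. sftp://172.16.19.9:63636/<file name>
url <- paste0(protocol, "://", server, ":", port, "/", tsfrFilename)

# Download the file into memory as a raw vector
data <- getBinaryURL(url = url, userpwd = userpwd)
```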
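Since several files will be fetched, one option is to build the URLs as a vector and download each one into memory, keeping the raw vectors in a named list so nothing touches the disk. This is a sketch that assumes every file sits in the same directory on the same server; `build_sftp_url()` and `fetch_binary_files()` are hypothetical helper names.

```r
# Hypothetical helper: build one URL per file name (paste0 is vectorised).
build_sftp_url <- function(filenames, server, port, protocol = "sftp") {
  paste0(protocol, "://", server, ":", port, "/", filenames)
}

# Hypothetical helper: download each file into memory as a raw vector.
fetch_binary_files <- function(filenames, server, port, userpwd,
                               protocol = "sftp") {
  urls <- build_sftp_url(filenames, server, port, protocol)
  files <- lapply(urls, function(u) RCurl::getBinaryURL(url = u, userpwd = userpwd))
  names(files) <- filenames  # keep track of which bytes came from which file
  files
}
```

For example, `fetch_binary_files(c("a.pdf", "b.pdf"), server, port, userpwd)` would return a list with one raw vector per file, each of which can be passed to the PDF reader.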
You can try this:
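```r
read_raw_pdf <- function(data){
  # Wrap the in-memory bytes in a connection so no temp file is needed
  con <- rawConnection(data)
  on.exit(close(con))
  # Render the first page to a bitmap and hand it to magick
  magick::image_read(pdftools::pdf_render_page(con))
}
```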
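The `pdftools` documentation describes the `pdf` argument of its functions as a file path or a raw vector with PDF data, so the `rawConnection()` wrapper may not be strictly necessary. A minimal variant that passes the bytes straight through (the name `read_raw_pdf_direct` is hypothetical):

```r
# Hypothetical variant: hand the raw vector directly to pdftools.
read_raw_pdf_direct <- function(data, page = 1, dpi = 72) {
  # Render one page to a bitmap (raw array) at the requested resolution
  bitmap <- pdftools::pdf_render_page(data, page = page, dpi = dpi)
  # Convert the bitmap into a magick image
  magick::image_read(bitmap)
}
```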
Thanks so much! It works!
Hello again. I have two extra questions:

1) I'm trying to improve the quality of the scanned file, but no luck so far. I included the `density` argument, but it is not being taken into account:
```r
read_raw_pdf <- function(data){
  con <- rawConnection(data)
  on.exit(close(con))
  magick::image_read(pdftools::pdf_render_page(con), density = 250)
}
raw_pdf1 = read_raw_pdf(data1)
```
2) How can I read multiple pages of the same file with `read_raw_pdf`?
I've got the quality to work, but I'm having issues loading PDFs with multiple pages:
```r
read_raw_pdf <- function(data){
  con <- rawConnection(data)
  on.exit(close(con))
  magick::image_read(pdftools::pdf_render_page(con, dpi = 250))
}
```
The `pdf_render_page()` function has a second argument, `page`, to specify which page you want to render, and you can use `pdf_info()` to find out how many pages there are.
Have a look at the source code for `magick::image_read_pdf`.
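Following that hint, a sketch that renders each page with `pdf_render_page()` and joins the results into one multi-frame image, in the spirit of what `image_read_pdf()` does for files. It assumes the raw vector can be passed straight to `pdftools`, which its documentation allows; `read_raw_pdf_pages` is a hypothetical name.

```r
# Hypothetical helper: render all (or selected) pages of an in-memory PDF.
read_raw_pdf_pages <- function(data, dpi = 250, pages = NULL) {
  if (is.null(pages)) {
    # pdf_info() reports the total page count
    pages <- seq_len(pdftools::pdf_info(data)$pages)
  }
  # Render each page separately; `dpi` controls the output resolution
  imgs <- lapply(pages, function(p) {
    magick::image_read(pdftools::pdf_render_page(data, page = p, dpi = dpi))
  })
  # Combine the single-page images into one multi-frame magick image
  magick::image_join(imgs)
}
```

Individual pages can then be indexed, e.g. `img[2]`. Resolution is set through `dpi` at render time because `image_read()` only receives an already-rasterised bitmap, so a `density` value passed there cannot add detail.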
Hello,

Is there a chance the function `image_read_pdf` could read data from a variable in the global environment instead of from a URL or external path? I have read a PDF from an SFTP server and stored it as binary in a variable, and this throws an error:

```r
image1 = image_read_pdf(data, density = 200)
```