ropensci / magick

Magic, madness, heaven, sin
https://docs.ropensci.org/magick

reading pdf from R global environment and not an external url #307

Closed: andresmorago closed this issue 3 years ago

andresmorago commented 3 years ago

Hello

Is there a way for image_read_pdf() to read data from a variable in the global environment instead of from a URL or an external path?

I have read a PDF from an SFTP server and stored it as binary data in a variable, but this throws an error:

image1 <- image_read_pdf(data, density = 200)

jeroen commented 3 years ago

What type is data in your case? I think your best solution is to writeBin() the data to a tempfile(fileext = '.pdf') and read that.
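For reference, the tempfile workaround might look roughly like this. It is a sketch that assumes data is a raw vector (which is what RCurl's getBinaryURL() returns); the helper name read_pdf_via_tempfile is made up for illustration.

```r
library(magick)

# Hypothetical helper: write raw PDF bytes to a temporary file, then let
# magick::image_read_pdf() render it. Assumes `data` is a raw vector.
read_pdf_via_tempfile <- function(data, density = 200) {
  tmp <- tempfile(fileext = ".pdf")
  on.exit(unlink(tmp), add = TRUE)  # delete the temp file when the function exits
  writeBin(data, tmp)               # dump the in-memory bytes to disk
  image_read_pdf(tmp, density = density)
}

# Usage, with `data` downloaded into memory:
# image1 <- read_pdf_via_tempfile(data, density = 200)
```

The file only lives for the duration of the call, so nothing accumulates on disk when reading many files in a loop.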

andresmorago commented 3 years ago

I was hoping to avoid the extra write to disk between the SFTP download and the PDF read, since I plan to read multiple files from an external server.

The data variable comes from RCurl functions:

library(RCurl)

protocol <- "sftp"
server <- "172.16.19.9"
port <- "63636"
userpwd <- "aaa:bbb.*"
tsfrFilename <- "560782-592321-90694_5-5.pdf"

# Build sftp://server:port/filename and download the file as a raw vector
url <- paste0(protocol, "://", server, ":", port, "/", tsfrFilename)
data <- getBinaryURL(url = url, userpwd = userpwd)

jeroen commented 3 years ago

You can try this:

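read_raw_pdf <- function(data){
  con <- rawConnection(data)   # wrap the in-memory bytes in a connection
  on.exit(close(con))          # close the connection when the function returns
  magick::image_read(pdftools::pdf_render_page(con))  # render page 1 to a bitmap
}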
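To try read_raw_pdf() without an SFTP server, one can build a small PDF locally, load its bytes into memory with readBin(), and pass the resulting raw vector in. This is a sketch assuming the magick and pdftools packages are installed; the plotted content is arbitrary test data.

```r
library(magick)

read_raw_pdf <- function(data){
  con <- rawConnection(data)
  on.exit(close(con))
  magick::image_read(pdftools::pdf_render_page(con))
}

# Write a one-page PDF to a temp file, then read its bytes into a raw vector
pdf_file <- tempfile(fileext = ".pdf")
grDevices::pdf(pdf_file)
plot(1:10)
invisible(grDevices::dev.off())
bytes <- readBin(pdf_file, what = "raw", n = file.size(pdf_file))

img <- read_raw_pdf(bytes)   # rendered from memory, no path passed to magick
print(image_info(img))
```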
andresmorago commented 3 years ago

Thanks so much! It works!

andresmorago commented 3 years ago

Hello again.

I have two follow-up questions:

1) I'm trying to improve the quality of the scanned file, but no luck so far. I included the density flag, but it is not being applied:

read_raw_pdf <- function(data){
  con <- rawConnection(data)
  on.exit(close(con))
  magick::image_read(pdftools::pdf_render_page(con), density = 250)
}

raw_pdf1 <- read_raw_pdf(data1)

2) How can I read multiple pages with the same read_raw_pdf function?

andresmorago commented 3 years ago

I've got the quality to work, but I'm having issues when loading PDFs with multiple pages:

read_raw_pdf <- function(data){
  con <- rawConnection(data)
  on.exit(close(con))
  magick::image_read(pdftools::pdf_render_page(con, dpi = 250))
}
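As far as we understand the pdftools and magick documentation, passing dpi works where density did not because pdf_render_page() already returns a rasterized bitmap, so image_read() has nothing left to re-render at a higher resolution; the resolution has to be chosen at render time. A sketch to compare the two, assuming bytes holds raw PDF data (generated locally here with grDevices::pdf()); render_page_at is a hypothetical helper. It also relies on pdf_render_page() accepting a raw vector directly, as its documentation states.

```r
library(magick)

# Hypothetical helper: render one page of in-memory PDF bytes at a given dpi.
render_page_at <- function(data, dpi, page = 1) {
  image_read(pdftools::pdf_render_page(data, page = page, dpi = dpi))
}

# Build a one-page test PDF and load its bytes into memory
pdf_file <- tempfile(fileext = ".pdf")
grDevices::pdf(pdf_file)
plot(1:10)
invisible(grDevices::dev.off())
bytes <- readBin(pdf_file, what = "raw", n = file.size(pdf_file))

low  <- render_page_at(bytes, dpi = 72)
high <- render_page_at(bytes, dpi = 250)
image_info(low)$width    # pixel width at 72 dpi
image_info(high)$width   # roughly 250/72 times as wide
```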
jeroen commented 3 years ago

The pdf_render_page() function has a second argument, page, to specify which page you want to render, and you can use pdf_info() to find out how many pages there are.

Have a look at the source code for magick::image_read_pdf().
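Following that suggestion, a multi-page version of read_raw_pdf might look roughly like the sketch below. It assumes data is a raw vector of PDF bytes; the name read_raw_pdf_pages and the default dpi = 250 are illustrative choices, not part of magick or pdftools. It passes the raw vector straight to pdftools (which accepts a file path or a raw vector) instead of a rawConnection: a connection is read to its end on first use, so a second call such as pdf_info() followed by pdf_render_page() on the same connection would likely see no data.

```r
library(magick)

# Hypothetical helper: render several pages of in-memory PDF bytes.
read_raw_pdf_pages <- function(data, pages = NULL, dpi = 250) {
  # Default to every page, using pdf_info() for the page count
  if (is.null(pages)) {
    pages <- seq_len(pdftools::pdf_info(data)$pages)
  }
  # Render each requested page to a bitmap and read it into magick
  imgs <- lapply(pages, function(i) {
    image_read(pdftools::pdf_render_page(data, page = i, dpi = dpi))
  })
  image_join(imgs)  # a single image stack with one frame per page
}
```

With this, read_raw_pdf_pages(data1) returns one frame per page, and read_raw_pdf_pages(data1, pages = 2) renders only the second page.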