rstudio / plumber

Turn your R code into a web API.
https://www.rplumber.io

Examples of the use of file parser functions #681

Open th72 opened 3 years ago

th72 commented 3 years ago

Hi,

Are there examples anywhere of the parser_csv(...) or parser_read_file(read_fn = readLines) functions? How do I use them in a "plumber.R" file?

I tried something like this:

#* @post /upload
#* @parse csv
function(req, res, file) {
  list(body = req$body, raw = req$bodyRaw)
}

In combination with a form:

<form action="http://127.0.0.1:4920/upload">
  <label for="myfile">Select a file:</label>
  <input type="file" id="myfile" name="myfile"><br><br>
  <input type="submit">
</form>

But I only get a 405 error....

Thanks!

meztez commented 3 years ago
#* @post /upload
#* @parser multi
#* @parser csv
#* @param f:file
function(f) {
  #Filename
  names(f)
  #Content
  f[[1]]
}
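
A likely cause of the 405 in the original attempt: the HTML form has no method attribute, so the browser sends a GET request, while the endpoint only accepts POST. A file upload form also needs method="post" and enctype="multipart/form-data". As a minimal sketch of a matching client call from R, assuming the httr package is installed and using a hypothetical helper named upload_csv, the part is named f so that it lines up with @param f:file:

```r
library(httr)

# Write a small CSV to upload (any CSV would do).
csv_path <- tempfile(fileext = ".csv")
write.csv(head(mtcars), csv_path, row.names = FALSE)

# Declare the part as text/csv so the csv parser is selected for it.
part <- upload_file(csv_path, type = "text/csv")

# Hypothetical helper: POST the file as multipart/form-data.
# The field name "f" matches `#* @param f:file` in the endpoint.
upload_csv <- function(url, part) {
  POST(url, body = list(f = part), encode = "multipart")
}

# With the API running locally (host and port taken from the question):
# resp <- upload_csv("http://127.0.0.1:4920/upload", part)
# content(resp)
```

The actual request is left commented out because it needs the API to be running.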
meztez commented 3 years ago

I think Microsoft Edge, when using Swagger, sets the content-type to application/vnd.ms-excel which plumber does not recognize as a csv file.

I will investigate.

meztez commented 3 years ago

It seems to be related to Microsoft Excel, not Edge. I was able to force the parser by re-registering it before defining the endpoint.

Why MS would force that Excel file type on CSV files is beyond me. https://github.com/mholt/PapaParse/issues/18 https://github.com/react-dropzone/react-dropzone/issues/276

#* @plumber
function(pr) {
  register_parser("csv",
                  parser_csv,
                  fixed = c("application/csv",
                            "application/x-csv",
                            "text/csv",
                            "text/x-csv",
                            "application/vnd.ms-excel"))
}

#* @post /upload
#* @parser multi
#* @parser csv
#* @param f:file
function(f) {
  #Filename
  names(f)
  #Content
  f[[1]]
}
th72 commented 3 years ago

CSV files on Windows are opened with Excel by default.... the files even have the Excel logo....

Is it also possible to use parser_read_file(read_fn = readLines) and, instead of "readLines", use the read_excel function from the readxl package? The parser_read_file parser is not registered, and I need to override the readLines function in some way.....

meztez commented 3 years ago

You can do all that.

See ?register_parser or https://www.rplumber.io/reference/register_parser.html to learn how to define your own parser.

See my code above for how you would do it.

Any alias you define will be available via @parser {alias}.

Maybe something like:

#* @plumber
function(pr) {
  register_parser("csv_windows",
                  parser_read_file(read_fn = readxl::read_excel),
                  fixed = "application/vnd.ms-excel")
}

#* @post /upload
#* @parser multi
#* @parser csv_windows
#* @param f:file
function(f) {
  #Filename
  names(f)
  #Content
  f[[1]]
}

I'll test it and update the code.

meztez commented 3 years ago

Well, it seems read_excel does not want to read a CSV file:

library(plumber)
register_parser("csv_windows",
                function(...) {parser_read_file(read_fn = readxl::read_xls)},
                fixed = "application/vnd.ms-excel")

#* @post /upload
#* @parser multi
#* @parser csv_windows
#* @param f:file
function(f) {
  browser()
  #Filename
  names(f)
  #Content
  f[[1]]
}

<Rcpp::exception: 
  filepath: C:\Users\gen01914\AppData\Local\Temp\Rtmp2Jpza8\plumb2be437a833ae_mtcars.csv
  libxls error: Unable to open file>
meztez commented 3 years ago

@th72 What about:

library(plumber)
register_parser("csv_windows",
                parser_csv,
                fixed = "application/vnd.ms-excel")

#* @post /upload
#* @parser multi
#* @parser csv_windows
#* @param f:file
function(f) {
  f
}
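
To confirm that the alias was registered before plumbing the file, something like the following may help. It is only a sketch, and it assumes a plumber version that exports registered_parsers() (1.0 or later, as far as I know):

```r
library(plumber)

# Map the Excel content type to the CSV parser, as in the code above.
# "csv_windows" is just the alias chosen in this thread.
register_parser("csv_windows", parser_csv, fixed = "application/vnd.ms-excel")

# registered_parsers() lists the aliases usable in `#* @parser <alias>`.
aliases <- registered_parsers()
print("csv_windows" %in% aliases)
```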
meztez commented 3 years ago

read_excel from readxl does not read csv from what I can gather.

But you could send an xlsx file and use similar code.
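
The reason is that the content type sent by the browser says nothing about the actual bytes: a CSV is plain text, an .xlsx file is a zip container (it starts with the bytes "PK"), and a legacy .xls file is an OLE2 container. A small base R sketch that checks the first bytes of an upload before choosing a reader, with sniff_upload being a hypothetical helper name:

```r
# Hypothetical helper: guess the container format from the first bytes.
sniff_upload <- function(path) {
  sig <- readBin(path, what = "raw", n = 4)
  if (length(sig) >= 2 && identical(sig[1:2], as.raw(c(0x50, 0x4B)))) {
    "xlsx"   # zip container, starts with "PK"
  } else if (length(sig) == 4 &&
             identical(sig, as.raw(c(0xD0, 0xCF, 0x11, 0xE0)))) {
    "xls"    # OLE2 compound file
  } else {
    "other"  # possibly plain text such as CSV
  }
}

# A CSV written by R is plain text, so it matches neither signature.
csv_path <- tempfile(fileext = ".csv")
write.csv(head(mtcars), csv_path, row.names = FALSE)
kind <- sniff_upload(csv_path)
print(kind)
```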

th72 commented 3 years ago

Thanks a lot Bruno! This gives me some ideas. I will try it out.

th72 commented 3 years ago

It worked..... almost....

I tried to read an xlsx file.... there are even more Excel application types :-)

This one seems to work for me: "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"

This now reads the first Excel sheet. I will add a variable for the sheet number and a variable for the region to read on that sheet....

register_parser("excel_windows",
  function(...) {parser_read_file(read_fn = readxl::read_excel)},
  fixed = c("application/vnd.ms-excel",
            "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"))

#* @post /upload_excel
#* @parser multi
#* @parser excel_windows
#* @param f:file
function(f) {
  #Filename
  names(f)
  browser()
  #Content
  f[[1]]
}

meztez commented 3 years ago

What about?

library(plumber)
register_parser("excel_windows",
  function(...) {
    parser_read_file(function(tmpfile) {
      readxl::read_excel(tmpfile, ...)
    })
  },
  fixed = c(
    "application/vnd.ms-excel",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
  )
)

#* @post /upload_excel
#* @parser multi
#* @parser excel_windows list(sheet = "GH-Actions", range = "A1:F15")
#* @param f:file
function(f) {
   f
}

Tested, this works. Modify "GH-Actions" and the range to match your own. Note that this will always parse the same sheet and range for every file sent to this endpoint.
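
The fixed sheet and range follow from how the arguments are captured: the outer function receives them once through `...`, and the inner function only ever receives the temp file path. A base R sketch of the same closure pattern, using read.csv as a stand-in for read_excel; make_reader is a hypothetical name, not part of plumber:

```r
# Hypothetical stand-in for the pattern used with parser_read_file():
# the outer function captures `...`, the inner one receives the temp file.
make_reader <- function(read_fn, ...) {
  function(tmpfile) {
    read_fn(tmpfile, ...)
  }
}

csv_path <- tempfile(fileext = ".csv")
write.csv(mtcars, csv_path, row.names = FALSE)

# The arguments are fixed when the reader is created, not per call.
read_first_five <- make_reader(utils::read.csv, nrows = 5)
first_five <- read_first_five(csv_path)
print(dim(first_five))
```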

th72 commented 3 years ago

Yes Bruno, something like that.... Thanks... It works like a charm...

#* @parser excel_windows list(sheet = "GH-Actions", range = "A1:F15")

I didn't know that something like this was possible.

Is it also possible to pass "sheet" and "range" as variables in the POST request, or is that too complicated?

meztez commented 3 years ago

You can do your own parsing inside the plumber expression itself. That should be more stable, and it is also the recommended way.

#* @post /upload_excel
#* @param f:file
#* @param sheet:str
#* @param range:str
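function(f, sheet, range) {
  # Write the uploaded bytes to a temporary file that readxl can open
  tmp <- tempfile("plumb", fileext = paste0("_", basename(names(f))))
  on.exit(unlink(tmp))
  writeBin(f[[1]], tmp)
  # Read only the requested sheet and range, then return the row count
  t <- readxl::read_excel(tmp, sheet = sheet, range = range)
  nrow(t)
}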
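
One benefit of parsing inside the handler is that the logic can be tried without a running API. A minimal sketch, assuming the upload arrives as a named list of raw vectors as in the handler above; read_upload is a hypothetical helper, and read.csv stands in for read_excel so the example needs only base R:

```r
# Hypothetical helper: `f` is a named list of raw vectors, as in the handler.
read_upload <- function(f, read_fn, ...) {
  tmp <- tempfile("plumb", fileext = paste0("_", basename(names(f)[1])))
  on.exit(unlink(tmp), add = TRUE)
  writeBin(f[[1]], tmp)
  read_fn(tmp, ...)
}

# Simulate an uploaded CSV without starting a server.
csv_path <- tempfile(fileext = ".csv")
write.csv(mtcars, csv_path, row.names = FALSE)
f <- list("mtcars.csv" = readBin(csv_path, "raw", n = file.size(csv_path)))

rows <- nrow(read_upload(f, utils::read.csv))
print(rows)
```

For Excel uploads, the same helper could be called with readxl::read_excel and the sheet and range passed through `...`.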

Here is an ugly hack that I do not recommend, as I don't know how stable it would be.

library(plumber)
register_parser("excel_windows",
  function(...) {
    parser_read_file(function(tmpfile) {
      args <- get("req", envir = parent.frame(n = 9))$args
      do.call(readxl::read_excel, list(path = tmpfile, sheet = args$sheet, range = args$range, ...))
    })
  },
  fixed = c(
    "application/vnd.ms-excel",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
  )
)

#* @post /upload_excel
#* @parser multi
#* @parser excel_windows
#* @param f:file
#* @param sheet:str
#* @param range:str
function(f) {
  t <- tibble::as_tibble(f[[1]])
  nrow(t)
}

It works now because the queryStringFilter filter is executed before bodyFilter, but you should not rely on that. Also, the hack that retrieves req from a higher parent frame is not guaranteed to keep working in the future.

But it works...
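
To illustrate why a hard-coded frame count is fragile, here is a base R sketch with made-up function names. It does not reproduce plumber's internals; it only shows that one extra function in the call chain shifts the frame that the lookup lands in:

```r
# peek() looks a fixed number of frames up for `req`,
# in the same spirit as the hack's hard-coded `n = 9`.
peek <- function() get("req", envir = parent.frame(n = 2), inherits = FALSE)
inner <- function() peek()

handler <- function() {
  req <- "request"
  inner()
}

# One extra layer between the handler and inner() shifts every frame by one.
wrapper <- function() inner()
handler_wrapped <- function() {
  req <- "request"
  wrapper()
}

direct <- handler()
shifted <- tryCatch(handler_wrapped(), error = function(e) "lookup failed")
print(direct)
print(shifted)
```

If a future plumber release adds or removes a call between the filter and the parser, the same kind of shift would apply to the hack above.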

th72 commented 3 years ago

@meztez I will go for your first solution. Thanks.