rstudio / pins-r

Pin, discover, and share resources
https://pins.rstudio.com
Other
312 stars 63 forks source link

add custom type for `pin_write()` #649

Closed ijlyttle closed 2 years ago

ijlyttle commented 2 years ago

fix #631

Sorry for delay, I finally got enough things pushed to the side of my "plate" so that I can devote some time to this - this draft is just to indicate I have not forgotten this or #627 .

ijlyttle commented 2 years ago

I played around with a few ideas today, hoping each time to arrive at something simpler.

I still need to add tests, and more-complete documentation, but I wanted to run this proposed approach past you:

I am impressed with what you can do using rlang::call_modify(), but (at least to my eye) a "plain" function seems more familiar. You could call using:

It has to be a function that takes the object first, then the path second.

The larger question seems to be the interaction between type and .f; I propose these possibilities for each:

type:

.f:

Here's what I propose for behavior:

type .f behavior
NULL NULL no change, use guess_type()
NULL function use guess_type(), inform that custom function is used, write using custom function
standard NULL no change, write using object_write()
standard function inform that custom function is used, write using custom function
custom NULL abort in object_write()
custom function write using custom function (don't inform)

Notes:


I'll start working on tests, etc., but wanted to give you an early chance to react. Thanks!

juliasilge commented 2 years ago

I think this is looking promising, but that table you made had me realize that we are probably bumping up against argument dependencies/interdependence. This has absolutely been a headache for me before 😩 so we definitely want to avoid this pattern if possible.

What do you think about adding support for passing a function to type, for folks to do something more custom? This is how unnest_tokens() works and it has been pretty decent for maintenance and flexibility. (Take a look at the find_function() function.)

ijlyttle commented 2 years ago

I am with you. I now appreciate the danger of argument dependencies, thanks for helping me learn new things!

I will be happy to revise things to use the type argument. There are a couple things I would like to think about, in the fullness of time:

ijlyttle commented 2 years ago

Pondering this a bit more, there already exists the "file" type used with pin_upload() and pin_download().

Overloading the type argument could provide an easier way to use pin_upload(), but instead of modifying pin_read(), one could:

very_important_data <- 
  pin_download(my_board, "my-custom-pin") |>
  custom_reader_function()
ijlyttle commented 2 years ago

I have run into another problem: how to specify the file extension. In pin_write(), this is handled by the type; if type is a function, we can't deduce the extension.

One possible solution might be to offer a helper function, pin_file_writer():

#' @param f `function` that takes the object and a path, then writes the object to the path.
#' @param extension `character` extension for the file to be written.
#'
pin_file_writer <- function(.f, extension) {

  .f <- rlang::as_function(f)

  function(x, name = NULL) {

    # determine name, if needed
    if (is.null(name)) {
      name <- enexpr(x)
      if (is_symbol(name)) {
        name <- as.character(name)
        pins_inform("Using `name = '{name}'`")
      } else {
        abort("Must supply `name` when `x` is an expression")
      }
    }

    # determine path for file
    path <- fs::path_temp(fs::path_ext_set(fs::path_file(name), extension))

    # write object to file
    .f(x, path)

    # delete file when environment that called writer is destroyed
    withr::defer(fs::file_delete(path), envir = parent.frame())

    invisible(path)
  }
}
arrow_pin_writer <- pin_file_writer(
  \(x, path) arrow::write_feather(x, path, compression = "uncompressed"), 
  extension = "arrow"
)

pin_write(my_board, x = mtcars, type = arrow_pin_writer)

Alternatively (if I'm understanding the code correctly):

pin_upload(my_board, paths = arrow_pin_writer(mtcars))

If using pin_upload() works, would it be necessary to mess with pin_write()? It may be possible to get everything done using pin_upload() and pin_download().

I am sorry that I have turned this into a ride on the "API merry-go-round". While I enjoy an API discussion as much as (perhaps more than) the next person, I know you all have other things to do.

I'll start coding in this direction unless or until I hear otherwise :)

ijlyttle commented 2 years ago

I think I have something minimally working; here's what I get:

library("pins")

arrow_pin_writer <- pin_file_writer(
  ~arrow::write_feather(.x, .y, compression = "uncompressed"),
  extension = "arrow"
)

board <- board_temp()

pin_write(board, mtcars, type = arrow_pin_writer)
#> Using `name = 'mtcars'`
#> Creating new version '20220914T202009Z-8a125'
#> Writing to pin 'mtcars'

pin_upload(board, arrow_pin_writer(mtcars))
#> Guessing `name = 'mtcars.arrow'`
#> Creating new version '20220914T202009Z-8a125'

pin_download(board, "mtcars") |> arrow::read_feather() |> head()
#>    mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> 1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#> 2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#> 3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#> 4 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
#> 5 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
#> 6 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

pin_download(board, "mtcars.arrow") |> arrow::read_feather() |> head()
#>    mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> 1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#> 2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#> 3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#> 4 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
#> 5 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
#> 6 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Created on 2022-09-14 by the reprex package (v2.0.1)

One thing that does not appear in the reprex, but does appear interactively:

pin_upload(board, arrow_pin_writer(mtcars))
Setting global deferred event(s).
i These will be run:
  * Automatically, when the R session ends.
  * On demand, if you call `withr::deferred_run()`.
i Use `withr::deferred_clear()` to clear them without executing.
Guessing `name = 'mtcars.arrow'`
Creating new version '20220914T214753Z-8a125'

I think this is happening because withr::defer() is using the global environment. I don't know if this is a problem or not, but I could note it in the function documentation.

Assuming things are OK, some questions:

Thanks!

juliasilge commented 2 years ago

So.... I am not feeling great about the pin_file_writer() approach. 😔 I think that is going to be confusing for many users. Maybe the nice interface of pin_write() isn't easily extensible to this kind of custom writing, because of needing to know how to write, plus what to write (i.e. the extension), etc.

HOWEVER, I am really excited about making this work better via pin_upload()! 🤩 This may be the right place to provide more functionality/documentation for folks who want to pin using more custom options.

This fills the need, if I am understanding correctly?

library(pins)

b <- board_temp()
pin_name <- "my-mtcars"
path <- fs::path_temp(fs::path_ext_set(pin_name, "arrow"))
arrow::write_feather(mtcars, path, compression = "uncompressed")
b %>% pin_upload(path, pin_name, title = "My very excellent cars")
#> Creating new version '20220921T174004Z-8a125'

Created on 2022-09-21 with reprex v2.0.2

What do you think we could do to make this a nicer user experience?

ijlyttle commented 2 years ago

I think the pin_upload() approach is the simplest way to go. As you point out in the example, which does fill the need, I think it becomes a matter of documentation, rather than adding a new function. That being the case, I'm thinking to close this PR, then create another PR for documentation.

juliasilge commented 2 years ago

That would be perfect @ijlyttle! Thank you for being willing to work through this in the way we did.

github-actions[bot] commented 1 year ago

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.