rstudio / pins-r

Pin, discover, and share resources
https://pins.rstudio.com

Use case for enabling cache inside board_folder #669

Closed harell closed 1 year ago

harell commented 1 year ago

Hi

Thank you for the useful work on pins

In many organisations, there is a network drive (or NAS) mapped to an employee's computer as a local drive. So while the folder path appears to be on a local drive, the data is stored remotely. During code execution, the data moves through the organisation network and sometimes a VPN. Needless to say, that data movement slows down the work.

This is where exposing the cache inside board_folder can be useful. Turning the cache on would store the data from the network drive locally, so repeated reads could benefit from the caching mechanism instead of going over the network each time.
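To make the idea concrete, here is a minimal base-R sketch of the kind of local caching meant here. It is not how pins implements its cache; `cached_copy`, `remote_file`, and `cache_dir` are hypothetical names used only for illustration, and staleness is judged by file modification time alone.

```r
# Hypothetical helper, not part of pins: copy a file from a slow location
# (e.g. a mapped network drive) into a local cache directory, and reuse
# the local copy for as long as it is not older than the original.
cached_copy <- function(remote_file, cache_dir) {
  dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
  local_file <- file.path(cache_dir, basename(remote_file))

  # Refresh when the local copy is missing or older than the remote file
  stale <- !file.exists(local_file) ||
    file.mtime(local_file) < file.mtime(remote_file)
  if (stale) {
    # copy.date = TRUE keeps the original modification time on the copy
    file.copy(remote_file, local_file, overwrite = TRUE, copy.date = TRUE)
  }
  local_file
}
```

Later reads would then go through the returned local path rather than the network path.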

Would you consider exposing cache in board_folder?

Cheers

juliasilge commented 1 year ago

Thanks for this feedback! 🙌 I can see how caching for board_folder() would be useful in some situations. It may also be surprising to folks who are using such a board on a truly local filesystem, and we don't have a very obvious way to turn caching on/off currently so this will take some thought.

Let's use this issue to collect input on this for prioritization.

juliasilge commented 1 year ago

In the meantime, you could work around this by overwriting board_folder() for your local work (for example, by defining this function in a script that is running very slowly for you, or in an internal package in your org):

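board_folder <- function(path, versioned = FALSE, cache = NULL) {
  # Null-default operator; defined here because base R only provides %||% from 4.4.0
  `%||%` <- function(x, y) if (is.null(x)) y else x

  fs::dir_create(path)
  path <- fs::path_norm(path)
  # Fall back to a pins cache directory derived from the board path
  cache <- cache %||% pins:::board_cache_path(paste0("folder", path))

  pins:::new_board_v1("pins_board_folder",
    cache = cache,
    path = path,
    versioned = versioned
  )
}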

I think that should just work with the rest of pins.

harell commented 1 year ago

Thanks @juliasilge.

harell commented 1 year ago

For people looking at this thread, I ended up creating a new board using the following code:

board_network_drive <- function(path, versioned = FALSE, cache = NULL) {
    # Null-default operator (is.null() is base R; is_null() would require rlang)
    `%||%` <- function(x, y) if (is.null(x)) y else x
    # Drop the first path component, i.e. the drive name such as "C:"
    remove_drive_name <- function(path) paste(fs::path_split(path)[[1]][-1], collapse = "/")
    # Fetch the pins helpers that are accessed via `:::`
    board_cache_path <- eval(parse(text = "pins:::board_cache_path"))
    new_board_v1 <- eval(parse(text = "pins:::new_board_v1"))

    fs::dir_create(path)
    path <- fs::path_norm(path)
    # Default cache location: "network_drive/<path without the drive name>"
    cache <- cache %||% board_cache_path(fs::path("network_drive", remove_drive_name(path)))

    new_board_v1("pins_board_folder",
                 cache = cache,
                 path = path,
                 versioned = versioned
    )
}

@juliasilge Note that remove_drive_name is needed to strip the drive name (e.g. "C:") from the cache path, because Windows does not allow a colon (":") in folder or file names.
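The drive-name step can also be illustrated without fs. The following is a minimal base-R sketch, assuming Windows-style drive prefixes such as "Z:"; `strip_drive` is a hypothetical name and is not part of pins or fs.

```r
# Hypothetical helper: turn a path into a relative, colon-free path
# that can be used as a sub-directory name inside a cache folder.
strip_drive <- function(path) {
  path <- gsub("\\\\", "/", path)       # normalise backslashes to forward slashes
  path <- sub("^[A-Za-z]:", "", path)   # drop a leading drive letter such as "Z:"
  sub("^/+", "", path)                  # drop leading slashes so the result is relative
}

strip_drive("Z:/shared/pins")
strip_drive("C:\\data\\board")
```

The first call returns "shared/pins" and the second returns "data/board", so neither result contains a colon.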

juliasilge commented 1 year ago

For now, we're going to pass on making a change in pins itself to use caching for board_folder() but the workaround here is a good option for folks in a similar situation. 👍
