rstudio / pins-r

Pin, discover, and share resources
https://pins.rstudio.com

allow `pin_upload` and `pin_download` to preserve directory structure #621

Closed christian-million closed 2 years ago

christian-million commented 2 years ago

I'd like to be able to pin a directory, and all of its files and subdirectories recursively, while maintaining the directory structure when using pin_upload and pin_download. Currently, the behavior is to unpack all files in a directory and store them all as individual files.

My use case is being able to pin an .rmd / .qmd document directory.

A typical .rmd document might live in the root of a project directory (maybe alongside a .here / .Rproj file) and contain subdirectories (e.g., child/child.rmd, data/, data/clean/mtcars.csv, data/raw/mtcars-raw.csv). In the root .rmd file, there are usually references to the rest of the directory's contents relative to the root.

After calling pin_download on a directory that was uploaded with pin_upload, there's currently no way to tell what the original directory structure was.

library(pins)

board <- board_temp()

target_dir <- fs::path_package("pins", "preview")

# 1 File and 2 Directories
fs::dir_ls(target_dir)
#> [my_path]/library/pins/preview/highlight.js-9.15.9
#> [my_path]/library/pins/preview/index.html
#> [my_path]/library/pins/preview/pagedtable-1.1

pin_name <- pin_upload(board, target_dir, "test-pin")
#> Creating new version '20220617T051839Z-220ef'

y <- pin_download(board, pin_name)

# 5 Files
y
#> [1] "[my_path]/AppData/Local/Temp/Rtmp8Qt0W4/pins-10945a6e71a0/test-pin/20220617T051839Z-220ef/highlight.js"       
#> [2] "[my_path]/AppData/Local/Temp/Rtmp8Qt0W4/pins-10945a6e71a0/test-pin/20220617T051839Z-220ef/qtcreator_light.css"
#> [3] "[my_path]/AppData/Local/Temp/Rtmp8Qt0W4/pins-10945a6e71a0/test-pin/20220617T051839Z-220ef/index.html"         
#> [4] "[my_path]/AppData/Local/Temp/Rtmp8Qt0W4/pins-10945a6e71a0/test-pin/20220617T051839Z-220ef/pagedtable.css"     
#> [5] "[my_path]/AppData/Local/Temp/Rtmp8Qt0W4/pins-10945a6e71a0/test-pin/20220617T051839Z-220ef/pagedtable.js"

Created on 2022-06-16 by the reprex package (v2.0.1)

christian-million commented 2 years ago

It might be enough to let the class methods for pin_store handle lines 51-57 in R/pin-upload-download.R, as opposed to pin_upload. This would allow others to implement a class/method that preserves subdirectory structure.

machow commented 2 years ago

I think right now, pin_store dispatches on board types (e.g. s3 board, folder board), rather than object or pin types, which might make extending it for this a bit funky.

I wonder if it'd help to add an "archive" or "gzip" type that could zip and upload a folder? It seems like, as you point out, pin_upload is leaning on listing flat files in the pin metadata, but a <pin_name>/<version>/data.gzip that pin_read also unpacks to, say, <pin_name>/<version>/data is another approach it could take.

juliasilge commented 2 years ago

I don't know if we can support this in pin_write() and pin_read() well (these are designed to write and read R objects rather than files/directories) but you can use our new "custom formats" vignette to see how to do this yourself. The idea is to tar up your directory and pin_upload() that, then on the other side pin_download() the tar file and unzip it:

library(pins)
library(archive)
b <- board_temp()

## what is in my Downloads folder?
zip_dir <- "~/Downloads"
fs::dir_ls(zip_dir)
#> /Users/juliasilge/Downloads/Linux-r-testthat-snapshots.zip
#> /Users/juliasilge/Downloads/horror-movie.png
#> /Users/juliasilge/Downloads/horror.png
#> /Users/juliasilge/Downloads/radio-stations.png
#> /Users/juliasilge/Downloads/rstudio-connect.13659.0.65371.run_app.HnQRzpB5ibxceDRn.log
#> /Users/juliasilge/Downloads/vetiver-illo.png
#> /Users/juliasilge/Downloads/vetiver.png

## archive directory and upload it
my_tar_file <- fs::file_temp(ext = "tar.gz")
archive_write_dir(my_tar_file, zip_dir)
pin_upload(b, paths = my_tar_file, name = "all-julias-downloads")
#> Creating new version '20221109T204754Z-41ac0'
b
#> Pin board <pins_board_folder>
#> Path: '/var/folders/hv/hzsmmyk9393_m7q3nscx1slc0000gn/T/RtmpqTxtWl/pins-10fcb4974e60b'
#> Cache size: 0
#> Pins [1]: 'all-julias-downloads'

## download archive and extract into new directory
new_dir <- fs::path_temp("downloads")
pin_download(b, "all-julias-downloads") %>%
  archive_extract(dir = new_dir)
fs::dir_ls(new_dir)
#> /var/folders/hv/hzsmmyk9393_m7q3nscx1slc0000gn/T/RtmpqTxtWl/downloads/Linux-r-testthat-snapshots.zip
#> /var/folders/hv/hzsmmyk9393_m7q3nscx1slc0000gn/T/RtmpqTxtWl/downloads/horror-movie.png
#> /var/folders/hv/hzsmmyk9393_m7q3nscx1slc0000gn/T/RtmpqTxtWl/downloads/horror.png
#> /var/folders/hv/hzsmmyk9393_m7q3nscx1slc0000gn/T/RtmpqTxtWl/downloads/radio-stations.png
#> /var/folders/hv/hzsmmyk9393_m7q3nscx1slc0000gn/T/RtmpqTxtWl/downloads/rstudio-connect.13659.0.65371.run_app.HnQRzpB5ibxceDRn.log
#> /var/folders/hv/hzsmmyk9393_m7q3nscx1slc0000gn/T/RtmpqTxtWl/downloads/vetiver-illo.png
#> /var/folders/hv/hzsmmyk9393_m7q3nscx1slc0000gn/T/RtmpqTxtWl/downloads/vetiver.png

Created on 2022-11-09 with reprex v2.0.2

I'm using the archive package here because it's a bit nicer/faster than base R's tar and untar.

What do you think about this approach? Maybe we should add file archives to the list of things you "may want to read and write" in the new vignette.
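For reference, the archive-then-upload idea above could be wrapped into a pair of helpers along these lines. This is only a minimal sketch under stated assumptions: the names pin_upload_dir and pin_download_dir are hypothetical and not part of pins, and it uses base R's tar() and untar() instead of the archive package so that it has no extra dependency.

```r
library(pins)

# Hypothetical helper (not part of pins): archive `dir` and pin the archive
# as a single file, so the directory structure travels inside the tarball.
pin_upload_dir <- function(board, dir, name, ...) {
  tar_file <- tempfile(fileext = ".tar.gz")
  # Archive from inside `dir` so the stored paths are relative to its root.
  old_wd <- setwd(dir)
  tryCatch(
    utils::tar(tar_file, files = ".", compression = "gzip"),
    finally = setwd(old_wd)
  )
  pin_upload(board, paths = tar_file, name = name, ...)
}

# Hypothetical helper (not part of pins): download the pinned archive and
# extract it into `exdir`, recreating the original subdirectories.
pin_download_dir <- function(board, name, exdir, ...) {
  tar_file <- pin_download(board, name, ...)
  utils::untar(tar_file, exdir = exdir)
  invisible(exdir)
}

# Usage with a small throwaway project directory
src <- file.path(tempdir(), "proj-src")
dir.create(file.path(src, "data", "clean"), recursive = TRUE)
writeLines("# report", file.path(src, "index.Rmd"))
writeLines("mpg,cyl", file.path(src, "data", "clean", "mtcars.csv"))

b <- board_temp()
pin_upload_dir(b, src, "proj")
out <- pin_download_dir(b, "proj", exdir = file.path(tempdir(), "proj-out"))
list.files(out, recursive = TRUE)
```

Because the pin holds a single archive file, the pin metadata still lists only one flat file; the relative paths are recovered only when the archive is extracted.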

christian-million commented 2 years ago

This solution will work for me. I've implemented a similar approach as the work-around (now permanent) solution for my needs. I've read through the new vignette and the "Function to manage uploading" section is helpful.

I'm not entirely sure how common of a use-case this is, but I think it could help folks to add a bullet to the list of examples. If it's not a common use-case, maybe it will serve as inspiration.

Thank you for considering this issue and for your response. I appreciate your work and especially the new vignette!

juliasilge commented 2 years ago

So glad that this works for your use case @christian-million! I think this is likely common enough that it's worth adding a little bit in the vignette, which I did in #676. You can see the updates rendered, if you like.

github-actions[bot] commented 1 year ago

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.