rstudio / pins-r

Pin, discover, and share resources
https://pins.rstudio.com

board_url() can't download pins from boards created with board_folder() #627

Closed innir closed 1 year ago

innir commented 2 years ago

Hi,

looking at the board_url() documentation

A named character vector of URLs. If the URL ends in a /, board_url will look for a data.txt that provides metadata. The easiest way to generate this file is to upload a pin directory created by board_folder().

it seems to be easy to create a board locally and write a pin to it with

> library(pins)
> board <- board_folder("test")
> board %>% pin_write(mtcars)
Using `name = 'mtcars'`
Guessing `type = 'rds'`
Creating new version '20220712T143218Z-66143'
Writing to pin 'mtcars'
> board$versioned
[1] FALSE
> fs::dir_tree("test")
test
└── mtcars
    └── 20220712T143218Z-66143
        ├── data.txt
        └── mtcars.rds

Now one should be able to just copy the folder test to some http server and get the pin mtcars back like

board <- board_url(c(mtcars = "https://example.com/test/mtcars/"))
board %>% pin_read("mtcars")

but the directory tree and the code at https://github.com/rstudio/pins/blob/main/R/board_url.R#L84-L90 make it obvious that this will not (and does not) work. The code expects data.txt at https://example.com/test/mtcars/data.txt while it is at https://example.com/test/mtcars/20220712T143218Z-66143/data.txt.

machow commented 2 years ago

Hey--I wonder if this could be clarified in the documentation, but from what I can tell this phrase...

upload a pin directory created by board_folder(). [emph added]

Should not be referring to the board itself, or even a pin directory, but a pin version directory. If you were to paste that version folder ('20220712T143218Z-66143') somewhere, then you could use board_url to refer to it.
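A minimal sketch of that workaround, assuming the directory layout shown in the first comment: it copies the files of the newest version directory to a stable location, so that data.txt sits directly under the URL passed to board_url(). The helper `publish_latest_version()` and the destination path are hypothetical and not part of pins; the demo fakes the layout with base R instead of calling board_folder().

```r
# Hypothetical helper (not part of pins): copy the newest version directory
# of a pin to a stable location that can be served over HTTP.
publish_latest_version <- function(board_path, pin_name, dest) {
  pin_dir <- file.path(board_path, pin_name)
  versions <- list.dirs(pin_dir, full.names = TRUE, recursive = FALSE)
  if (length(versions) == 0) {
    stop("No versions found for pin '", pin_name, "'")
  }
  # Version directory names start with a UTC timestamp (e.g. 20220712T143218Z),
  # so sorting by name puts the newest version last.
  latest <- versions[order(basename(versions), method = "radix")][length(versions)]
  dir.create(dest, recursive = TRUE, showWarnings = FALSE)
  files <- list.files(latest, full.names = TRUE)
  file.copy(files, dest, overwrite = TRUE)
  invisible(file.path(dest, basename(files)))
}

# Demo on a simulated board layout (placeholder metadata, not a real data.txt).
board_path <- file.path(tempdir(), "test")
version_dir <- file.path(board_path, "mtcars", "20220712T143218Z-66143")
dir.create(version_dir, recursive = TRUE, showWarnings = FALSE)
writeLines("placeholder metadata", file.path(version_dir, "data.txt"))
saveRDS(mtcars, file.path(version_dir, "mtcars.rds"))

dest <- file.path(tempdir(), "www", "mtcars")
publish_latest_version(board_path, "mtcars", dest)
list.files(dest)
```

Serving `dest` at a fixed URL would keep the reader's URL stable, at the cost of re-running the copy (for example from the same cron job) after every pin_write().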

I'm not sure there's a simple way to support broader board functionality using board_url, without a way to list folders (which is not impossible with HTTP, but also dependent on HTTP servers doing things in a consistent way.)

Could you say more about the case you're trying to use it for?

innir commented 2 years ago

True, one could use the pin version, but that would render it rather useless, as every update of the pin would force the reader of that pin to change the name :-/ For me the question is more: why does board_folder("test", versioned = FALSE) add a versioned directory?

My use-case is basically a poor man's API ... I frequently update pins on a server and want to consume the data on clients ... seemed to be a very easy solution involving just an HTTP server serving static files ... (and some cron job updating the pins)

ijlyttle commented 2 years ago

I came here on the same path as @innir: I want to create using board_folder(), then serve using board_url() with a predictable, unchanging URL.

When I create a board_folder(), versioned = FALSE is the default (as you know). To me, this suggests that the data.txt and the supporting data files would be stored in the directory named for the pin, rather than in a subdirectory named for a version.

In essence, like the way the examples are set up for board_url():

github_raw <- "https://raw.githubusercontent.com/"
board <- board_url(c(
  files = paste0(github_raw, "rstudio/pins/master/tests/testthat/pin-files/"),
  rds = paste0(github_raw, "rstudio/pins/master/tests/testthat/pin-rds/"),
  raw = paste0(github_raw, "rstudio/pins/master/tests/testthat/pin-files/first.txt")
))

Thanks!

ijlyttle commented 2 years ago

This may be a bad (and unrelated) idea, but could board_url() support a manifest file?

That is, if you intend to serve a board from a web-server, you could call a function to generate a file in its root called something like manifest.json (or whatever), which would contain the names of all the pins and versions. Then, if you called board_url() ~without arguments~ with a single unnamed argument: the base url, it would look for this manifest file to build the board?

¯\_(ツ)_/¯
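A rough sketch of what generating such a manifest could look like, assuming the board_folder() layout shown earlier (one directory per pin, one subdirectory per version). The helper `build_manifest()` and the file name manifest.txt are hypothetical, used only for illustration; this is not the format pins itself uses.

```r
# Hypothetical helper: list every pin on a folder board together with the
# relative paths of its version directories.
build_manifest <- function(board_path) {
  pins <- list.dirs(board_path, full.names = FALSE, recursive = FALSE)
  manifest <- lapply(pins, function(pin) {
    versions <- list.dirs(
      file.path(board_path, pin),
      full.names = FALSE,
      recursive = FALSE
    )
    # Paths are relative to the board root, e.g. "mtcars/20220712T143218Z-66143/"
    paste0(pin, "/", sort(versions, method = "radix"), "/")
  })
  names(manifest) <- pins
  manifest
}

# Demo on a simulated board with one pin and one version.
board_path <- tempfile("board-")
dir.create(file.path(board_path, "mtcars", "20220712T143218Z-66143"),
           recursive = TRUE)

manifest <- build_manifest(board_path)

# Write one relative version path per line to the board root; a reader that
# knows only the base URL could fetch this file and resolve each pin from it.
writeLines(unlist(manifest, use.names = FALSE),
           file.path(board_path, "manifest.txt"))
```

The curator would regenerate the file after adding or updating pins, and readers would need only the base URL.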

ijlyttle commented 2 years ago

I have made a proof-of-concept: https://ijlyttle.github.io/pinsManifest/

This implements the manifest idea, using a file called pins.txt. It can create a manifest only for board_folder() and can be used to create (read from) a board_url().

I wrote this package in hopes that the ideas could be integrated into pins itself (also pins for Python (in the future, maybe also JS?)). In that spirit, I'd be happy to create a new issue and contribute to a PR.

Thanks!

machow commented 2 years ago

Ah thanks for this incredible prototype--It seems like a really useful feature!

It seems like a tricky piece here is adding useful things while keeping a narrow scope in pins. Right now, pins handles..

I wonder if one way to make board_url() more useful--while keeping pins' scope narrow--could be..

The big advantage AFAICT of the manifest is that users reading from a board could use just the http address, while the person curating the board could update the manifest to add pins, etc..

WDYT? (cc @juliasilge )

ijlyttle commented 2 years ago

Thanks @machow!

I think you summed up exactly what the manifest aims to do, and I can appreciate keeping a narrow scope. Mindful that this is not my place, I think a manifest file could make board_url() essentially a read-only version of other boards:

I agree that the creation of a manifest has to be the responsibility of the curator, along the lines of invoking renv::snapshot() as needed.

I have been playing around a bit with the python version; board_urls() seems to take a step in this direction, where you provide a base URL, path, then a pins_path, which is not-too-different from the proposed manifest (of course you know all this, you wrote it!)

Apologies for excitedly throwing out wild possibilities - I am starting to appreciate the potential impact of pins, especially across languages, and 🤯

juliasilge commented 2 years ago

That prototype looks so amazing @ijlyttle! 🙌

If you are up for making a somewhat speculative PR to implement something like this, would you do one that adds a manifest argument to board_url() and we can see how this plays out? I think this is really promising and could work well; having real code to look at and play around with could help us move forward.

ijlyttle commented 2 years ago

Thanks, and yes!

Assuming things work out for this PR (and for #631), it would be great to have them in Python, too (I'm sure you're thinking along those lines, as well).

I don't have a lot of experience in Python, but to the extent that @machow would tolerate me "getting in the way", I'd be happy to "help" there, too.

juliasilge commented 1 year ago

Check out how this works now! Perhaps the easiest way to see this is this test:

https://github.com/rstudio/pins-r/blob/fd1708fd959d913c5aa2133353939e6ed484b852/tests/testthat/test-board_url.R#L72-L89

We also will have a new vignette outlining this approach, as described in #685.

We would love it if you tried this out (install via remotes::install_github("rstudio/pins-r")) and gave us any feedback before we send this to CRAN!

github-actions[bot] commented 1 year ago

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.