rstudio / pins-r

Pin, Discover and Share Resources
https://pins.rstudio.com

Explore board based on arrow's S3 support #530

Open hadley opened 2 years ago

hadley commented 2 years ago

https://arrow.apache.org/docs/r/articles/fs.html#file-systems-that-emulate-s3

juliasilge commented 1 year ago

Via @GShotwell, this much is already possible:

library(pins)

board <- board_connect(server = "https://colorado.posit.co/rsc/",
                         account = "gordon.shtowell@posit.co",
                         key = Sys.getenv("COLORADO_KEY"))

pin(mtcars, board = board)

library(duckdb)
library(DBI)
con <- DBI::dbConnect(duckdb())
# Install and load the httpfs extension so DuckDB can read files over HTTP(S)
dbExecute(con, "INSTALL httpfs")
dbExecute(con, "LOAD httpfs")

dbGetQuery(con, "SELECT mpg FROM 'https://colorado.posit.co/rsc/content/519521d1-a6a1-45e6-a5ec-01046686f85f/data.csv'")

gshotwell commented 1 year ago

This is what Hugging Face does for their flat files. The way they do it is:

I think this would be a very good Connect feature because it really reduces the memory footprint of Connect assets without sacrificing much speed.
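
To illustrate the memory point, here is a minimal local sketch, assuming the duckdb and DBI packages are installed. A temporary CSV stands in for a pinned file (an assumption for illustration; nothing here talks to Connect), and DuckDB scans it so that only the requested column comes back into R.

```r
library(DBI)
library(duckdb)

# Write a CSV to a temporary location to stand in for a pinned file
# (illustrative stand-in, not an actual pin).
csv_path <- tempfile(fileext = ".csv")
write.csv(mtcars, csv_path, row.names = FALSE)

con <- dbConnect(duckdb())

# DuckDB scans the file itself; only the selected column is returned to R.
# Forward slashes keep the path valid inside the SQL string on Windows.
sql <- sprintf(
  "SELECT mpg FROM read_csv_auto('%s')",
  normalizePath(csv_path, winslash = "/")
)
mpg <- dbGetQuery(con, sql)

dbDisconnect(con, shutdown = TRUE)
```

The same query shape applies to a remote URL once httpfs is loaded, as in the example earlier in the thread.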

machow commented 1 year ago

Isn't the example above working only because that file is publicly readable? There needs to be some kind of R filesystem abstraction that DuckDB can use to authenticate (either arrow's fs, something similar to fsspec in Python, or DuckDB's httpfs for non-Connect cases).

I'm guessing you can use httpfs right now, but it won't support Connect, since Connect is not S3-compatible (it covers only S3, GCS, etc.).
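
One possible stopgap for non-public content, sketched under assumptions: download the file with an API key header using base R's `download.file()`, then point DuckDB at the local copy. The helper names `connect_auth_headers()` and `query_connect_csv()` are hypothetical, not part of pins or duckdb, and this fetches the whole file rather than querying it remotely, so it is not the filesystem abstraction described above.

```r
library(DBI)
library(duckdb)

# Hypothetical helper: build the "Authorization: Key <api key>" header
# that Connect accepts for API-key authentication.
connect_auth_headers <- function(key) {
  c(Authorization = paste("Key", key))
}

# Hypothetical helper: fetch a protected CSV to a temporary file, then let
# DuckDB run the query against the local copy. `sql_template` must contain
# one '%s' placeholder for the file path.
query_connect_csv <- function(url, sql_template, key) {
  dest <- tempfile(fileext = ".csv")
  on.exit(unlink(dest), add = TRUE)
  download.file(url, dest,
                headers = connect_auth_headers(key),
                quiet = TRUE, mode = "wb")

  con <- dbConnect(duckdb())
  on.exit(dbDisconnect(con, shutdown = TRUE), add = TRUE)
  dbGetQuery(con, sprintf(sql_template, normalizePath(dest, winslash = "/")))
}

# Usage (not run; needs a reachable server and a valid key):
# query_connect_csv(
#   "https://colorado.posit.co/rsc/content/519521d1-a6a1-45e6-a5ec-01046686f85f/data.csv",
#   "SELECT mpg FROM read_csv_auto('%s')",
#   key = Sys.getenv("COLORADO_KEY")
# )
```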