hadley opened this issue 2 years ago
Via @GShotwell, this much is already possible:
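```r
library(pins)

# Connect to a Posit Connect server and pin mtcars to it
board <- board_connect(server = "https://colorado.posit.co/rsc/",
                       account = "gordon.shtowell@posit.co",
                       key = Sys.getenv("COLORADO_KEY"))
pin(mtcars, board = board)  # legacy pins API; pins >= 1.0 uses pin_write()

library(duckdb)
library(DBI)

con <- dbConnect(duckdb())
# httpfs lets DuckDB read files over HTTP(S)
dbExecute(con, "INSTALL httpfs")
dbExecute(con, "LOAD httpfs")
# Query the pinned CSV directly from its Connect URL
dbGetQuery(con, "SELECT mpg FROM 'https://colorado.posit.co/rsc/content/519521d1-a6a1-45e6-a5ec-01046686f85f/data.csv'")
```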
This is what Hugging Face does for its flat files. The way they do it is:
I think this would be a very good Connect feature: querying a file in place means Connect content doesn't have to load an entire dataset into memory, which greatly reduces its memory footprint without sacrificing much speed.
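To make the memory argument concrete, here is a minimal local sketch, assuming the `DBI` and `duckdb` packages. A Parquet copy of `mtcars` in a temp file stands in for a file served by Connect (an assumption for illustration; Connect is not involved). DuckDB scans the file itself, and only the three-row aggregate is returned to R rather than the full table.

```r
library(DBI)
library(duckdb)

con <- dbConnect(duckdb())

# Stand-in for a pinned file: write mtcars to a local Parquet file
path <- normalizePath(tempfile(fileext = ".parquet"), winslash = "/", mustWork = FALSE)
dbWriteTable(con, "mtcars_tbl", mtcars)
dbExecute(con, sprintf("COPY mtcars_tbl TO '%s' (FORMAT PARQUET)", path))
dbExecute(con, "DROP TABLE mtcars_tbl")

# Query the file in place: DuckDB reads only the columns it needs (cyl, mpg)
# from the Parquet file, and R receives just the aggregated result
res <- dbGetQuery(con, sprintf(
  "SELECT cyl, avg(mpg) AS avg_mpg FROM '%s' GROUP BY cyl ORDER BY cyl",
  path
))
print(res)

dbDisconnect(con)
```

The same `SELECT ... FROM '<location>'` pattern is what the Connect example uses; only the location changes from a local path to an HTTPS URL.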
Isn't the example above working only because that file is publicly readable? There needs to be some kind of R filesystem abstraction that DuckDB can use to authenticate: Arrow's filesystems, something similar to fsspec in Python, or DuckDB's own httpfs for non-Connect cases.
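One possible stopgap until DuckDB can authenticate to Connect directly is to fetch the file with an authenticated HTTP request, save it to disk, and let DuckDB query the local copy. This is a minimal sketch, not an existing pins or Connect feature. It assumes the `httr2` package and that Connect accepts an API key in an `Authorization: Key <key>` header for this content. `connect_file_request()` and `query_connect_file()` are hypothetical helpers. Because the whole file is downloaded, R's memory use stays low, but you lose the partial-read benefit of querying over HTTP.

```r
library(httr2)
library(DBI)
library(duckdb)

# Hypothetical helper: build an authenticated request for one file inside a
# Connect content item (assumes "Authorization: Key <api key>" is accepted)
connect_file_request <- function(content_url, file, key) {
  url <- paste0(sub("/+$", "", content_url), "/", file)
  request(url) |>
    req_headers(Authorization = paste("Key", key))
}

# Hypothetical helper: download the file to a temp path, then run `sql`
# against it with DuckDB; "{path}" in `sql` is replaced by the local path
query_connect_file <- function(content_url, file, key, sql) {
  path <- normalizePath(
    tempfile(fileext = paste0(".", tools::file_ext(file))),
    winslash = "/", mustWork = FALSE
  )
  req_perform(connect_file_request(content_url, file, key), path = path)
  con <- dbConnect(duckdb())
  on.exit(dbDisconnect(con), add = TRUE)
  dbGetQuery(con, gsub("{path}", path, sql, fixed = TRUE))
}

# Example call (needs network access and a valid key):
# query_connect_file(
#   "https://colorado.posit.co/rsc/content/519521d1-a6a1-45e6-a5ec-01046686f85f/",
#   "data.csv", Sys.getenv("COLORADO_KEY"),
#   "SELECT mpg FROM '{path}'"
# )
```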
I'm guessing you can use httpfs right now for the storage services it supports (S3, GCS, etc.), but it won't support Connect, since Connect is not S3-compatible.
https://arrow.apache.org/docs/r/articles/fs.html#file-systems-that-emulate-s3