voltrondata-labs / arrowbench

R package for benchmarking

Install duckdb from r-universe #109

Open alistaire47 opened 2 years ago

alistaire47 commented 2 years ago

If we install {duckdb} from r-universe instead of building it from scratch, our r-release builds should be much faster.

jonkeane commented 2 years ago

Basically turn the custom duckdb install into:

options(repos = c(duckdb = 'https://duckdb.r-universe.dev', CRAN = 'https://cloud.r-project.org'))
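As a minimal sketch of what the install step itself might look like with those repositories: the helper name `install_duckdb_from_universe()` and its `lib` argument are hypothetical, and only the two repository URLs come from the snippet above. Whether a prebuilt binary is actually used (rather than a source build) depends on the platform.

```r
# Repositories to search: the duckdb r-universe first, then CRAN for dependencies.
duckdb_repos <- c(
  duckdb = "https://duckdb.r-universe.dev",
  CRAN = "https://cloud.r-project.org"
)

# Hypothetical helper: install {duckdb} into `lib` (for example the custom
# duckdb library directory) using the repositories above.
install_duckdb_from_universe <- function(lib = .libPaths()[1],
                                         repos = duckdb_repos) {
  install.packages("duckdb", lib = lib, repos = repos)
}
```

Passing `repos` explicitly avoids changing the global `options()` for the rest of the session.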


Then, once that is installed, the following will work (I think we technically only need to install once, and then load on each new connection going forward):

con <- DBI::dbConnect(duckdb::duckdb())
DBI::dbExecute(con, "INSTALL tpch;")
DBI::dbExecute(con, "LOAD tpch;")
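The "install once, load on every connection" idea could be wrapped up roughly as below. This is only a sketch: `ensure_duckdb_extension()` is a made-up name, and the `execute` argument is an assumption added so the statement runner can be swapped out (it defaults to `DBI::dbExecute`).

```r
# Hypothetical helper: try to LOAD the extension first and only fall back to
# INSTALL (which needs network access) when the load fails.
ensure_duckdb_extension <- function(con, ext = "tpch",
                                    execute = DBI::dbExecute) {
  loaded <- tryCatch(
    {
      execute(con, sprintf("LOAD %s;", ext))
      TRUE
    },
    error = function(e) FALSE
  )
  if (!loaded) {
    # Not installed yet for this build: download it, then load it.
    execute(con, sprintf("INSTALL %s;", ext))
    execute(con, sprintf("LOAD %s;", ext))
  }
  invisible(con)
}

# Usage (requires {DBI} and {duckdb}):
# con <- DBI::dbConnect(duckdb::duckdb())
# ensure_duckdb_extension(con, "tpch")
```

If the extension has not been uploaded for the installed build, the `INSTALL` step still fails with an error from DuckDB.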

alistaire47 commented 2 years ago

@jonkeane I'm messing with this a bit, and I'm getting 0.5.0 installed fine, but I haven't figured out how to get the tpch extension working yet:

> callr::r(function() {
+     con <- DBI::dbConnect(duckdb::duckdb(dbdir = ":memory:"))
+     on.exit(DBI::dbDisconnect(con, shutdown = TRUE))
+     DBI::dbSendStatement(con, "INSTALL 'tpch';")
+ }, list(), libpath = c(arrowbench:::custom_duckdb_lib_dir(), .libPaths()))
Error: callr subprocess failed: rapi_execute: Failed to run query
Error: IO Error: Failed to download extension "tpch" at URL "http://extensions.duckdb.org/9c4001d16/osx_arm64/tpch.duckdb_extension.gz"
Extension "tpch" is an existing extension.
Are you using a development build? In this case, extensions might not (yet) be uploaded.
Type .Last.error.trace to see where the error occurred
> callr::r(function() {
+     con <- DBI::dbConnect(duckdb::duckdb(dbdir = ":memory:"))
+     on.exit(DBI::dbDisconnect(con, shutdown = TRUE))
+     DBI::dbSendStatement(con, "LOAD 'tpch';")
+ }, list(), libpath = c(arrowbench:::custom_duckdb_lib_dir(), .libPaths()))
Error: callr subprocess failed: rapi_execute: Failed to run query
Error: IO Error: Extension "/Users/alistaire/.duckdb/extensions/9c4001d16/osx_arm64/tpch.duckdb_extension" not found
Type .Last.error.trace to see where the error occurred
> callr::r(function() {
+     con <- DBI::dbConnect(duckdb::duckdb(dbdir = ":memory:"))
+     on.exit(DBI::dbDisconnect(con, shutdown = TRUE))
+     DBI::dbGetQuery(con, "select scale_factor, query_nr from tpch_answers() LIMIT 1;")
+ }, list(), libpath = c(arrowbench:::custom_duckdb_lib_dir(), .libPaths()))
Error: callr subprocess failed: rapi_prepare: Failed to prepare query select scale_factor, query_nr from tpch_answers() LIMIT 1;
Error: Catalog Error: Table Function with name tpch_answers does not exist!
Did you mean "duckdb_views"?
LINE 1: select scale_factor, query_nr from tpch_answers() LIMIT 1;
Type .Last.error.trace to see where the error occurred

Am I doing anything obviously wrong here? Does

Are you using a development build? In this case, extensions might not (yet) be uploaded.

mean we really just need to wait for them to put something at http://extensions.duckdb.org/9c4001d16/osx_arm64/tpch.duckdb_extension.gz ? I'm getting a 403 when I GET that directly...do I need to auth somehow?
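A rough way to check this programmatically, instead of issuing the GET by hand, might look like the sketch below. The URL layout is copied from the error message above; the function names are made up for illustration, it assumes the {curl} package is installed, and it has not been confirmed that the server answers header-only requests the same way as a full download.

```r
# Build the download URL that DuckDB reports in the error message.
# `source_id` is the build hash (e.g. "9c4001d16") and `platform` the
# platform string (e.g. "osx_arm64").
extension_url <- function(ext, source_id, platform) {
  sprintf(
    "http://extensions.duckdb.org/%s/%s/%s.duckdb_extension.gz",
    source_id, platform, ext
  )
}

# Hypothetical check: TRUE only if the server answers with HTTP 200.
extension_available <- function(url) {
  status <- tryCatch(
    {
      handle <- curl::new_handle(nobody = TRUE) # request headers only
      curl::curl_fetch_memory(url, handle = handle)$status_code
    },
    error = function(e) NA_integer_
  )
  isTRUE(status == 200)
}

# Usage (needs network access):
# extension_available(extension_url("tpch", "9c4001d16", "osx_arm64"))
```

A 403 like the one above would simply come back as `FALSE`, which a build script could treat as "extension not uploaded yet".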

jonkeane commented 2 years ago

Ouch. Yeah, in my experience the extensions do take a while to be built and uploaded, which would make this not super helpful if the extension often isn't available yet at the time we install.

I wonder if it's possible to install from one version back on r-universe? I looked at https://duckdb.r-universe.dev/ui#api but didn't find an obvious answer.

Alternatively, we could wait for the duckdb release, which should have a (more) stable extension setup.

alistaire47 commented 2 years ago

This endpoint makes it look like yes?

GET /packages/<package>/<version>/ JSON array of builds for <package> <version> in this universe.

Not sure yet how to hit that most easily or how to figure out which versions are available, but I'll dig a little more.
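As a starting point, querying that endpoint from R could look something like this. It is a sketch under assumptions: it takes the endpoint path to have the form `/packages/<package>/<version>/` relative to the universe URL, the helper names are hypothetical, and the shape of the returned JSON has not been checked.

```r
# Hypothetical helper: build the URL for the "builds for <package> <version>"
# endpoint, assuming the path is /packages/<package>/<version>/.
universe_builds_url <- function(package, version,
                                universe = "https://duckdb.r-universe.dev") {
  sprintf("%s/packages/%s/%s/", universe, package, version)
}

# Fetch and parse the JSON array of builds
# (needs network access and the {jsonlite} package).
list_universe_builds <- function(package, version, ...) {
  jsonlite::fromJSON(universe_builds_url(package, version, ...))
}

# Usage:
# builds <- list_universe_builds("duckdb", "0.5.0")
# str(builds)
```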