Closed · brancengregory closed 5 months ago
In order for us to run queries lazily on multiple remote tables with joins, we will need to use the same db source. That means pulling the connection creation out of `tbl_from_gcs_duckdb()` and moving it into `ojo_connect()`, with a new argument that can be `postgres` or `duckdb`.
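A minimal sketch of what that refactor might look like, assuming DBI-style connections (the `.source` argument name and the internals here are illustrative assumptions, not the final API):

```r
# Illustrative sketch: `ojo_connect()` gains a `.source` argument so that
# connection creation (currently inside `tbl_from_gcs_duckdb()`) lives in
# one place. Driver arguments are assumed, not the package's real config.
ojo_connect <- function(.source = c("postgres", "duckdb"), ...) {
  .source <- match.arg(.source)

  switch(
    .source,
    postgres = DBI::dbConnect(RPostgres::Postgres(), ...),
    duckdb   = DBI::dbConnect(duckdb::duckdb(), ...)
  )
}
```

With `match.arg()`, `postgres` stays the default when `.source` is omitted, so existing callers are unaffected.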
Also, the values allowed for `ojo_tbl(.source = )` need to be updated to `c("postgres", "gcs_arrow", "gcs_duckdb")`, making sure not to break the current API.
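One way to validate the expanded choices without breaking existing callers is to list the current default first (a sketch; the other parameters of `ojo_tbl()` are omitted here):

```r
# Illustrative only: `match.arg()` rejects anything outside the allowed set
# and keeps "postgres" as the default, preserving the current API.
ojo_tbl <- function(table, .source = c("postgres", "gcs_arrow", "gcs_duckdb")) {
  .source <- match.arg(.source)
  # ... dispatch to the appropriate backend based on `.source` ...
}
```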
It would be really great if we could set the `ojo_tbl()` source at the global level, perhaps in an env variable or with `options()`.
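That could look something like the following (the option name `ojodb.default_source` and the env variable `OJO_DEFAULT_SOURCE` are hypothetical names for illustration):

```r
# Sketch: read the default source from an R option, falling back to an
# environment variable, then to "postgres".
default_source <- getOption(
  "ojodb.default_source",
  default = Sys.getenv("OJO_DEFAULT_SOURCE", unset = "postgres")
)

# A user could then set it once per session:
options(ojodb.default_source = "gcs_duckdb")
```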
@andrewjbe this is ready for ya
The `.source = "duckdb"` argument currently has an issue: it works if it's the first connection you make in the session, but it fails if you use `.source = "postgres"` beforehand.
I changed the arrow source argument from `gcs_parquet` to `gcs_arrow`, but kept your other changes. I made it so that connection objects are stored independently based on connection type and source, so a connection to postgres and one to duckdb are handled independently and can even coexist in the same session.
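A sketch of per-source connection caching, assuming a package-level environment; the environment and function names here are illustrative, not the actual implementation:

```r
# Illustrative: cache one connection per source in a package-level
# environment, so postgres and duckdb connections can coexist.
.ojo_env <- new.env(parent = emptyenv())

get_ojo_connection <- function(.source = c("postgres", "duckdb")) {
  .source <- match.arg(.source)
  key <- paste0("conn_", .source)

  # Reuse the cached connection only if it exists and is still valid
  if (is.null(.ojo_env[[key]]) || !DBI::dbIsValid(.ojo_env[[key]])) {
    .ojo_env[[key]] <- switch(
      .source,
      postgres = DBI::dbConnect(RPostgres::Postgres()),
      duckdb   = DBI::dbConnect(duckdb::duckdb())
    )
  }
  .ojo_env[[key]]
}
```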
Added the following chunk to `ojo_collect()` so that it will work no matter what connection type you're using:
```r
# Check class of `.data`
# First, check if it's an arrow connection; this won't work with the rest of ojo_collect()
if (inherits(.data, "ArrowTabular")) {
  # Display CLI
  cli::cli_div(
    theme = list(
      rule = list(color = "br_yellow", "line-type" = "single"),
      "span.grayed" = list(color = "grey")
    )
  )
  cli::cli_rule(
    left = paste("Connection: OJO GCS Arrow Tables"),
    right = "{.emph ojodb {utils::packageVersion('ojodb')}}"
  )

  return(dplyr::collect(.data))
}
```
@andrewjbe This can pull all unique counts as filed in a couple of seconds, and it computes counts of distinct minute codes in seconds too.