Closed · brancengregory closed 5 months ago
In order for us to run queries lazily on multiple remote tables with joins, we will need to use the same db source. That means pulling the connection creation out of `tbl_from_gcs_duckdb()` and moving it into `ojo_connect()`, with a new argument that can be `postgres` or `duckdb`.
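A minimal sketch of what that refactor might look like, assuming DBI-style connections (the `.source` argument name and the internals here are illustrative assumptions, not the final API):

```r
# Illustrative sketch: `ojo_connect()` gains a `.source` argument so that
# connection creation (currently inside `tbl_from_gcs_duckdb()`) lives in
# one place. Driver arguments are assumed, not the package's real config.
ojo_connect <- function(.source = c("postgres", "duckdb"), ...) {
  .source <- match.arg(.source)

  switch(
    .source,
    postgres = DBI::dbConnect(RPostgres::Postgres(), ...),
    duckdb   = DBI::dbConnect(duckdb::duckdb(), ...)
  )
}
```

With `match.arg()`, `postgres` stays the default when `.source` is omitted, so existing callers are unaffected.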
Also, the values allowed for `ojo_tbl(.source = )` need to be updated to `c("postgres", "gcs_arrow", "gcs_duckdb")`, making sure not to break the current API.
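One way to validate the expanded choices without breaking existing callers is to list the current default first (a sketch; the other parameters of `ojo_tbl()` are omitted here):

```r
# Illustrative only: `match.arg()` rejects anything outside the allowed set
# and keeps "postgres" as the default, preserving the current API.
ojo_tbl <- function(table, .source = c("postgres", "gcs_arrow", "gcs_duckdb")) {
  .source <- match.arg(.source)
  # ... dispatch to the appropriate backend based on `.source` ...
}
```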
It would be really great if we could set the `ojo_tbl()` source at the global level, perhaps in an env variable or with `options()`.
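That could look something like the following (the option name `ojodb.default_source` and the env variable `OJO_DEFAULT_SOURCE` are hypothetical names for illustration):

```r
# Sketch: read the default source from an R option, falling back to an
# environment variable, then to "postgres".
default_source <- getOption(
  "ojodb.default_source",
  default = Sys.getenv("OJO_DEFAULT_SOURCE", unset = "postgres")
)

# A user could then set it once per session:
options(ojodb.default_source = "gcs_duckdb")
```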
@andrewjbe this is ready for ya
The `.source = "duckdb"` argument currently has an issue: it works if it's the first connection you make in the session, but it fails if you use `.source = "postgres"` beforehand.
I changed the arrow source argument from `gcs_parquet` to `gcs_arrow`, but kept your other changes. I made it so that connection objects are stored independently based on connection type and source, so a connection to postgres and one to duckdb are handled independently and can even coexist in the same session.
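A sketch of per-source connection caching, assuming a package-level environment; the environment and function names here are illustrative, not the actual implementation:

```r
# Illustrative: cache one connection per source in a package-level
# environment, so postgres and duckdb connections can coexist.
.ojo_env <- new.env(parent = emptyenv())

get_ojo_connection <- function(.source = c("postgres", "duckdb")) {
  .source <- match.arg(.source)
  key <- paste0("conn_", .source)

  # Reuse the cached connection only if it exists and is still valid
  if (is.null(.ojo_env[[key]]) || !DBI::dbIsValid(.ojo_env[[key]])) {
    .ojo_env[[key]] <- switch(
      .source,
      postgres = DBI::dbConnect(RPostgres::Postgres()),
      duckdb   = DBI::dbConnect(duckdb::duckdb())
    )
  }
  .ojo_env[[key]]
}
```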
Added the following chunk to `ojo_collect()` so that it will work no matter what connection type you're using:
```r
# Check class of `.data`
# First, check if it's an arrow connection; this won't work with the rest of ojo_collect()
if (inherits(.data, "ArrowTabular")) {
  # Display CLI
  cli::cli_div(
    theme = list(
      rule = list(color = "br_yellow", "line-type" = "single"),
      "span.grayed" = list(color = "grey")
    )
  )
  cli::cli_rule(
    left = paste("Connection: OJO GCS Arrow Tables"),
    right = "{.emph ojodb {utils::packageVersion('ojodb')}}"
  )

  return(dplyr::collect(.data))
}
```
@andrewjbe This can pull all unique counts as filed in a couple of seconds, and it computes counts of distinct minute codes in seconds too.