r-dbi / RPostgres

A DBI-compliant interface to PostgreSQL
https://rpostgres.r-dbi.org

How to deal with `pq__varchar` columns #444

Open dmkaplan2000 opened 9 months ago

dmkaplan2000 commented 9 months ago

I have some queries that return columns of type pq__varchar. Is there a simple way to convert these data into a more R-friendly format inside R?

Thanks, David

krlmlr commented 9 months ago

Thanks. Could you please provide a reproducible example? Please see https://reprex.tidyverse.org/ for guidance.

dmkaplan2000 commented 9 months ago

Given any RPostgres connection to a postgresql database, con, try the following:

x = dbGetQuery(con,"WITH a(v) AS (VALUES(ARRAY['a','b']::varchar[]),(ARRAY['abc','def','fgh']::varchar[])) SELECT * FROM a")
class(x$v)

dmkaplan2000 commented 9 months ago

Ideally I would be able to treat the elements of x$v like vectors/arrays in R, extracting elements as needed, etc., but it isn't clear how to do this with a pq__varchar. I was wondering if there was some sort of straightforward function or method for transforming these into something more familiar to R.

dmkaplan2000 commented 9 months ago

The less than ideal solution that I eventually used for this situation was to parse the string representation of the array in R. But this only works easily because I have 1-D arrays with strings that do not have quote characters or commas in them. A better solution working directly from the pq__varchar variable would be preferable. Here is an example of a solution:

x = dbGetQuery(con,"WITH a(v) AS (VALUES(ARRAY['a','b']::varchar[]),(ARRAY['abc','def','fgh']::varchar[])) SELECT * FROM a")

x$v_list = lapply(x$v,\(.) eval(parse(text=sub("[{]","c('",sub("[}]","')",gsub(",","','",.))))))
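The one-liner above breaks as soon as an element contains a comma or a quote. As a rough sketch only (not something RPostgres provides), the text form of a 1-D array could instead be walked character by character. The function name parse_pg_array is hypothetical, and the quoting rules it assumes (double-quoted elements, backslash escapes inside quotes, unquoted NULL for SQL NULL) are those of PostgreSQL's array output format. Nested arrays are rejected rather than parsed.

```r
# Hypothetical helper, not part of RPostgres: parse the text form of a
# 1-D PostgreSQL array such as {a,b} or {"x,y",NULL} into a character vector.
parse_pg_array <- function(s) {
  if (is.na(s)) return(NA_character_)            # SQL NULL array
  stopifnot(startsWith(s, "{"), endsWith(s, "}"))
  body <- substr(s, 2L, nchar(s) - 1L)
  if (body == "") return(character())            # empty array {}
  chars <- strsplit(body, "")[[1]]
  out <- character()
  cur <- ""
  in_quotes <- FALSE   # currently inside "..."
  quoted <- FALSE      # current element used quotes at some point
  escaped <- FALSE     # previous character was a backslash
  # A trailing comma acts as a sentinel that flushes the last element.
  for (ch in c(chars, ",")) {
    if (escaped) {
      cur <- paste0(cur, ch)
      escaped <- FALSE
    } else if (ch == "\\") {
      escaped <- TRUE
    } else if (ch == "\"") {
      in_quotes <- !in_quotes
      quoted <- TRUE
    } else if (!in_quotes && ch %in% c("{", "}")) {
      stop("nested (multidimensional) arrays are not supported")
    } else if (!in_quotes && ch == ",") {
      # An unquoted NULL is a missing value; a quoted "NULL" is a string.
      out <- c(out, if (!quoted && cur == "NULL") NA_character_ else cur)
      cur <- ""
      quoted <- FALSE
    } else {
      cur <- paste0(cur, ch)
    }
  }
  out
}

# Stand-in for as.character(x$v) from the query above.
v <- c("{a,b}", "{abc,def,fgh}")
v_list <- lapply(v, parse_pg_array)
```

With a live connection this would presumably be applied as x$v_list <- lapply(as.character(x$v), parse_pg_array). It also avoids eval(parse()) on data coming from the database.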

krlmlr commented 3 months ago

Reprex:

con <- DBI::dbConnect(RPostgres::Postgres())

x <- DBI::dbGetQuery(con,"WITH a(v) AS (VALUES(ARRAY['a','b']::varchar[]),(ARRAY['abc','def','fgh']::varchar[])) SELECT * FROM a")
x
#>               v
#> 1         {a,b}
#> 2 {abc,def,fgh}
class(x$v)
#> [1] "pq__varchar"

Created on 2024-04-01 with reprex v2.1.0

@paleolimbot: Is there support for nested data in adbcpostgresql?

paleolimbot commented 3 months ago

It seems to!

library(adbcdrivermanager)

con <- adbcpostgresql::adbcpostgresql() |> 
  adbc_database_init(uri = "postgresql://localhost:5432/postgres?user=postgres&password=password") |> 
  adbc_connection_init()

con |> 
  read_adbc(
    "WITH a(v) AS (VALUES(ARRAY['a','b']::varchar[]),(ARRAY['abc','def','fgh']::varchar[])) SELECT * FROM a"
  ) |> 
  as.data.frame() |> 
  dplyr::pull()
#> <list_of<character>[2]>
#> [[1]]
#> [1] "a" "b"
#> 
#> [[2]]
#> [1] "abc" "def" "fgh"

Created on 2024-04-01 with reprex v2.1.0

I say this lightly, but I actually spent quite a bit of time getting this to work. It also works for Postgres' "record" type, where it should give you back a nested data frame. There is a slight limitation on reading the array type: I believe multidimensional arrays will just be returned as a flat list_of and will (currently) lose their dimensions.

krlmlr commented 3 months ago

Thanks, this is wonderful! I suspect it will also work via adbi, https://adbi.r-dbi.org/. @dmkaplan2000: can you confirm?

krlmlr commented 3 months ago

This brings up memories of my work bringing nested data to the duckdb R client. I do appreciate the effort!

dmkaplan2000 commented 3 months ago

I tried getting this to work with adbi, but I must admit I can't figure out how to successfully connect to a database. First I tried opening a connection with:

con2 = dbConnect(adbi::adbi("adbcpostgresql"),uri="postgresql://localhost:5432/DBNAME?user=USER&password=PASSWORD")

That worked without error, but when I tried to use the connection to execute a query I got:

> dbGetQuery(con2,"WITH a(v) AS (VALUES(ARRAY['a','b']::varchar[]),(ARRAY['abc','def','fgh']::varchar[])) SELECT * FROM a")
Error in adbcdrivermanager::adbc_statement_get_parameter_schema(stmt) : 
  NOT_IMPLEMENTED

Based on what I could find online, I tried reducing the URI to uri="postgresql://localhost:5432". That also worked for starting the connection, but not for executing a query.

Do you know what I am doing wrong? The help on adbi and adbcpostgresql is pretty spartan...

krlmlr commented 3 months ago

It's a moonshot, but can you try dbGetQuery(immediate = TRUE)? This is DBI's current way of saying "we don't want a prepared query, just execute". I wonder whether adbi acts on that flag and, if not, whether it should.

CC @nbenn.

dmkaplan2000 commented 3 months ago

immediate=TRUE fixed the issue!

> con2 = dbConnect(adbi::adbi("adbcpostgresql"),uri="postgresql://localhost:5432")
> x=dbGetQuery(con2,"WITH a(v) AS (VALUES(ARRAY['a','b']::varchar[]),(ARRAY['abc','def','fgh']::varchar[])) SELECT * FROM a",immediate=TRUE)
> x$v
<list_of<character>[2]>
[[1]]
[1] "a" "b"

[[2]]
[1] "abc" "def" "fgh"

So this query works with adbi, but I am not sure this really solves my issue in the sense that it fixes one thing by breaking another. My real queries involve, in addition to character arrays, spatial data provided by the postgis extension to postgresql. RPostgres plays nicely with the sf package so the spatial data can be processed, but I do not think that adbi can be used with sf. At least my initial examination does not indicate that this is the case.
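One possible way to stay on RPostgres (and so keep the sf workflow) might be to let the server serialize the array as JSON and decode it in R. This is only a sketch under assumptions: it relies on PostgreSQL's array_to_json() and on the jsonlite package, neither of which was mentioned above. The helper names json_arrays_to_list and fetch_arrays_as_list are hypothetical, and the approach has not been tried together with PostGIS columns.

```r
# Hypothetical helper: decode a column of JSON array strings
# (e.g. '["a","b"]') into a list of character vectors.
json_arrays_to_list <- function(x) {
  lapply(as.character(x), function(s) {
    if (is.na(s)) return(NA_character_)   # SQL NULL array
    as.character(jsonlite::fromJSON(s))
  })
}

# Hypothetical wrapper around the query from this thread. The array is
# cast to JSON text on the server, so RPostgres only sees a text column.
# It is defined here but not called, since it needs a live connection.
fetch_arrays_as_list <- function(con) {
  x <- DBI::dbGetQuery(
    con,
    "WITH a(v) AS (VALUES (ARRAY['a','b']::varchar[]),
                          (ARRAY['abc','def','fgh']::varchar[]))
     SELECT array_to_json(v)::text AS v FROM a"
  )
  x$v <- json_arrays_to_list(x$v)   # becomes a list column
  x
}

# Decoding step on stand-in strings, no database needed.
decoded <- json_arrays_to_list(c('["a","b"]', '["abc","def","fgh"]'))
```

Because only the array column is rewritten in SQL, any geometry columns in the same query would be left as they are. Whether the result then reads cleanly through sf is an assumption that would need checking.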