tidyverse / dplyr

dplyr: A grammar of data manipulation
https://dplyr.tidyverse.org/

I cannot connect postgresql schema.table with dplyr package #244

Closed diegogarcilazo closed 7 years ago

diegogarcilazo commented 10 years ago

I asked this question on Stack Overflow before I knew about the hadley/dplyr tracker on GitHub: http://stackoverflow.com/questions/21592266/i-cannot-connect-postgresql-schema-table-with-dplyr-package

Thanks for dplyr, ggplot2, reshape2 and ...

diegogarcilazo commented 10 years ago

@hadley thinking about your answer, I ended up writing the full SQL statement:

my_tbl = tbl(my_db, dplyr::sql('SELECT * FROM mortalidad.def0307'))

and with that I could connect to the schema and table. Thanks a lot. – Diego

dotcomken commented 10 years ago

I was just about to post this same issue while working with Greenplum. Your sql() method solved my issue.

diegogarcilazo commented 10 years ago

I'm happy with that. The only thing that worries me is that the following shows up in the generated SQL statement

SELECT "var1", count() AS "n" FROM (SELECT \ FROM mortalidad.mort12)

whenever any grouping is done. There is a subquery after FROM, and I don't know whether this will affect the efficiency of the SQL query. I'm an epidemiologist, so I'm not familiar with this.
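
One way to check whether the extra level of nesting actually costs anything is to look at the database's query plan. A minimal sketch, assuming the my_db connection and the mortalidad.mort12 table from above:

# explain() asks PostgreSQL for the plan of the query dplyr generates,
# so you can see whether the planner flattens the inner SELECT.
mort <- tbl(my_db, sql('SELECT * FROM mortalidad.mort12'))
mort %>%
  group_by(var1) %>%
  summarise(n = n()) %>%
  explain()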

hadley commented 10 years ago

Would you mind giving me a sequence of SQL commands that I can run in postgresql to create a schema and a database in that schema?

diegogarcilazo commented 10 years ago

Sorry for the delay. Here is how I create a schema and a table inside it:

CREATE SCHEMA schema_name;

CREATE TABLE schema_name.table_name (
  codigo character varying(3),
  nombre character varying(51),
  continente character varying(7)
);


hadley commented 10 years ago

I think there are basically two options here:

  1. Special-case escaping whenever there's a dot in the identifier. This would mean that x.y would be escaped to "x"."y". This has the advantage of being very succinct; the disadvantage is that tables translated from R to SQL couldn't use . in field names (probably not very common).
  2. Use a new escaping mechanism. Probably the easiest way to do this would be to extend ident() to take multiple arguments, e.g. ident("x", "y"). Alternatively, we could define dot(), which uses NSE to allow you to write dot(x, y). This is more typing, but is guaranteed not to break any existing code.

This is important both for table names (e.g. schema.table) and possibly for disambiguating field names in joins (e.g. table.field).

I'm currently leaning towards the explicit dot() function, because changing how escaping works globally seems risky.
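
A rough sketch of what the two options would mean for the rendered identifiers (illustrative R only, not dplyr's actual escaping code; escape_dotted is a made-up name):

# Option 1: treat "." in an identifier as a separator and quote each part.
escape_dotted <- function(x) {
  parts <- strsplit(x, ".", fixed = TRUE)[[1]]
  paste0('"', parts, '"', collapse = ".")
}
escape_dotted("mortalidad.def0307")
#> [1] "\"mortalidad\".\"def0307\""

# Option 2: leave escaping alone and add an explicit constructor such as
# ident("mortalidad", "def0307") or dot(mortalidad, def0307), which would
# render the same "mortalidad"."def0307" while keeping a literal "." in a
# field name unambiguous.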

hadley commented 10 years ago

Hmmmm, for this use case, you can just do:

my_tbl <- tbl(my_db, sql('mortalidad.def0307'))

hadley commented 10 years ago

And here is some other code for future reference (this attempt wasn't successful):

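# First attempt: capture the names unevaluated, keep the components separate,
# and defer escaping to escape.ident_multi() below.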
dot <- function(...) {
  args <- dots(...)
  is_name <- vapply(args, is.name, logical(1))
  if (any(!is_name)) {
    stop("All arguments to dot() must be names.")
  }

  dots <- vapply(args, as.character, character(1))
  class(dots) <- c("ident_multi", "ident", "sql", "character")
  dots
}

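# Second attempt: escape each component right away and collapse with ".".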
dot <- function(...) {
  out <- paste(escape(c(...), parens = FALSE), collapse = ".")
  class(out) <- c("ident", "sql", "character")
  out
}

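# Escape method for the first attempt: escape each component, then join with ".".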
#' @export
escape.ident_multi <- function(x, parens = FALSE, collapse = ", ", con = NULL) {
  y <- vapply(x, escape_ident, FUN.VALUE = character(1), con = con)
  paste(y, collapse = ".")
}

#' @export
format.ident_multi <- function(x, ...) paste0("<SQL> ", paste(x, collapse = "."))

# lahman_sqlite() %>% tbl("Batting")
# lahman_sqlite() %>% tbl("main.Batting")
# lahman_sqlite() %>% tbl(sql("main.Batting"))
# lahman_sqlite() %>% tbl(dot("main", "Batting"))

hhoeflin commented 9 years ago

Hi Hadley,

I am having the same issue, and the nesting seems to be a problem for performance (in my case a serious one, as I am working with data of hundreds of millions of rows). Could you elaborate on why your last attempt failed?

Thanks

HarlanH commented 9 years ago

FWIW, for some use cases a helpful idiom is:

dbSendQuery(con$con, build_sql("SET search_path TO ", schema_name))

Then subsequent queries using that connection are within that schema.
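
For completeness, a minimal sketch of that idiom with RPostgreSQL (connection details are placeholders; the schema name is the one from this thread):

library(dplyr)

# src_postgres() wraps a DBI connection; the raw connection lives in my_db$con.
my_db <- src_postgres(dbname = "mydb", host = "localhost", user = "me")
DBI::dbSendQuery(my_db$con, "SET search_path TO mortalidad, public")

# Unqualified table names are now resolved inside the mortalidad schema first.
def0307 <- tbl(my_db, "def0307")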

hhoeflin commented 9 years ago

Hi Harlan,

Thanks - I am already using something like that (except that I extend the search path with the schema). But it only helps so far: if you have tables with the same name in two schemas, it won't.

Anyway, I was wondering: why not overload the '$' operator in the base_scalar SQL translator?

It would even make sense from an R perspective:

In a$b$c, a would be an environment (i.e. a schema), b a table and c a column. The only thing that would then have to be fixed is tbl_sql, which shouldn't check in such a simple fashion whether the table name exists in the database. Instead, it could check whether a "SELECT" statement is part of the SQL query and, if not, add a SELECT * in front and check the variables that way.

In this case, one could even enable NSE for the tbl function?
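
A small sketch of the wrapping check described above (illustrative only, not dplyr's tbl_sql internals; from is just a local variable here):

# If the supplied string is not already a SELECT, wrap it so that the
# available variables can still be inspected with a cheap query.
from <- "mortalidad.mort12"
query <- if (grepl("^\\s*select", from, ignore.case = TRUE)) {
  from
} else {
  paste("SELECT * FROM", from)
}
query
#> [1] "SELECT * FROM mortalidad.mort12"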

hhoeflin commented 9 years ago

Oh yes, one more comment:

dbSendQuery often does not work with RJDBC (at least with Vertica). The query above, i.e. SET search_path TO ..., does not return a result, but dbSendQuery expects one. For Vertica this does not work and one has to use dbSendUpdate instead. Is that also the case for MySQL (or, at least, does dbSendUpdate not fail for MySQL)?
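
For reference, a sketch of the RJDBC variant (driver class, jar path, and connection details are placeholders for a Vertica setup):

library(RJDBC)

drv <- JDBC("com.vertica.jdbc.Driver", "/path/to/vertica-jdbc.jar")
con <- dbConnect(drv, "jdbc:vertica://host:5433/dbname", "user", "password")

# SET returns no result set, so dbSendQuery() chokes; dbSendUpdate() does not.
dbSendUpdate(con, "SET search_path TO myschema, public")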

hadley commented 9 years ago

I like the idea of overloading $. I'll definitely think about it when I'm next working on dplyr.

@hhoeflin have you tried RPostgreSQL instead of RJDBC for talking to vertica?

hhoeflin commented 9 years ago

No - I had no idea that RPostgreSQL was an option. In any case, the Vertica backend is now implemented for RJDBC already. I gave it a try, though, and ran into problems: the Postgres system tables it queries to find out which tables exist in the database seem to be different.

hadley commented 7 years ago

This is slowly getting resolved through DBI: https://github.com/rstats-db/DBI/issues/24
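
For anyone finding this later: recent dplyr/dbplyr releases let you name the schema explicitly, roughly like this (a sketch assuming dbplyr's in_schema() helper and the connection from earlier in the thread):

library(dplyr)

# in_schema() builds a schema-qualified identifier that gets escaped correctly.
my_tbl <- tbl(my_db, dbplyr::in_schema("mortalidad", "def0307"))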