Closed diegogarcilazo closed 7 years ago
@hadley thinking about your answer I came up to write the full SQL sentence
my_tbl = tbl(my_db, dplyr::sql('SELECT * FROM mortalidad.def0307'))
and there I could connect to the schema and table. Thanks a lot. – Diego
I was just going to post this same issue when working with Greenplum. You sql method solved my issue.
I'm happy with that. The only thing that worries me is that in the sql statement the following is observed
SELECT "var1", count() AS "n" FROM (SELECT \ FROM mortalidad.mort12)
when any grouping is done. There is a subquery after FROM and I do not know if this will have any type of problem with the efficiency of the sql query. I'm an Epidemiologist and I'm unaware of this.
Would you mind giving me a sequence of SQL commands that I can run in postgresql to create a schema and a database in that schema?
Sorry for the delay here I send as a create a scheme
CREATE TABLE schema_name.table_name ( codigo character varying(3), nombre character varying(51), continente character varying(7) )
2014-03-19 18:56 GMT-03:00 Hadley Wickham notifications@github.com:
Would you mind giving me a sequence of SQL commands that I can run in postgresql to create a schema and a database in that schema?
Reply to this email directly or view it on GitHubhttps://github.com/hadley/dplyr/issues/244#issuecomment-38112148 .
Diego Garcilazo Médico Departamento Programas de Salud Instituto Nacional de Enfermedades Respiratorias Av. Blas Parera 8260 // Santa Fe // Argentina //0342 - 4896850 http://www.anlis.gov.ar/inst/iner/
I think there are basically two options here:
x.y
would be escape to "x"."y"
. This has the advantage of being very succinct, it has the disadvantage that tables translated from R to SQL can't use .
in field names (probably not very common).ident()
to take multiple arguments, e.g. ident("x", "y")
. Alternatively could define dot()
which does NSE to allow you to write dot(x, y)
. This is more typing, but is guaranteed to not break any existing code.This important both for table names (e.g. schema.table
) and possibly to disambiguate field names in joins (e.g. table.field
)
I'm currently leaning towards the explicit dot()
function, because changing how escaping works globally seems risky.
Hmmmm, for this use case, you can just do:
my_tbl <- tbl(my_db, sql('mortalidad.def0307'))
And some other code for future reference (wasn't successful)
dot <- function(...) {
args <- dots(...)
is_name <- vapply(args, is.name, logical(1))
if (any(!is_name)) {
stop("All arguments to dot() must be names.")
}
dots <- vapply(args, as.character, character(1))
class(dots) <- c("ident_multi", "ident", "sql", "character")
dots
}
dot <- function(...) {
out <- paste(escape(c(...), parens = FALSE), collapse = ".")
class(out) <- c("ident", "sql", "character")
out
}
#' @export
escape.ident_multi <- function(x, parens = FALSE, collapse = ", ", con = NULL) {
y <- vapply(x, escape_ident, FUN.VALUE = character(1), con = con)
paste(y, collapse = ".")
}
#' @export
format.ident_multi <- function(x, ...) paste0("<SQL> ", paste(x, collapse = "."))
# lahman_sqlite() %>% tbl("Batting")
# lahman_sqlite() %>% tbl("main.Batting")
# lahman_sqlite() %>% tbl(sql("main.Batting"))
# lahman_sqlite() %>% tbl(dot("main", "Batting"))
Hi Hadley,
I am having the same issue and the nesting seems to be a problem for performance (in my case a serious one as I am working on data with 100s million rows). Could you elaborate on why your last try failed?
Thanks
FWIW, for some use cases a helpful idiom is:
dbSendQuery(con$con, build_sql("SET search_path TO ", schema_name))
Then subsequent queries using that connection are within that schema.
Hi Harlan,
thanks - and I am using something like that already (except I am expanding the search path by the schema. But it only helps so far. If you have tables with the same names in 2 schemas, it won't.
Anyway, I was wondering why not overload the '$' sign in the base_scalar SQL translator.
It would even make sense from an r perspective
a$b$c, there a would be an environment (i.e. a schema), b a table and c a column. The only think that has to be fixed then is tbl_sql, that shouldn't check in such a simple fashion if the table_name exists in the database. Instead, it could check if a "SELECT" statement is part of the SQL query and if not, add a SELECT * in front and check the variables this way.
In this case, one could even enable NSE for the tbl function?
Oh yes, one more comment:
dbSendQuery
often does not work for RJDBC (at least with vertica). The query above, i.e. SET search_path TO ...
does not return a result. dbSendQuery does expect one. For vertica, this does not work and one has to use dbSendUpdate. Is that also the case for MySQL (or at least does dbSendUpdate not fail for MySQL)?
I like the idea of overloading $
. I'll definitely think about it when I'm next working on dplyr.
@hhoeflin have you tried RPostgreSQL instead of RJDBC for talking to vertica?
No - had no idea that RPostgreSQL is an option. In any case - the vertica backend is now implemented for RJDBC already. Had a try though and ran into problems as the postgres tables it is trying to access to find out which tables exist in the database seem to be different.
This is slowly getting resolved through DBI: https://github.com/rstats-db/DBI/issues/24
I wrote this question before to know about GitHub/hadley/dplyr. http://stackoverflow.com/questions/21592266/i-cannot-connect-postgresql-schema-table-with-dplyr-package
Thanks for dplyr, ggplots, reshape2 and .....