tidyverse / dbplyr

Database (DBI) backend for dplyr
https://dbplyr.tidyverse.org
Other
469 stars 169 forks source link

`Spark SQL` translate_sql for "add_years" is double-quoting column name '`foo`' #1510

Open bschulth opened 1 month ago

bschulth commented 1 month ago

When translating sql on a Spark SQL connection, the 'add_year' translation is double quoting the column name:

# SQL:  add_years(dob, 1)
<SQL> ADD_MONTHS('`dob`', 1.0 * 12.0)

This fails on Spark. I would expect single quotes like:

# SQL:  add_years(dob, 1)
<SQL> ADD_MONTHS(`dob`, 1.0 * 12.0)

This seems to be coming from the triple-bang expression here !!!x: https://github.com/tidyverse/dbplyr/blob/8f2fcb0852994afe11cc6792ef59ef5ffda84936/R/backend-spark-sql.R#L47

       add_years = function(x, n, ...) {
         check_dots_empty()
         sql_expr(add_months(!!!x, !!n*12))
       }

There even seems to be a bad unit test for this here:

https://github.com/tidyverse/dbplyr/blob/8f2fcb0852994afe11cc6792ef59ef5ffda84936/tests/testthat/test-backend-spark-sql.R#L3C35-L3C50

expect_equal(test_translate_sql(add_years(x, 1)), sql("ADD_MONTHS('`x`', 1.0 * 12.0)"))

If I change the triple-bang !!!x to a double-bang !!x:

       add_years = function(x, n, ...) {
         check_dots_empty()
         # sql_expr(add_months(!!!x, !!n*12))
         sql_expr(add_months(!!x, !!n*12))
       }

it seems to produce the correct output.

Here is the reprex:

dbplyr:::local_con(dbplyr:::simulate_spark_sql())
dbplyr:::test_translate_sql(add_years(dob, 1))
#> <SQL> ADD_MONTHS('`dob`', 1.0 * 12.0)
reprex::reprex()
#> ℹ Non-interactive session, setting `html_preview = FALSE`.
#> CLIPR_ALLOW has not been set, so clipr will not run interactively
#> Error in switch(where, expr = stringify_expression(x_expr), clipboard = ingest_clipboard(), : EXPR must be a length 1 vector
ablack3 commented 1 month ago

Here is a reprex on spark showing correct SQL works and current SQL does not.

con <- DBI::dbConnect(
  odbc::databricks(),
  httpPath = Sys.getenv("DATABRICKS_HTTPPATH"),
  useNativeQuery = FALSE,
  bigint = "numeric"
)

DBI::dbGetQuery(con,
                "select observation_period_start_date, ADD_MONTHS('`observation_period_start_date`', 1.0 * 12.0) as day 
                from gibleed.observation_period limit 5")
#>   observation_period_start_date  day
#> 1                    1963-12-31 <NA>
#> 2                    2009-04-26 <NA>
#> 3                    2002-01-30 <NA>
#> 4                    1971-10-14 <NA>
#> 5                    2009-05-30 <NA>

DBI::dbGetQuery(con,
                "select observation_period_start_date, ADD_MONTHS(`observation_period_start_date`, 1.0 * 12.0) as day 
                from gibleed.observation_period limit 5")
#>   observation_period_start_date        day
#> 1                    1963-12-31 1964-12-31
#> 2                    2009-04-26 2010-04-26
#> 3                    2002-01-30 2003-01-30
#> 4                    1971-10-14 1972-10-14
#> 5                    2009-05-30 2010-05-30

DBI::dbDisconnect(con)

Created on 2024-06-05 with reprex v2.1.0