tidyverse / dbplyr

Database (DBI) backend for dplyr
https://dbplyr.tidyverse.org
Other
469 stars 169 forks source link

Breaking changes in dbplyr 2.5.0 #1480

Closed catalamarti closed 2 months ago

catalamarti commented 3 months ago

hi @hadley @mgirlich

the last release of dbplyr (2.5.0) broke the following packages:

It is a difficult fix that requires to come up with a different approach and implement it in all the packages before they are kick out of cran. Unless dbplyr goes back to the prior behavior on time.

I will try to explain with a series of reprex:

library(DBI)
#> Warning: package 'DBI' was built under R version 4.2.3
library(CDMConnector)
#> Warning: package 'CDMConnector' was built under R version 4.2.3
library(dplyr)
#> Warning: package 'dplyr' was built under R version 4.2.3
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(duckdb)

con <- DBI::dbConnect(duckdb(), path = ":memory:")
test_data <- data.frame(person = 1L,
                        date_1 = as.Date("2001-01-01"))
db_test_data <- copy_to(con, test_data, overwrite = TRUE)
db_test_data <- db_test_data  %>%
  dplyr::mutate(date_2 = date_1 + years(1)) 

year_req <- 1

# without CDMConnector::dateadd it works
priorHistoryDates <- glue::glue('date_1 + years({year_req})') %>%
  rlang::parse_exprs() %>%
  rlang::set_names(glue::glue("date_with_prior_history_{year_req}"))
db_test_data %>% 
  mutate(!!!priorHistoryDates)
#> # Source:   SQL [1 x 4]
#> # Database: DuckDB v0.9.2 [eburn@Windows 10 x64:R 4.2.1/:memory:]
#>   person date_1     date_2     date_with_prior_history_1
#>    <int> <date>     <date>     <date>                   
#> 1      1 2001-01-01 2002-01-01 2002-01-01

# with CDMConnector::dateadd it does not
priorHistoryDates <- glue::glue('CDMConnector::dateadd("date_2",
                      {year_req}, interval = "year")') %>%
  rlang::parse_exprs() %>%
  rlang::set_names(glue::glue("date_with_prior_history_{year_req}"))
db_test_data %>% 
  mutate(!!!priorHistoryDates)
#> Error in `CDMConnector::dateadd()`:
#> ! No known SQL translation
#> Backtrace:
#>      ▆
#>   1. ├─base::tryCatch(...)
#>   2. │ └─base (local) tryCatchList(expr, classes, parentenv, handlers)
#>   3. │   ├─base (local) tryCatchOne(...)
#>   4. │   │ └─base (local) doTryCatch(return(expr), name, parentenv, handler)
#>   5. │   └─base (local) tryCatchList(expr, names[-nh], parentenv, handlers[-nh])
#>   6. │     └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
#>   7. │       └─base (local) doTryCatch(return(expr), name, parentenv, handler)
#>   8. ├─base::withCallingHandlers(...)
#>   9. ├─base::saveRDS(...)
#>  10. ├─base::do.call(...)
#>  11. ├─base (local) `<fn>`(...)
#>  12. ├─global `<fn>`(input = base::quote("soot-cub_reprex.R"))
#>  13. │ └─rmarkdown::render(input, quiet = TRUE, envir = globalenv(), encoding = "UTF-8")
#>  14. │   └─knitr::knit(knit_input, knit_output, envir = envir, quiet = quiet)
#>  15. │     └─knitr:::process_file(text, output)
#>  16. │       ├─knitr:::handle_error(...)
#>  17. │       │ └─base::withCallingHandlers(...)
#>  18. │       ├─base::withCallingHandlers(...)
#>  19. │       ├─knitr:::process_group(group)
#>  20. │       └─knitr:::process_group.block(group)
#>  21. │         └─knitr:::call_block(x)
#>  22. │           └─knitr:::block_exec(params)
#>  23. │             └─knitr:::eng_r(options)
#>  24. │               ├─knitr:::in_input_dir(...)
#>  25. │               │ └─knitr:::in_dir(input_dir(), expr)
#>  26. │               └─knitr (local) evaluate(...)
#>  27. │                 └─evaluate::evaluate(...)
#>  28. │                   └─evaluate:::evaluate_call(...)
#>  29. │                     ├─evaluate (local) handle(...)
#>  30. │                     │ └─base::try(f, silent = TRUE)
#>  31. │                     │   └─base::tryCatch(...)
#>  32. │                     │     └─base (local) tryCatchList(expr, classes, parentenv, handlers)
#>  33. │                     │       └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
#>  34. │                     │         └─base (local) doTryCatch(return(expr), name, parentenv, handler)
#>  35. │                     ├─base::withCallingHandlers(...)
#>  36. │                     ├─base::withVisible(value_fun(ev$value, ev$visible))
#>  37. │                     └─knitr (local) value_fun(ev$value, ev$visible)
#>  38. │                       └─knitr (local) fun(x, options = options)
#>  39. │                         ├─base::withVisible(knit_print(x, ...))
#>  40. │                         ├─knitr::knit_print(x, ...)
#>  41. │                         └─rmarkdown:::knit_print.tbl_sql(x, ...)
#>  42. │                           ├─context$df_print(x)
#>  43. │                           └─dbplyr:::print.tbl_sql(x)
#>  44. │                             ├─dbplyr:::cat_line(format(x, ..., n = n, width = width, n_extra = n_extra))
#>  45. │                             │ ├─base::cat(paste0(..., "\n"), sep = "")
#>  46. │                             │ └─base::paste0(..., "\n")
#>  47. │                             ├─base::format(x, ..., n = n, width = width, n_extra = n_extra)
#>  48. │                             └─pillar:::format.tbl(x, ..., n = n, width = width, n_extra = n_extra)
#>  49. │                               └─pillar:::format_tbl(...)
#>  50. │                                 └─pillar::tbl_format_setup(...)
#>  51. │                                   ├─pillar:::tbl_format_setup_dispatch(...)
#>  52. │                                   └─pillar:::tbl_format_setup.tbl(...)
#>  53. │                                     └─pillar:::df_head(x, n + 1)
#>  54. │                                       ├─base::as.data.frame(head(x, n))
#>  55. │                                       └─dbplyr:::as.data.frame.tbl_sql(head(x, n))
#>  56. │                                         ├─base::as.data.frame(collect(x, n = n))
#>  57. │                                         ├─dplyr::collect(x, n = n)
#>  58. │                                         └─dbplyr:::collect.tbl_sql(x, n = n)
#>  59. │                                           └─dbplyr::db_sql_render(x$src$con, x, cte = cte)
#>  60. │                                             ├─dbplyr::db_sql_render(con, sql, ..., sql_options = sql_options)
#>  61. │                                             └─dbplyr:::db_sql_render.DBIConnection(con, sql, ..., sql_options = sql_options)
#>  62. │                                               ├─dbplyr::sql_render(sql, con = con, ..., sql_options = sql_options)
#>  63. │                                               └─dbplyr:::sql_render.tbl_lazy(sql, con = con, ..., sql_options = sql_options)
#>  64. │                                                 ├─dbplyr::sql_render(...)
#>  65. │                                                 └─dbplyr:::sql_render.lazy_query(...)
#>  66. │                                                   ├─dbplyr::sql_build(query, con = con, sql_options = sql_options)
#>  67. │                                                   └─dbplyr:::sql_build.lazy_select_query(query, con = con, sql_options = sql_options)
#>  68. │                                                     └─dbplyr:::get_select_sql(...)
#>  69. │                                                       └─dbplyr:::translate_select_sql(con, select)
#>  70. │                                                         └─dbplyr::translate_sql_(...)
#>  71. │                                                           └─base::lapply(...)
#>  72. │                                                             └─dbplyr (local) FUN(X[[i]], ...)
#>  73. │                                                               ├─dbplyr::escape(eval_tidy(x, mask), con = con)
#>  74. │                                                               └─rlang::eval_tidy(x, mask)
#>  75. └─CDMConnector::dateadd
#>  76.   └─cli::cli_abort(...)
#>  77.     └─rlang::abort(...)
db_test_data %>% 
  mutate(!!!priorHistoryDates) |> 
  dplyr::show_query()
#> Error in `CDMConnector::dateadd()`:
#> ! No known SQL translation
#> Backtrace:
#>      ▆
#>   1. ├─dplyr::show_query(db_test_data %>% mutate(!!!priorHistoryDates))
#>   2. ├─dbplyr:::show_query.tbl_lazy(db_test_data %>% mutate(!!!priorHistoryDates))
#>   3. │ └─dbplyr::remote_query(x, cte = cte, sql_options = sql_options)
#>   4. │   └─dbplyr::db_sql_render(remote_con(x), x, cte = cte, sql_options = sql_options)
#>   5. │     ├─dbplyr::db_sql_render(con, sql, ..., sql_options = sql_options)
#>   6. │     └─dbplyr:::db_sql_render.DBIConnection(con, sql, ..., sql_options = sql_options)
#>   7. │       ├─dbplyr::sql_render(sql, con = con, ..., sql_options = sql_options)
#>   8. │       └─dbplyr:::sql_render.tbl_lazy(sql, con = con, ..., sql_options = sql_options)
#>   9. │         ├─dbplyr::sql_render(...)
#>  10. │         └─dbplyr:::sql_render.lazy_query(...)
#>  11. │           ├─dbplyr::sql_build(query, con = con, sql_options = sql_options)
#>  12. │           └─dbplyr:::sql_build.lazy_select_query(query, con = con, sql_options = sql_options)
#>  13. │             └─dbplyr:::get_select_sql(...)
#>  14. │               └─dbplyr:::translate_select_sql(con, select)
#>  15. │                 └─dbplyr::translate_sql_(...)
#>  16. │                   └─base::lapply(...)
#>  17. │                     └─dbplyr (local) FUN(X[[i]], ...)
#>  18. │                       ├─dbplyr::escape(eval_tidy(x, mask), con = con)
#>  19. │                       └─rlang::eval_tidy(x, mask)
#>  20. └─CDMConnector::dateadd
#>  21.   └─cli::cli_abort(...)
#>  22.     └─rlang::abort(...)

# without explictly calling CDMConnector
priorHistoryDates <- glue::glue('dateadd("date_2",
                      {year_req}, interval = "year")') %>%
  rlang::parse_exprs() %>%
  rlang::set_names(glue::glue("date_with_prior_history_{year_req}"))
db_test_data %>% 
  mutate(!!!priorHistoryDates)
#> Warning: Named arguments ignored for SQL dateadd
#> Error in `collect()`:
#> ! Failed to collect lazy table.
#> Caused by error:
#> ! Parser Error: syntax error at or near "AS"
#> LINE 3:   dateadd('date_2', 1.0, 'year' AS "interval") AS date_...
#>                                         ^
#> Backtrace:
#>      ▆
#>   1. ├─base::tryCatch(...)
#>   2. │ └─base (local) tryCatchList(expr, classes, parentenv, handlers)
#>   3. │   ├─base (local) tryCatchOne(...)
#>   4. │   │ └─base (local) doTryCatch(return(expr), name, parentenv, handler)
#>   5. │   └─base (local) tryCatchList(expr, names[-nh], parentenv, handlers[-nh])
#>   6. │     └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
#>   7. │       └─base (local) doTryCatch(return(expr), name, parentenv, handler)
#>   8. ├─base::withCallingHandlers(...)
#>   9. ├─base::saveRDS(...)
#>  10. ├─base::do.call(...)
#>  11. ├─base (local) `<fn>`(...)
#>  12. ├─global `<fn>`(input = base::quote("soot-cub_reprex.R"))
#>  13. │ └─rmarkdown::render(input, quiet = TRUE, envir = globalenv(), encoding = "UTF-8")
#>  14. │   └─knitr::knit(knit_input, knit_output, envir = envir, quiet = quiet)
#>  15. │     └─knitr:::process_file(text, output)
#>  16. │       ├─knitr:::handle_error(...)
#>  17. │       │ └─base::withCallingHandlers(...)
#>  18. │       ├─base::withCallingHandlers(...)
#>  19. │       ├─knitr:::process_group(group)
#>  20. │       └─knitr:::process_group.block(group)
#>  21. │         └─knitr:::call_block(x)
#>  22. │           └─knitr:::block_exec(params)
#>  23. │             └─knitr:::eng_r(options)
#>  24. │               ├─knitr:::in_input_dir(...)
#>  25. │               │ └─knitr:::in_dir(input_dir(), expr)
#>  26. │               └─knitr (local) evaluate(...)
#>  27. │                 └─evaluate::evaluate(...)
#>  28. │                   └─evaluate:::evaluate_call(...)
#>  29. │                     ├─evaluate (local) handle(...)
#>  30. │                     │ └─base::try(f, silent = TRUE)
#>  31. │                     │   └─base::tryCatch(...)
#>  32. │                     │     └─base (local) tryCatchList(expr, classes, parentenv, handlers)
#>  33. │                     │       └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
#>  34. │                     │         └─base (local) doTryCatch(return(expr), name, parentenv, handler)
#>  35. │                     ├─base::withCallingHandlers(...)
#>  36. │                     ├─base::withVisible(value_fun(ev$value, ev$visible))
#>  37. │                     └─knitr (local) value_fun(ev$value, ev$visible)
#>  38. │                       └─knitr (local) fun(x, options = options)
#>  39. │                         ├─base::withVisible(knit_print(x, ...))
#>  40. │                         ├─knitr::knit_print(x, ...)
#>  41. │                         └─rmarkdown:::knit_print.tbl_sql(x, ...)
#>  42. │                           ├─context$df_print(x)
#>  43. │                           └─dbplyr:::print.tbl_sql(x)
#>  44. │                             ├─dbplyr:::cat_line(format(x, ..., n = n, width = width, n_extra = n_extra))
#>  45. │                             │ ├─base::cat(paste0(..., "\n"), sep = "")
#>  46. │                             │ └─base::paste0(..., "\n")
#>  47. │                             ├─base::format(x, ..., n = n, width = width, n_extra = n_extra)
#>  48. │                             └─pillar:::format.tbl(x, ..., n = n, width = width, n_extra = n_extra)
#>  49. │                               └─pillar:::format_tbl(...)
#>  50. │                                 └─pillar::tbl_format_setup(...)
#>  51. │                                   ├─pillar:::tbl_format_setup_dispatch(...)
#>  52. │                                   └─pillar:::tbl_format_setup.tbl(...)
#>  53. │                                     └─pillar:::df_head(x, n + 1)
#>  54. │                                       ├─base::as.data.frame(head(x, n))
#>  55. │                                       └─dbplyr:::as.data.frame.tbl_sql(head(x, n))
#>  56. │                                         ├─base::as.data.frame(collect(x, n = n))
#>  57. │                                         ├─dplyr::collect(x, n = n)
#>  58. │                                         └─dbplyr:::collect.tbl_sql(x, n = n)
#>  59. │                                           ├─base::withCallingHandlers(...)
#>  60. │                                           ├─dbplyr::db_collect(...)
#>  61. │                                           └─dbplyr:::db_collect.DBIConnection(...)
#>  62. │                                             ├─DBI::dbSendQuery(con, sql)
#>  63. │                                             └─duckdb::dbSendQuery(con, sql)
#>  64. │                                               └─duckdb (local) .local(conn, statement, ...)
#>  65. │                                                 └─duckdb:::rapi_prepare(conn@conn_ref, statement)
#>  66. └─base::.handleSimpleError(...)
#>  67.   └─dbplyr (local) h(simpleError(msg, call))
#>  68.     └─cli::cli_abort("Failed to collect lazy table.", parent = cnd)
#>  69.       └─rlang::abort(...)
db_test_data %>% 
  mutate(!!!priorHistoryDates) |> 
  dplyr::show_query()
#> Warning: Named arguments ignored for SQL dateadd
#> <SQL>
#> SELECT
#>   q01.*,
#>   dateadd('date_2', 1.0, 'year' AS "interval") AS date_with_prior_history_1
#> FROM (
#>   SELECT test_data.*, date_1 + TO_YEARS(CAST(1.0 AS INTEGER)) AS date_2
#>   FROM test_data
#> ) q01

Created on 2024-03-20 with reprex v2.0.2

dateadd it is a function that we have in CDMConnector to provide the translation to add a number to a date in different dbms: https://github.com/darwin-eu/CDMConnector/blob/105b9d3df7e3fb2e66a14928e51557c22e8881ca/R/dateadd.R#L25-L65

The long term plan is to contribute to dbplyr as @ablack3 did here https://github.com/tidyverse/dbplyr/pull/1357 so long term this utility functions are not needed. But for the moment while it is not implemented in all dbms we need a work around to make this work.

edgararuiz commented 3 months ago

sparklyr is having the same issues when printing to console, so far I've tracked it down to head() producing the table.* call. It looks like when dbplyr pulls the first row, I guess to get the column names, that call triggers either a head() call or some lower level call in that same vein, which in turn outputs the error

Here is the issue in sparklyr: https://github.com/sparklyr/sparklyr/issues/3429

Reprex:

library(sparklyr)
packageVersion("dbplyr")
#> [1] '2.5.0'
sc <- spark_connect("local", version = "3.3.0") 
tbl_mtcars <- copy_to(sc, mtcars)
dbplyr::remote_query(head(tbl_mtcars, 1))
#> <SQL> SELECT `mtcars`.`*`
#> FROM `mtcars`
#> LIMIT 1
mgirlich commented 3 months ago

@edgararuiz Are you saying that spark only supports * when it's not qualified? I.e. this would work

SELECT `*`
FROM `mtcars`
LIMIT 1

but not this

SELECT `mtcars`.`*`
FROM `mtcars`
LIMIT 1

(the relevant change would then come from https://github.com/tidyverse/dbplyr/pull/1278)

mgirlich commented 3 months ago

@catalamarti I think your reprex boils down to

library(DBI)
library(CDMConnector)
library(dplyr)
library(duckdb)

con <- DBI::dbConnect(duckdb(), path = ":memory:")
test_data <- data.frame(person = 1L,
                        date_1 = as.Date("2001-01-01"))
db_test_data <- copy_to(con, test_data, overwrite = TRUE)
db_test_data <- db_test_data  %>%
  dplyr::mutate(date_2 = date_1 + years(1)) 

db_test_data %>% 
  mutate(date_with_prior_history_1 = CDMConnector::dateadd("date_2", 1, interval = "year"))

but if you replace the last line with

db_test_data %>% 
  mutate(date_with_prior_history_1 = !!CDMConnector::dateadd("date_2", 1, interval = "year"))

it works. This is similar to what you do in your examples. Would that work for you? @hadley this only worked in dbplyr < 2.5.0 due to qualifying the function with the package, i.e. using CDMConnector::dateadd instead of dateadd.

catalamarti commented 3 months ago

hi @mgirlich so yes your approach works and this works for the usual use of the function. But in packages where I want to put several variables in the same mutate (to generate efficient sql) depending on external variables I can no longer use !!! function as it breaks. reprex:

library(DBI)
library(CDMConnector)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(duckdb)

con <- DBI::dbConnect(duckdb(), path = ":memory:")
test_data <- data.frame(person = 1L,
                        date_1 = as.Date("2001-01-01"))
db_test_data <- copy_to(con, test_data, overwrite = TRUE)
db_test_data <- db_test_data  %>%
  dplyr::mutate(date_2 = date_1 + years(1)) 

# This works
db_test_data %>% 
  mutate(date_with_prior_history_1 = !!CDMConnector::dateadd("date_2", 1, interval = "year"))
#> # Source:   SQL [1 x 4]
#> # Database: DuckDB v0.10.0 [martics@Windows 10 x64:R 4.2.3/:memory:]
#>   person date_1     date_2              date_with_prior_history_1
#>    <int> <date>     <dttm>              <dttm>                   
#> 1      1 2001-01-01 2002-01-01 00:00:00 2003-01-01 00:00:00

# This doesn't
x <- rlang::parse_exprs("CDMConnector::dateadd('date_2', 1, interval = 'year')") %>%
  rlang::set_names(glue::glue("date_with_prior_history_1"))
db_test_data %>% 
  mutate(!!!x)
#> Error in `CDMConnector::dateadd()`:
#> ! No known SQL translation

# Use case
ph <- 1:3
x <- glue::glue("CDMConnector::dateadd('date_2', {ph}, interval = 'year')") %>%
  rlang::parse_exprs() %>%
  rlang::set_names(glue::glue("date_with_prior_history_{ph}"))
db_test_data %>% 
  mutate(!!!x)
#> Error in `CDMConnector::dateadd()`:
#> ! No known SQL translation

Created on 2024-03-22 with reprex v2.1.0

Sorry for not providing more context before

mgirlich commented 3 months ago

You can simply wrap the thing with local():

x <- rlang::parse_exprs("local(CDMConnector::dateadd('date_2', 1, interval = 'year'))") %>%
  rlang::set_names(glue::glue("date_with_prior_history_1"))
db_test_data %>% 
  mutate(!!!x)
catalamarti commented 3 months ago

Thank you very much @mgirlich that works for us, we will amend the packages. FYI: @edward-burn @ablack3 @ilovemane @mimiyuchenguo ...

edgararuiz commented 3 months ago

@mgirlich - Correct, the back ticks cause an issue. But at this time, I made a change in sparklyr that prevents the failure. I matched sparklyr's dbQuoteIdentifier() to what the other methods do, which is to return x as-is if x is SQL class. This is probably why other backends did not have that issue.

https://github.com/sparklyr/sparklyr/blob/da3b235546503de00c22e3e7fed5b9ebd94e4180/R/dbi_spark_connection.R#L63

hadley commented 3 months ago

@catalamarti many apologies for this screw up — I had spotted the problems in my revdep checks but I incorrectly attributed them to a problem that I thought I had fixed in CDMConnector. I know it's miserable to receive this sort of email from CRAN and I'm sorry my mistake has caused extra work for you 😞.

(And apologies for taking so long to respond to this thread; in hindsight it was clearly a bad idea to submit dbplyr to CRAN just before I left for two weeks vacation 😬)

catalamarti commented 2 months ago

Thank you very much @mgirlich @hadley for me we can close the issue.