tidyverse / dbplyr

Database (DBI) backend for dplyr
https://dbplyr.tidyverse.org
Other
477 stars 174 forks source link

Translation of `difftime()` is inverted on Snowflake, MSSQL, Redshift #1525

Closed fh-afrachioni closed 1 month ago

fh-afrachioni commented 3 months ago

On several backends, difftime(a, b) is translated to DATEDIFF(a, b), when it should be DATEDIFF(b, a). Counterintuitive but well documented:

Notably, Spark appears to be implemented correctly, as it has arguments in the opposite order. Other backends seem to make use of the - operator correctly.

Example:

date_today <- Sys.Date()
date_tomorrow <- date_today + 1

test_date_diff <- function(df) {
  df %>% 
    transmute(
      # bonus issue: we need as.Date() to get a date column, even when we already have a Date
      date_today = as.Date(!!date_today),
      date_tomorrow = as.Date(!!date_tomorrow),
      minus_result = date_tomorrow - date_today,
      difftime_result = difftime(date_tomorrow, date_today)
    )
}

data.frame(i = 0) %>% 
  dbplyr::tbl_lazy(dbplyr::simulate_snowflake()) %>%
  test_date_diff()

leads to

SELECT
  `date_today`,
  `date_tomorrow`,
  `date_tomorrow` - `date_today` AS `minus_result`,
  DATEDIFF(DAY, `date_tomorrow`, `date_today`) AS `difftime_result`
FROM (
  SELECT
    `df`.*,
    CAST('2024-07-19' AS DATE) AS `date_today`,
    CAST('2024-07-20' AS DATE) AS `date_tomorrow`
  FROM `df`
) AS `q01`

And on a real live Snowflake backend,

actual_snowflake_table %>% 
  head(1) %>% 
  test_date_diff()

leads to

# Source:   SQL [1 x 4]
# Database: Snowflake 8.26.0[anthony.frachioni@Snowflake/]
   date_today date_tomorrow minus_result difftime_result
   <date>     <date>               <dbl>           <dbl>
 1 2024-07-19 2024-07-20               1              -1

cc @fh-mthomson

edward-burn commented 2 months ago

Just to note I also find this inconsistency on postgres

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(dbplyr)
#> 
#> Attaching package: 'dbplyr'
#> The following objects are masked from 'package:dplyr':
#> 
#>     ident, sql

test_data <- data.frame(
  person = 1L,
  date_1 = as.Date("1980-01-01"),
  date_2 = as.Date("2010-01-01")
)

con <- DBI::dbConnect(RPostgres::Postgres(),
                      dbname = Sys.getenv("CDM5_POSTGRESQL_DBNAME"),
                      host = Sys.getenv("CDM5_POSTGRESQL_HOST"),
                      user = Sys.getenv("CDM5_POSTGRESQL_USER"),
                      password = Sys.getenv("CDM5_POSTGRESQL_PASSWORD"))

test_data |> 
  mutate(days = difftime(date_1, date_2))
#>   person     date_1     date_2        days
#> 1      1 1980-01-01 2010-01-01 -10958 days

db_test_data <- copy_to(con, 
                        test_data, 
                        overwrite = TRUE)
db_test_data |> 
  mutate(days = difftime(date_1, date_2))
#> # Source:   SQL [1 x 4]
#> # Database: postgres  [ohdsi@pgsqltest.cqnqzwtn5s1q.us-east-1.rds.amazonaws.com:5432/vocabularyv5]
#>   person date_1     date_2      days
#>    <int> <date>     <date>     <int>
#> 1      1 1980-01-01 2010-01-01 10958

db_test_data |> 
  mutate(days = difftime(date_1, date_2)) |> 
  dplyr::show_query()
#> <SQL>
#> SELECT
#>   "test_data".*,
#>   (CAST("date_2" AS DATE) - CAST("date_1" AS DATE)) AS "days"
#> FROM "test_data"

Created on 2024-08-12 with reprex v2.1.0