r-dbi / RSQLite

R interface for SQLite
https://rsqlite.r-dbi.org
GNU Lesser General Public License v2.1
327 stars 81 forks source link

Adding Date operations #379

Open schuemie opened 3 years ago

schuemie commented 3 years ago

Having the extended types is great, but it seems they can only be used to put data in and get it out? Are there functions and operators in SQL that allow working with dates? (I searched, but was unable to find any documentation on this topic). For example, there does not appear to be a DATEDIFF() or DATEADD() equivalent, and casting to DATE has no effect:

library(RSQLite)
con <- dbConnect(RSQLite::SQLite(), extended_types = TRUE)

# Cast to DATE has no effect. Note that same effect is achieved when casting to random type like 'FOOBAR'.
tibble::tibble(dbGetQuery(con, "SELECT CAST('2000-01-01' AS DATE) AS a_date;"))
#> # A tibble: 1 x 1
#> a_date
#> <int>
#>   1   2000
# (Expected output: a variable of type Date with value '2000-01-01')

# Trying to find number of days between two dates:
dbSendStatement(con, "CREATE TABLE test (x DATE, y DATE);")
dbSendStatement(con, "INSERT INTO test (x, y) VALUES ('2000-01-01', '2000-01-03');")
tibble::tibble(dbGetQuery(con, "SELECT y - x AS delta FROM test;"))
# # A tibble: 1 x 1
# delta
# <int>
#   1     0
# (Expected output: a variable of type integer with value 2)
ablack3 commented 3 years ago

Here is what I found on SQLite date functions. No datediff, datepart, or dateadd unfortunately. https://www.sqlite.org/lang_datefunc.html

ablack3 commented 3 years ago

Here are some examples for manipulating dates in SQLite using RSQLite. One tricky part is the time zone. When data is loaded into SQLite from R using dbWriteTable I think it is loaded using the native time zone which can cause problems with the examples below.

library(DBI)
library(tibble)
con <- dbConnect(RSQLite::SQLite(), ":memory:", extended_types = TRUE)

# Create a table with a date in it
dbExecute(con, "create table tbl2 (aDate DATE)")
#> [1] 0
dbExecute(con, "insert into tbl2 (aDate) values ('2020-01-01')")
#> [1] 1

# Confirm that the table has a date in it
dbGetQuery(con, "select * from tbl2") %>% as_tibble()
#> # A tibble: 1 x 1
#>   aDate     
#>   <date>    
#> 1 2020-01-01

# Add a day, month or year to a date column
dbGetQuery(con, "select date(aDate, '+1 day') from tbl2") %>% as_tibble()
#> # A tibble: 1 x 1
#>   `date(aDate, '+1 day')`
#>   <chr>                  
#> 1 2020-01-02
dbGetQuery(con, "select date(aDate, '+1 month') from tbl2") %>% as_tibble()
#> # A tibble: 1 x 1
#>   `date(aDate, '+1 month')`
#>   <chr>                    
#> 1 2020-02-01
dbGetQuery(con, "select date(aDate, '+1 year') from tbl2") %>% as_tibble()
#> # A tibble: 1 x 1
#>   `date(aDate, '+1 year')`
#>   <chr>                   
#> 1 2021-01-01

# Subtract 1 year, 2 months and 3 days from a date
dbGetQuery(con, "select aDate, date(aDate, '-1 year', '-2 months', '-3 days') as date2 from tbl2") %>% as_tibble()
#> # A tibble: 1 x 2
#>   aDate      date2     
#>   <date>     <chr>     
#> 1 2020-01-01 2018-10-29

# Calculate difference in days between two dates
dbGetQuery(con, "select julianday(aDate) - julianday(date(aDate, '-2 months'))  from tbl2") %>% as_tibble()
#> # A tibble: 1 x 1
#>   `julianday(aDate) - julianday(date(aDate, '-2 months'))`
#>                                                      <dbl>
#> 1                                                       61

dbDisconnect(con)

Created on 2021-07-29 by the reprex package (v2.0.0)

ablack3 commented 3 years ago

Here is an example using dbWriteTable that does not work as expected. I tried a few variations using lubridate package to create the date but still no luck.

library(DBI)
library(tibble)
con <- dbConnect(RSQLite::SQLite(), ":memory:", extended_types = TRUE)

Sys.getlocale(category = "LC_ALL")
#> [1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"
dbWriteTable(con, "tbl", data.frame(aDate = as.Date("2020-01-01", tz = "UTC")))
# dbWriteTable(con, "tbl", data.frame(aDate = lubridate::ymd("2020-01-01")))
# dbWriteTable(con, "tbl", data.frame(aDate = lubridate::ymd("2020-01-01", locale = Sys.getlocale("LC_CTYPE"))))

dbGetQuery(con, "select date(aDate, '+1 day') from tbl") %>% as_tibble()
#> # A tibble: 1 x 1
#>   `date(aDate, '+1 day')`
#>   <chr>                  
#> 1 -4663-11-25
dbDisconnect(con)

Created on 2021-07-29 by the reprex package (v2.0.0)

MVesuviusC commented 3 years ago

Hello! I think this fits here, but if it doesn't let me know and I can create a new issue. I have seen similar issues with timezones when working with datetime types and wanted to make folks aware in case someone else has the same problem I did.

If I put data into a table with dbWriteTable, the timezone gets converted to UTC but when I use dbExecute to do an update to the table, the time zone does not get converted.

I get around this in my use case by converting to UTC before I do the update, but it caught me by surprise when I first noticed this and it may throw other folks off if they're not aware.

library(magrittr)

time = Sys.time()

db_con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:", extended_types = TRUE)

DBI::dbCreateTable(db_con, "time_table", list(id = "integer primary key", time = "datetime", time2 = "datetime"))

DBI::dbWriteTable(db_con, "time_table", value = data.frame(time = time, time2 = time), append = TRUE)

DBI::dbReadTable(db_con, "time_table")

Both times are identical and both are UTC

  id                time               time2
1  1 2021-08-05 16:57:10 2021-08-05 16:57:10
db_command <- paste("UPDATE time_table SET time2 = \'", time, "\' WHERE id = 1", sep = "")

DBI::dbExecute(db_con, db_command)

DBI::dbReadTable(db_con, "time_table")

DBI::dbDisconnect(db_con)

time2 is different and now in my time zone

  id                time               time2
1  1 2021-08-05 16:57:10 2021-08-05 12:57:10

Convert the time to UTC before I do the update to fix the issue

time <- lubridate::as_datetime(time) %>%
  lubridate::with_tz("UTC")

db_command <- paste("UPDATE time_table SET time2 = \'", time, "\' WHERE id = 1", sep = "")

DBI::dbExecute(db_con, db_command)

DBI::dbReadTable(db_con, "time_table")

time2 and time are now the same again

  id                time               time2
1  1 2021-08-05 16:57:10 2021-08-05 16:57:10

Now that I'm aware of the issue, it makes sense, but it was unexpected when I first came across it.

Thanks for all your work putting this together! Matt

ablack3 commented 3 years ago

Interesting @MVesuviusC. I have a few questions.

Is it possible to get the timezone for a datetime column in RSQLite?

It looks to me like SQLite does not actually support date and datetime data types. Is this correct? If so then what exactly does RSQLite date and datetime support mean? Is the driver package responsible for converting back and forth between dates and integers and keeping track of which columns are dates? Would it make sense for the driver package to implement Date operations like DateDiff() or DateAdd?

image

MVesuviusC commented 3 years ago

It's totally possible I'm implementing something incorrectly @ablack3. Databases are outside my normal workflow so I'm not at all an expert. I mostly built my code based on a few things I found online such as a here (where amusingly enough you also posted) and in the manual:

extended_types When TRUE columns of type DATE, DATETIME / TIMESTAMP, and TIME are mapped
to corresponding R-classes, c.f. below for details. Defaults to FALSE.
DATETIME / TIMESTAMP Count of (fractional) seconds since midnight 1970-01-01 UTC DATE and TIME as above separated by a space POSIXct with time zone UTC

I'm not sure how exactly RSQLite is handling the datetimes at a low level, though. I'm mostly just fudging my way through a project and stumbled across an unexpected behavior.

re: getting the timezone - My guess is that the timezone is assumed to be UTC.

ablack3 commented 3 years ago

@krlmlr - How should we be thinking about RSQLite's date support? Are date types an abstraction added by the RSQLite package? I'm I correct that SQLite does not have a true date type?

ablack3 commented 3 years ago

Here is an issue where dates are not being preserved when using "CREATE TABLE AS SELECT * FROM table_with_dates"

dateTable <- dplyr::tibble(todayDate = Sys.Date(), anotherDate = Sys.Date())

con <- RSQLite::dbConnect(RSQLite::SQLite(), ":memory:", extended_types = TRUE)

RSQLite::dbWriteTable(con, "table1", dateTable)

RSQLite::dbGetQuery(con, "select * from table1")
#>    todayDate anotherDate
#> 1 2021-10-05  2021-10-05

RSQLite::dbExecute(con, "create table table2 as select * from table1")
#> [1] 0

RSQLite::dbGetQuery(con, "select * from table2")
#>   todayDate anotherDate
#> 1     18905       18905

RSQLite::dbDisconnect(con)

Created on 2021-10-05 by the reprex package (v2.0.1)

peterdesmet commented 2 years ago

Is it possible to specify to dbWriteTable() to store datetimes as datetimestrings rather than integers? I have the impression that leads to less issues when handling dates in SQLite.

krlmlr commented 1 year ago

With dbConnect(extended_types = TRUE), we have some support for dates and times, but it's somewhat brittle due to SQLite's weak typing and the absence of built-in types.

Writing dates or datetimes as strings solves only part of the problem -- how do you detect that a text column is actually a date?

To me, it seems that the current implementation is as good as it gets. The original issue is still valid -- implement operations on date and time data, as written by extended_types = TRUE . Adding support for this is somewhat ambitious and perhaps requires writing an SQLite plugin that has only very narrow use -- in this package, or in other codes that use the same format for this data. I'd be open to reviewing a PR.