version2 - dplyr for vctrs_rcrd and POSIXlt support

reconverse / incidence2

Compute and visualise incidence (reworking of the original incidence package)

https://www.reconverse.org/incidence2

Other

17 stars 2 forks source link

version2 - dplyr for vctrs_rcrd and POSIXlt support #88

Closed TimTaylor closed 1 year ago

TimTaylor commented 2 years ago

Currently using only data.table for aggregation which means we cannot support non-atomic columns (see https://github.com/Rdatatable/data.table/issues?q=is%3Aopen+is%3Aissue+label%3A%22non-atomic+column%22 for further discussion). Previously I switched behaviour based on the input and dispatched to either data.table or dplyr accordingly. Need to think about whether I do the same thing going forward or error on known problematic inputs.

I want to avoid changing user inputs so if I do not go with the dual use I'd like to error on POSIXlt input .

TimTaylor commented 2 years ago

Maybe support POSIXlt with a warning but error on vctrs_rcrd input???

thibautjombart commented 1 year ago

Would it make sense to automatically convert POSIXlt to POSIXct with a warning?

TimTaylor commented 1 year ago

Am planning to add dplyr support for this but at the same time I'm wondering about disallowing POSIXt objects.

My issue with POSIXt is that by looking at it you cannot tell what units someone is thinking in (e.g. datetime hh:00:00 could be someone thinking in seconds, minutes or hours). Without extra information passed around it's hard to tell. This makes me think that we whould push people to use alternatives. For instance if someone wanted hourly recording then either seperate day and hour groups may be appropriate or perhaps the use of something like {clock}. Code for reference to hopefully show what I'm meaning

(dat <- as.POSIXct("2020-01-01"))
#> [1] "2020-01-01 GMT"
dat + 1
#> [1] "2020-01-01 00:00:01 GMT"

(clock_dat <- 
    dat |> 
    clock::as_naive_time() |> 
    clock::time_point_cast("hour"))
#> <clock_naive_time[1]>
#> [1] "2020-01-01T00"

as.POSIXct(clock_dat + 1)
#> [1] "2020-01-01 01:00:00 GMT"

@thibautjombart - Feedback wanted as want to think carefully about this.

thibautjombart commented 1 year ago

I would add an argument to relevant functions (I am thinking, maybe only the constructor, actually) allowing the user to specify the unit, and defaulting to 'day'. In practice, I have seen maybe once or twice in the last 10 years a need for handling sub-day units, and most people would want days. Also, I think it may still be the case that rio::import("a_linelist.xlsx") will store dates as POSIXct by default, where the hhmmss are meaningless and not part of the original data. Maybe something along the lines of:

## most common use-case
source_file |>
  rio::import() |> 
  incidence(date_index = "date_onset") # POSIXct where units = days

## less common use-case
source_file |>
  rio::import() |> 
  incidence(dt_admission = "date__time_admission", posix_unit = "hour") # POSIX with units = hours

I am even thinking of converting POSIXct with days as units to Date, which could be controlled using an argument simplify_posix = TRUE by default.

TimTaylor commented 1 year ago

Currently not going to support.