tidyverse / lubridate

Make working with dates in R just that little bit easier
https://lubridate.tidyverse.org
GNU General Public License v3.0
728 stars 207 forks source link

period with vectors of units does not produce the same output as months/weeks #1095

Closed lrossouw closed 1 year ago

lrossouw commented 1 year ago

period collapses vectorised input into a single period as opposed to producing the same length as vectors provided. It makes it not usable in dplyr::mutate for example.

period would be useful in this context if you have an input columns of units and want to produce an output column of those lengths (as opposed to collapsing) the period into a single period.

library(lubridate)
#> Loading required package: timechange
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union
stopifnot(period(c(1, 2), units = c("months", "months")) == months(c(1, 2)))
#> Error: period(c(1, 2), units = c("months", "months")) == months(c(1,  .... are not all TRUE
library(lubridate)
#> Loading required package: timechange
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union
stopifnot(period(c(1, 2), units = c("weeks", "weeks")) == weeks(c(1, 2)))
#> Error: period(c(1, 2), units = c("weeks", "weeks")) == weeks(c(1, 2)) are not all TRUE
library(tidyverse)
library(lubridate)
#> Loading required package: timechange
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union

data <- data.frame(
  num = c(1,2),
  units = rep("weeks",2)
)

data <- data %>%
  mutate(period = period(num, units),
         period_2 = weeks(num))

stopifnot(data$period == data$period_2)
#> Error: data$period == data$period_2 are not all TRUE
> sessionInfo()
R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.6 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
 [1] lubridate_1.9.0  timechange_0.1.1 furrr_0.3.1      future_1.29.0    forcats_0.5.2    stringr_1.4.1    dplyr_1.0.10     purrr_0.3.5     
 [9] readr_2.1.3      tidyr_1.2.1      tibble_3.1.8     ggplot2_3.4.0    tidyverse_1.3.2 

loaded via a namespace (and not attached):
 [1] listenv_0.8.0       ps_1.7.2            assertthat_0.2.1    digest_0.6.30       utf8_1.2.2          parallelly_1.32.1   R6_2.5.1           
 [8] cellranger_1.1.0    backports_1.4.1     reprex_2.0.2        evaluate_0.18       httr_1.4.4          highr_0.9           pillar_1.8.1       
[15] rlang_1.0.6         googlesheets4_1.0.1 readxl_1.4.1        rstudioapi_0.14     callr_3.7.3         R.utils_2.12.2      R.oo_1.25.0        
[22] styler_1.8.1        rmarkdown_2.18      googledrive_2.0.0   munsell_0.5.0       broom_1.0.1         compiler_4.2.1      modelr_0.1.10      
[29] xfun_0.34           pkgconfig_2.0.3     htmltools_0.5.3     globals_0.16.1      tidyselect_1.2.0    codetools_0.2-18    fansi_1.0.3        
[36] crayon_1.5.2        tzdb_0.3.0          dbplyr_2.2.1        withr_2.5.0         R.methodsS3_1.8.2   grid_4.2.1          jsonlite_1.8.3     
[43] gtable_0.3.1        lifecycle_1.0.3     DBI_1.1.3           magrittr_2.0.3      scales_1.2.1        cli_3.4.1           stringi_1.7.8      
[50] renv_0.16.0         fs_1.5.2            xml2_1.3.3          ellipsis_0.3.2      generics_0.1.3      vctrs_0.5.0         tools_4.2.1        
[57] R.cache_0.16.0      glue_1.6.2          hms_1.1.2           processx_3.8.0      fastmap_1.1.0       parallel_4.2.1      yaml_2.3.6         
[64] colorspace_2.0-3    gargle_1.2.1        rvest_1.0.3         knitr_1.40          haven_2.5.1     
lrossouw commented 1 year ago

This issue https://github.com/tidyverse/lubridate/issues/919 describes the same problem but I think the fix did not address it.

lrossouw commented 1 year ago

Another example of the inconcistency:

library(lubridate)
#> Loading required package: timechange
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union

# this is fine 
stopifnot(period(months = c(1,2)) ==  months(c(1, 2)))

# but this fails
stopifnot(period(c(1, 2), units = c("months", "months")) == months(c(1, 2)))
#> Error: period(c(1, 2), units = c("months", "months")) == months(c(1,  .... are not all TRUE
lrossouw commented 1 year ago

Another inconsistency when using a character vector below. You will see that using seperate num/units vectors results in a combined period.

library(tidyverse)
library(lubridate)
#> Loading required package: timechange
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union

# data 
data <- data.frame(
  num = c(1,2,3),
  units = c("weeks", "months", "days")
) %>%
  mutate(period_text = paste(num, units))

# calc 2 ways
data<- data %>% 
  mutate(
    period_1 = period(num, units),
    period_2 = period(period_text)
  )

data
#>   num  units period_text        period_1       period_2
#> 1   1  weeks     1 weeks 2m 10d 0H 0M 0S    7d 0H 0M 0S
#> 2   2 months    2 months 2m 10d 0H 0M 0S 2m 0d 0H 0M 0S
#> 3   3   days      3 days 2m 10d 0H 0M 0S    3d 0H 0M 0S

stopifnot(data$period_1==data$period_2)
#> Error: data$period_1 == data$period_2 are not all TRUE
lrossouw commented 1 year ago

I was about to start coding a fix, but I see the above is essentially expected behaviour based on test cases. For example here it's clearly expected that period combines the periods.

I would say there is some inconsistency with the way it handles the above where it's provided in another format. Should this behaviour not be controlled by some form of collapse parameter similar to the way paste works? And should this not be consistent across different ways in which period can be used?

vspinu commented 1 year ago

This is a very old interface to period and preserved for compatibility. In the beginning period was able to produce only a singletone period. I am afraid we are stuck with it.

Thus you have two options:

> period(paste(c(1, 2), units = c("seconds", "months")))
[1] "1S"             "2m 0d 0H 0M 0S"
> period(seconds = c(1, 0), months = c(0, 2))
[1] "1S"             "2m 0d 0H 0M 0S"
lrossouw commented 1 year ago

Thanks I thought it might be like that after spotting the test cases. Weird inconsistent behaviour. I have gone for the paste version. Maybe a new period function to have consistent behaviour.