tidyverse / lubridate

Make working with dates in R just that little bit easier
https://lubridate.tidyverse.org
GNU General Public License v3.0
728 stars 207 forks source link

Period addition and `update()` don't apply tidyverse recycling rules #1070

Closed DavisVaughan closed 1 year ago

DavisVaughan commented 1 year ago
library(lubridate)

# should be size 0 result
as.Date("2019-01-01") + months(integer())
#> [1] "2019-01-01"

# should not be allowed!
as.Date(c("2019-01-01", "2019-01-02")) + months(1:3)
#> Warning in mt$mon + mos: longer object length is not a multiple of shorter
#> object length
#> Warning in mt$mday != ndays: longer object length is not a multiple of shorter
#> object length
#> Warning in mday(lt) + per@day: longer object length is not a multiple of shorter
#> object length
#> [1] "2019-02-01" "2019-03-02" "2019-04-01"

# like this:
as.Date(c("2019-01-01", "2019-01-02")) + days(1:3)
#> Warning in mday(lt) + per@day: longer object length is not a multiple of shorter
#> object length
#> Error: C_update_dt: length of dt vector must be 1 or match the length of updating vectors
library(lubridate)

# update() tries to follow tidyverse recycling rules
update(as.Date(c("2019-01-01", "2019-01-02")), days = 1:3)
#> Error: C_update_dt: length of dt vector must be 1 or match the length of updating vectors

# but it isn't quite right either (should be an error)
update(as.Date(c("2019-01-01", "2019-01-02")), days = integer())
#> [1] "2019-01-01" "2019-01-02"

# (should be size 0 result b/c we take the common size)
update(as.Date(c("2019-01-01")), days = integer())
#> [1] "2019-01-01"

# proof that we are generally taking the common size:
update(as.Date(c("2019-01-01")), days = 1:3)
#> [1] "2019-01-01" "2019-01-02" "2019-01-03"

Created on 2022-10-05 with reprex v2.0.2

vspinu commented 1 year ago
# but it isn't quite right either (should be an error)
update(as.Date(c("2019-01-01", "2019-01-02")), days = integer())
#> [1] "2019-01-01" "2019-01-02"

# (should be size 0 result b/c we take the common size)
update(as.Date(c("2019-01-01")), days = integer())
#> [1] "2019-01-01"

Why should first one be an error and the second one not?

DavisVaughan commented 1 year ago

The 2nd would be an error if we always recycled to the size of x. i.e. we wouldn't be able to recycle the size of days (0) to the size of x (1).

But we don't, it seems like (for better or worse) update() takes the common size, as seen by:

# proof that we are generally taking the common size:
update(as.Date(c("2019-01-01")), days = 1:3)
#> [1] "2019-01-01" "2019-01-02" "2019-01-03"

so it should obey common size rules, which means that something of size 1 can be recycled to the size of anything else.

So in this example, the size 1 x gets recycled to the size 0 days since the common size of 1 and 0 is 0.

# (should be size 0 result b/c we take the common size)
update(as.Date(c("2019-01-01")), days = integer())
#> [1] "2019-01-01"

vctrs examples:

library(vctrs)

# common size of 1 and 0 is 0 because 1 recycles to size of anything
vec_size_common("x", character())
#> [1] 0

# common size of 3 and 0 is 0 because 3 and 0 are incompatible
vec_size_common(c("x", "y", "z"), character())
#> Error:
#> ! Can't recycle `..1` (size 3) to match `..2` (size 0).
vspinu commented 1 year ago

Fixed in devel. Though the error messages should be improved in the future.

> update(as.Date(c("2019-01-01", "2019-01-02")), days = 1:3)
Error in C_time_update(out, updates, tz, roll_month, roll_dst, week_start,  (from cpp11.R#24) : 
  time_update: length of dt vector must be 1 or match the length of updating vectors
> update(as.Date(c("2019-01-01", "2019-01-02")), days = integer())
Error in C_time_update(out, updates, tz, roll_month, roll_dst, week_start,  (from cpp11.R#24) : 
  time_update: Incompatible size of 'mday' vector
> update(as.Date(c("2019-01-01")), days = integer())
Error in C_time_update(out, updates, tz, roll_month, roll_dst, week_start,  (from cpp11.R#24) : 
  time_update: Incompatible size of 'mday' vector
DavisVaughan commented 1 year ago

What does update(as.Date(c("2019-01-01")), days = 1:3) give? If that works, then this should not have been an error:

> update(as.Date(c("2019-01-01")), days = integer())
Error in C_time_update(out, updates, tz, roll_month, roll_dst, week_start,  (from cpp11.R#24) : 
  time_update: Incompatible size of 'mday' vector
vspinu commented 1 year ago

update(as.Date(c("2019-01-01")), days = 1:3) is not an error of course.

To my mind input vector and updates are not symmetric. To me a more natural approach is to not allow empty updates. Those are likely due to a logical error. An empty input date-time does make sense though.

I am not a stickler to this but replication to the parameter length feels surprising. For instance I would not want day(x) <- integer() to obliterate x. Or do I?

is there a technical issue created somehwere downstream if the replication to 0 parameter length is not followed?

vspinu commented 1 year ago
library(vctrs)

# common size of 1 and 0 is 0 because 1 recycles to size of anything
vec_size_common("x", character())
#> [1] 0

# common size of 3 and 0 is 0 because 3 and 0 are incompatible
vec_size_common(c("x", "y", "z"), character())
#> Error:
#> ! Can't recycle `..1` (size 3) to match `..2` (size 0).

Also the other way around update(empty(), "1 month") would work but update(empty(), c("1m", "2m")) would error. Right?

So unless there is a truly fundamental reason to to follow this rule I would rather keep it asymmetric.