You have run into a very common issue with lubridate, which I call invalid dates. You can see that for 2000-03-30 - months(1), an invalid date of 2000-02-30 would be generated. Because this date doesn't exist, lubridate defaults to returning NA.
library(lubridate)
library(slider)
x <- as.Date(c(
"2000-02-29", "2000-03-01", "2000-03-02", "2000-03-03", "2000-03-06",
"2000-03-07", "2000-03-08", "2000-03-09", "2000-03-10", "2000-03-13",
"2000-03-14", "2000-03-15", "2000-03-16", "2000-03-17", "2000-03-20",
"2000-03-21", "2000-03-22", "2000-03-23", "2000-03-24", "2000-03-27",
"2000-03-28", "2000-03-29", "2000-03-30"
))
x_tail <- tail(x)
x_tail
#> [1] "2000-03-23" "2000-03-24" "2000-03-27" "2000-03-28" "2000-03-29"
#> [6] "2000-03-30"
# Generates NA values on invalid dates
x_tail - months(1)
#> [1] "2000-02-23" "2000-02-24" "2000-02-27" "2000-02-28" "2000-02-29"
#> [6] NA
This is often a frustrating behavior that can happen silently. It has come up many times in lubridate issues! For this specific issue, a lubridate solution is to use %m-% rather than -. For invalid dates like this one, it will instead roll backwards to the previous valid date (2000-02-29, the end of that month). This is reasonable behavior. You end up with two 2000-02-29 values, but that is the price we have to pay for dealing with irregular calendrical data.
# Handles invalid dates by choosing the previous valid date
x_tail %m-% months(1)
#> [1] "2000-02-23" "2000-02-24" "2000-02-27" "2000-02-28" "2000-02-29"
#> [6] "2000-02-29"
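As a side note, %m-% is built on lubridate's add_with_rollback(), which also has a roll_to_first argument for rolling forward to the first day of the next month instead. A minimal sketch, assuming a single date d chosen only for illustration:

```r
library(lubridate)

# Hypothetical single date that hits the invalid 2000-02-30 when shifted back
d <- as.Date("2000-03-30")

# Same behavior as `%m-%`: roll back to the last valid day of February
rolled_back <- add_with_rollback(d, months(-1))
rolled_back

# Roll forward to the first day of the following month instead
rolled_forward <- add_with_rollback(d, months(-1), roll_to_first = TRUE)
rolled_forward
```

Under these assumptions, rolled_back should be 2000-02-29 and rolled_forward should be 2000-03-01. Which one is appropriate depends on whether the window should stay inside the target month.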
Ideally you'd be able to do:
slide_index(x, x, ~.x, .before = ~.x %m-% months(1), .complete = FALSE)
but slider doesn't support anonymous functions as .before values yet. I plan to add that eventually.
Instead, you can get fairly close to that by manually specifying starts/stops with hop_index(). You can create the .starts with %m-%. (There is no .complete argument here, so it isn't exactly the same.)
# Manually generate and specify start/stop values
starts <- x %m-% months(1)
stops <- x
results <- hop_index(
.x = x,
.i = x,
.starts = starts,
.stops = stops,
.f = ~.x
)
tail(results, 3)
#> [[1]]
#> [1] "2000-02-29" "2000-03-01" "2000-03-02" "2000-03-03" "2000-03-06"
#> [6] "2000-03-07" "2000-03-08" "2000-03-09" "2000-03-10" "2000-03-13"
#> [11] "2000-03-14" "2000-03-15" "2000-03-16" "2000-03-17" "2000-03-20"
#> [16] "2000-03-21" "2000-03-22" "2000-03-23" "2000-03-24" "2000-03-27"
#> [21] "2000-03-28"
#>
#> [[2]]
#> [1] "2000-02-29" "2000-03-01" "2000-03-02" "2000-03-03" "2000-03-06"
#> [6] "2000-03-07" "2000-03-08" "2000-03-09" "2000-03-10" "2000-03-13"
#> [11] "2000-03-14" "2000-03-15" "2000-03-16" "2000-03-17" "2000-03-20"
#> [16] "2000-03-21" "2000-03-22" "2000-03-23" "2000-03-24" "2000-03-27"
#> [21] "2000-03-28" "2000-03-29"
#>
#> [[3]]
#> [1] "2000-02-29" "2000-03-01" "2000-03-02" "2000-03-03" "2000-03-06"
#> [6] "2000-03-07" "2000-03-08" "2000-03-09" "2000-03-10" "2000-03-13"
#> [11] "2000-03-14" "2000-03-15" "2000-03-16" "2000-03-17" "2000-03-20"
#> [16] "2000-03-21" "2000-03-22" "2000-03-23" "2000-03-24" "2000-03-27"
#> [21] "2000-03-28" "2000-03-29" "2000-03-30"
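The same pattern works with a real summary function rather than ~.x. Here is a small sketch of a rolling one-month mean using hop_index_vec(), the vector-returning variant of hop_index(). The value and i vectors are made up for illustration and are not from the original data:

```r
library(lubridate)
library(slider)

# Hypothetical index and values, kept short so the windows are easy to follow
i <- as.Date(c("2000-02-29", "2000-03-15", "2000-03-29", "2000-03-30"))
value <- c(1, 2, 3, 4)

# Window start is one month back, rolling invalid dates to the previous valid one
starts <- i %m-% months(1)
stops <- i

rolling_mean <- hop_index_vec(
  .x = value,
  .i = i,
  .starts = starts,
  .stops = stops,
  .f = mean,
  .ptype = double()
)

rolling_mean
```

The last two windows both start at 2000-02-29, because 2000-03-30 minus one month is rolled back to that date, so they cover the first three and all four values respectively.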
I'll also add that I've been working on a new package called clock, and one of its goals is to make handling invalid dates a little less frustrating by being more verbose about when they occur, and by giving you more tools to handle them. It isn't on CRAN yet, but here is a teaser:
You immediately get an error as soon as you hit invalid date issues when using clock::add_months():
library(clock)
x <- as.Date(c(
"2000-02-29", "2000-03-01", "2000-03-02", "2000-03-03", "2000-03-06",
"2000-03-07", "2000-03-08", "2000-03-09", "2000-03-10", "2000-03-13",
"2000-03-14", "2000-03-15", "2000-03-16", "2000-03-17", "2000-03-20",
"2000-03-21", "2000-03-22", "2000-03-23", "2000-03-24", "2000-03-27",
"2000-03-28", "2000-03-29", "2000-03-30"
))
x_tail <- tail(x)
x_tail
#> [1] "2000-03-23" "2000-03-24" "2000-03-27" "2000-03-28" "2000-03-29"
#> [6] "2000-03-30"
add_months(x_tail, -1)
#> Error: Invalid date found at location 6. Resolve invalid date issues by specifying the `invalid` argument.
You have multiple ways to resolve these invalid dates by using the invalid argument. Here are a few of the many options:
# Previous valid moment in time (Sort of like `%m-%`)
add_months(x_tail, -1, invalid = "previous")
#> [1] "2000-02-23" "2000-02-24" "2000-02-27" "2000-02-28" "2000-02-29"
#> [6] "2000-02-29"
# Next valid moment in time
add_months(x_tail, -1, invalid = "next")
#> [1] "2000-02-23" "2000-02-24" "2000-02-27" "2000-02-28" "2000-02-29"
#> [6] "2000-03-01"
# Lubridate `- months(1)` behavior
add_months(x_tail, -1, invalid = "NA")
#> [1] "2000-02-23" "2000-02-24" "2000-02-27" "2000-02-28" "2000-02-29"
#> [6] NA
This is all part of the high level API for clock. There are also lower level types, such as year-month-day, that have the powerful ability to represent these invalid dates directly.
ymd <- as_year_month_day(x_tail)
ymd
#> <year_month_day<day>[6]>
#> [1] "2000-03-23" "2000-03-24" "2000-03-27" "2000-03-28" "2000-03-29"
#> [6] "2000-03-30"
ymd_invalid <- add_months(ymd, -1)
# 1 invalid date, 2000-02-30
ymd_invalid
#> <year_month_day<day>[invalid=1][6]>
#> [1] "2000-02-23" "2000-02-24" "2000-02-27" "2000-02-28" "2000-02-29"
#> [6] "2000-02-30"
At this point you could resolve those invalid dates with invalid_resolve(invalid = ), which has the same options as where I specified invalid in add_months(), or you could leave it alone if you are going to be doing additional manipulations on that year-month-day type that might resolve the invalid dates automatically.
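Both routes can be sketched on a single value. This is only an illustration, assuming a released version of clock where year_month_day(), invalid_detect(), invalid_any(), and invalid_resolve() are available. The variable names are chosen here and are not part of the example above:

```r
library(clock)

# One year-month-day value, shifted back a month into the invalid 2000-02-30
ymd_one <- year_month_day(2000, 3, 30)
ymd_one_invalid <- add_months(ymd_one, -1)

# Flag which elements are invalid
invalid_detect(ymd_one_invalid)

# Route 1: resolve explicitly, then convert back to a Date
resolved <- invalid_resolve(ymd_one_invalid, invalid = "previous")
as.Date(resolved)

# Route 2: keep manipulating. Adding the month back lands on a valid date again,
# so no resolution step is needed before converting.
round_trip <- add_months(ymd_one_invalid, 1)
invalid_any(round_trip)
as.Date(round_trip)
```

With invalid = "previous" the resolved value should be 2000-02-29, while the round trip should return to 2000-03-30 with no invalid elements remaining.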
(I know this is closed, but I just ran into this error and don't know whether anything is wrong, per se.)
I got this error when using the vic_elec dataset from tsibbledata. I'm not precisely sure why, since the timestamps are clean and in order. I managed to solve it by converting the time zone to UTC instead of "Australia/Melbourne", so maybe it has to do with leap years or legal changes to the time or something?
Odd, anyway. Maybe more explanation in the error would help users?
library(tidyverse)
table(diff(tsibbledata::vic_elec$Time), useNA = 'ifany')
#>
#> 30
#> 52607
tsibbledata::vic_elec %>%
mutate(
demand_1d_mean = slider::slide_index_dbl(
.x = Demand,
.i = Time,
.f = mean,
.before = lubridate::days(1)
)
)
#> Error: Problem with `mutate()` column `demand_1d_mean`.
#> i `demand_1d_mean = slider::slide_index_dbl(.x = Demand, .i = Time, .f = mean, .before = lubridate::days(1))`.
#> x Endpoints generated by `.before` cannot be `NA`.
#> i They are `NA` at locations: 13493, 13494, 30965, 30966,....
lubridate::tz(tsibbledata::vic_elec$Time)
#> [1] "Australia/Melbourne"
tsibbledata::vic_elec %>%
select(-Temperature, -Holiday, -Date) %>%
mutate(
utc_time = lubridate::with_tz(Time, 'UTC'),
demand_1d_mean = slider::slide_index_dbl(
.x = Demand,
.i = utc_time,
.f = mean,
.before = lubridate::days(1)
)
)
#> # A tibble: 52,608 x 4
#> Time Demand utc_time demand_1d_mean
#> <dttm> <dbl> <dttm> <dbl>
#> 1 2012-01-01 00:00:00 4383. 2011-12-31 13:00:00 4383.
#> 2 2012-01-01 00:30:00 4263. 2011-12-31 13:30:00 4323.
#> 3 2012-01-01 01:00:00 4049. 2011-12-31 14:00:00 4232.
#> 4 2012-01-01 01:30:00 3878. 2011-12-31 14:30:00 4143.
#> 5 2012-01-01 02:00:00 4036. 2011-12-31 15:00:00 4122.
#> 6 2012-01-01 02:30:00 3866. 2011-12-31 15:30:00 4079.
#> 7 2012-01-01 03:00:00 3694. 2011-12-31 16:00:00 4024.
#> 8 2012-01-01 03:30:00 3562. 2011-12-31 16:30:00 3966.
#> 9 2012-01-01 04:00:00 3433. 2011-12-31 17:00:00 3907.
#> 10 2012-01-01 04:30:00 3359. 2011-12-31 17:30:00 3852.
#> # ... with 52,598 more rows
It has to do with daylight saving time, see this example:
library(tsibbledata)
library(lubridate)
time <- vic_elec$Time
time[13493]
#> [1] "2012-10-08 02:00:00 AEDT"
# This returns NA silently
time[13493] - days(1)
#> [1] NA
# Going back 2 days works
time[13493] - days(2)
#> [1] "2012-10-06 02:00:00 AEST"
# There is a DST gap on 2012-10-07, where clocks went from
# 01:59:59 -> 03:00:00, skipping the 2 o'clock hour entirely
right_before_gap <- time[13493] - (days(1) + seconds(1))
right_before_gap
#> [1] "2012-10-07 01:59:59 AEST"
right_before_gap + 1
#> [1] "2012-10-07 03:00:00 AEDT"
# So the theoretical time of "2012-10-07 02:00:00" doesn't exist
# in the Melbourne time zone, and you silently get an NA
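If you want to find these locations yourself before calling slider, one option is to subtract the period up front and look for NA values. A minimal sketch on a small hand-built vector (time_small is a made-up stand-in for vic_elec$Time, covering the same four half hours around the gap):

```r
library(lubridate)

# Four half-hourly timestamps on 2012-10-08, the day after the DST gap:
# 01:00, 01:30, 02:00, 02:30 (Melbourne time)
time_small <- seq(
  as.POSIXct("2012-10-08 01:00:00", tz = "Australia/Melbourne"),
  by = "30 min",
  length.out = 4
)

# Period arithmetic: the 02:00 and 02:30 values land inside the skipped hour
shifted <- time_small - days(1)

# Locations whose shifted endpoint does not exist
bad <- which(is.na(shifted))
bad
```

Here bad should point at the 02:00 and 02:30 entries, which mirrors the consecutive pairs of locations (13493, 13494, ...) reported in the slider error above.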
I think the easiest and most correct solution in this case is to just use ddays() rather than days(), which won't ever produce an NA result.
library(tsibbledata)
library(lubridate)
time <- vic_elec$Time
time[13493]
#> [1] "2012-10-08 02:00:00 AEDT"
# Goes back 24 hours. Like you waited around for exactly
# 86,400 seconds and then looked at the clock and this is what
# you saw
time[13493] - ddays(1)
#> [1] "2012-10-07 01:00:00 AEST"
library(tidyverse)
library(tsibbledata)
library(slider)
library(lubridate)
vic_elec %>%
mutate(
demand_1d_mean = slide_index_dbl(
.x = Demand,
.i = Time,
.f = mean,
.before = ddays(1)
)
)
#> # A tibble: 52,608 × 6
#> Time Demand Temperature Date Holiday demand_1d_mean
#> <dttm> <dbl> <dbl> <date> <lgl> <dbl>
#> 1 2012-01-01 00:00:00 4383. 21.4 2012-01-01 TRUE 4383.
#> 2 2012-01-01 00:30:00 4263. 21.0 2012-01-01 TRUE 4323.
#> 3 2012-01-01 01:00:00 4049. 20.7 2012-01-01 TRUE 4232.
#> 4 2012-01-01 01:30:00 3878. 20.6 2012-01-01 TRUE 4143.
#> 5 2012-01-01 02:00:00 4036. 20.4 2012-01-01 TRUE 4122.
#> 6 2012-01-01 02:30:00 3866. 20.2 2012-01-01 TRUE 4079.
#> 7 2012-01-01 03:00:00 3694. 20.1 2012-01-01 TRUE 4024.
#> 8 2012-01-01 03:30:00 3562. 19.6 2012-01-01 TRUE 3966.
#> 9 2012-01-01 04:00:00 3433. 19.1 2012-01-01 TRUE 3907.
#> 10 2012-01-01 04:30:00 3359. 19.0 2012-01-01 TRUE 3852.
#> # … with 52,598 more rows
Thank you for the awesome package! I ran into an error and have no idea how to fix it.