If we run a forecast on a Monday and the previous epiweek is incomplete, that week will be removed during prep, so our forecast horizons won't truly be `horizon` weeks long.
We can get around this by setting `remove_incomplete=FALSE` in prep, but then the last week's total will appear artificially low because it is missing days, which could cause problems for a time series approach.
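To make the problem concrete, here is a minimal sketch of how an "incomplete week" can be detected: count the distinct days each state reports in each MMWR epiweek and flag weeks with fewer than seven. The helper name `flag_incomplete_weeks()` is hypothetical, and the seven-day rule is an assumption about what `remove_incomplete` checks, not a copy of fiphde's code. It also assumes one row per state per day and a `date` column that `as.Date()` can parse.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical helper (not part of fiphde): count the distinct days each state
# reports in each MMWR epiweek and flag weeks with fewer than 7 days.
# Assumes one row per state per day and a `date` column parseable by as.Date().
flag_incomplete_weeks <- function(df) {
  wk <- MMWRweek::MMWRweek(as.Date(df$date))
  df %>%
    mutate(epiyear = wk$MMWRyear, epiweek = wk$MMWRweek) %>%
    group_by(state, epiyear, epiweek) %>%
    summarise(n_days = n_distinct(date), .groups = "drop") %>%
    mutate(complete = n_days == 7L)
}

# Two states, Sun 2022-03-06 through Thu 2022-03-17:
# epiweek 10 has all 7 days, epiweek 11 only has Sun-Thu
hosp <- tidyr::crossing(
  state = c("CA", "NC"),
  date  = seq(as.Date("2022-03-06"), as.Date("2022-03-17"), by = "day")
)
flag_incomplete_weeks(hosp)
```

Under this rule, a Monday run with data only through Thursday flags the most recent week as incomplete for every state.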
A possible start to a solution is implemented below.
We could determine the last date of the incomplete epiweek, then fill in the missing values separately for each state using the most recent nonmissing value (demonstrated below). Alternatively, we could use the median or mean of the last n days (not demonstrated here, but not too difficult to implement).
One caveat: if we run this code on a Monday and already have data for the preceding Sunday (the first day of the new epiweek), this approach would "impute" every day through the coming Saturday, i.e., into the future.
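One way to address that caveat is to cap the fill at the day before the forecast is run. Below is a minimal sketch, assuming the run date is known (it defaults to `Sys.Date()`); `dates_to_fill()` is a hypothetical name, not a fiphde function. In the Sunday case it returns no dates at all, which leaves a one-day week that would still need to be dropped or handled some other way.

```r
# Hypothetical helper (not from fiphde): the days to fill after `last_date`,
# ending on the Saturday of its MMWR epiweek, keeping only days strictly
# before `run_date` so nothing is "imputed" for days that haven't happened yet.
dates_to_fill <- function(last_date, run_date = Sys.Date()) {
  last_date <- as.Date(last_date)
  epi <- MMWRweek::MMWRweek(last_date)
  last_saturday <- MMWRweek::MMWRweek2Date(epi$MMWRyear, epi$MMWRweek, 7)
  if (last_date >= last_saturday) {
    return(as.Date(character(0)))  # epiweek already complete, nothing to fill
  }
  candidates <- seq(last_date + 1, last_saturday, by = "day")
  candidates[candidates < as.Date(run_date)]
}

# Data through Thursday, run the following Monday: fills Friday and Saturday
dates_to_fill("2022-03-17", run_date = "2022-03-21")
# Data through Sunday, run that Monday: every candidate day is today or later
dates_to_fill("2022-03-20", run_date = "2022-03-21")
```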
Opening this issue with some breadcrumbs to come back to, should this become a problem again in the coming weeks.
suppressPackageStartupMessages({
library(tidyverse)
library(fiphde)
})
# Subset of the full data used to build the reproducible example below;
# copying to the clipboard only works in an interactive session
if (interactive()) {
  hdgov_hosp %>%
    filter(state %in% c("CA", "NC")) %>%
    filter(date >= "2022-03-13" & date <= "2022-03-17") %>%
    select(state:flu.admits.cov) %>%
    clipr::write_clip()
}
hdgov_hosp <- tibble::tribble(
~state, ~date, ~flu.admits, ~flu.admits.cov,
"CA", "2022-03-13", 7L, 367L,
"NC", "2022-03-13", 4L, 116L,
"CA", "2022-03-14", 10L, 368L,
"NC", "2022-03-14", 1L, 115L,
"CA", "2022-03-15", 5L, 402L,
"NC", "2022-03-15", 0L, 125L,
"CA", "2022-03-16", 11L, 404L,
"NC", "2022-03-16", 3L, 125L,
"CA", "2022-03-17", 4L, 404L,
"NC", "2022-03-17", 2L, 124L
)
hdgov_hosp
#> # A tibble: 10 × 4
#> state date flu.admits flu.admits.cov
#> <chr> <chr> <int> <int>
#> 1 CA 2022-03-13 7 367
#> 2 NC 2022-03-13 4 116
#> 3 CA 2022-03-14 10 368
#> 4 NC 2022-03-14 1 115
#> 5 CA 2022-03-15 5 402
#> 6 NC 2022-03-15 0 125
#> 7 CA 2022-03-16 11 404
#> 8 NC 2022-03-16 3 125
#> 9 CA 2022-03-17 4 404
#> 10 NC 2022-03-17 2 124
last_date <- as.Date(max(hdgov_hosp$date))
last_date
#> [1] "2022-03-17"
last_epi <- MMWRweek::MMWRweek(last_date)
last_epi
#> MMWRyear MMWRweek MMWRday
#> 1 2022 11 5
last_saturday <- MMWRweek::MMWRweek2Date(last_epi$MMWRyear, last_epi$MMWRweek, 7)
last_saturday
#> [1] "2022-03-19"
# warn when the data end before the Saturday that closes their epiweek
if (last_date!=last_saturday) {
warning(sprintf("Last day of data (%s) isn't last date of that epiweek (%s)", last_date, last_saturday))
}
#> Warning: Last day of data (2022-03-17) isn't last date of that epiweek
#> (2022-03-19)
# placeholder: gate the fill below behind an option, e.g.
# if (isTRUE(fill_epiweek)) { ... }
new_dates <- seq.Date(from=as.Date(last_date)+1, to=as.Date(last_saturday), by="days")
new_dates
#> [1] "2022-03-18" "2022-03-19"
# dnm = Data with New dates Missing
dnm <- crossing(state = unique(hdgov_hosp$state), date = as.character(new_dates)) %>%
  full_join(hdgov_hosp, ., by = c("state", "date")) %>%
  arrange(date, state)
dnm
#> # A tibble: 14 × 4
#> state date flu.admits flu.admits.cov
#> <chr> <chr> <int> <int>
#> 1 CA 2022-03-13 7 367
#> 2 NC 2022-03-13 4 116
#> 3 CA 2022-03-14 10 368
#> 4 NC 2022-03-14 1 115
#> 5 CA 2022-03-15 5 402
#> 6 NC 2022-03-15 0 125
#> 7 CA 2022-03-16 11 404
#> 8 NC 2022-03-16 3 125
#> 9 CA 2022-03-17 4 404
#> 10 NC 2022-03-17 2 124
#> 11 CA 2022-03-18 NA NA
#> 12 NC 2022-03-18 NA NA
#> 13 CA 2022-03-19 NA NA
#> 14 NC 2022-03-19 NA NA
# fill each state's missing days with its most recent observed value
# (every count column starts with "flu", including flu.admits.cov)
filled_down <-
  dnm %>%
  group_by(state) %>%
  tidyr::fill(starts_with("flu"), .direction = "down") %>%
  ungroup()
filled_down
#> # A tibble: 14 × 4
#> state date flu.admits flu.admits.cov
#> <chr> <chr> <int> <int>
#> 1 CA 2022-03-13 7 367
#> 2 NC 2022-03-13 4 116
#> 3 CA 2022-03-14 10 368
#> 4 NC 2022-03-14 1 115
#> 5 CA 2022-03-15 5 402
#> 6 NC 2022-03-15 0 125
#> 7 CA 2022-03-16 11 404
#> 8 NC 2022-03-16 3 125
#> 9 CA 2022-03-17 4 404
#> 10 NC 2022-03-17 2 124
#> 11 CA 2022-03-18 4 404
#> 12 NC 2022-03-18 2 124
#> 13 CA 2022-03-19 4 404
#> 14 NC 2022-03-19 2 124
# TODO: alternatively, fill with the mean of that week's observed days
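The mean-based alternative mentioned in the issue could look roughly like this. It is a minimal sketch, assuming the missing days sit at the end of each state's series (as in `dnm`): each state's NAs are replaced with the mean (or, via `fun`, the median) of its last `n` observed values, rounded to whole admissions. `fill_recent_summary()` and its arguments are hypothetical; setting `n` to the number of observed days in the incomplete week gives the "mean of that week". The example rows reuse the CA and NC values from the reprex.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical helper (not part of fiphde): fill NAs in the flu.* columns with a
# summary (mean by default) of the last `n` observed values for each state.
# Assumes the missing days are at the end of each state's series, so the last
# n observed values are the n days just before the gap.
fill_recent_summary <- function(df, n = 3, fun = mean) {
  df %>%
    arrange(state, date) %>%
    group_by(state) %>%
    mutate(across(starts_with("flu"), function(x) {
      recent <- utils::tail(x[!is.na(x)], n)
      # round back to whole admissions so integer columns stay integer
      coalesce(x, as.integer(round(fun(recent))))
    })) %>%
    ungroup()
}

dnm_small <- tibble::tribble(
  ~state, ~date,        ~flu.admits, ~flu.admits.cov,
  "CA",   "2022-03-15", 5L,          402L,
  "CA",   "2022-03-16", 11L,         404L,
  "CA",   "2022-03-17", 4L,          404L,
  "CA",   "2022-03-18", NA_integer_, NA_integer_,
  "CA",   "2022-03-19", NA_integer_, NA_integer_,
  "NC",   "2022-03-15", 0L,          125L,
  "NC",   "2022-03-16", 3L,          125L,
  "NC",   "2022-03-17", 2L,          124L,
  "NC",   "2022-03-18", NA_integer_, NA_integer_,
  "NC",   "2022-03-19", NA_integer_, NA_integer_
)

fill_recent_summary(dnm_small, n = 3)                # mean of last 3 days
fill_recent_summary(dnm_small, n = 3, fun = median)  # median of last 3 days
```

Unlike the fill-down above, this smooths over a single unusually high or low last day, but it still only repeats a constant, so it shares the bias concern raised below.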
Any kind of imputation will introduce bias. Now that the reporting issue has been resolved this week, I say we close this issue for now. If we need to revisit it, we definitely can.