Closed stephenturner closed 1 year ago
@stephenturner i think i have this working in #137
if you want a quick look at this without running the entire submission script try this from repo root:
library(tidyverse)
library(fiphde)
forecast_categorical <- function(.forecast,.observed) {
## prep the .forecast object for experimental target summary
forc4exp <-
.forecast %>%
dplyr::mutate(value = as.numeric(value)) %>%
dplyr::mutate(quantile = as.numeric(quantile)) %>%
## only looking at 2 week ahead for now
dplyr::filter(target == "2 wk ahead inc flu hosp") %>%
## join to internal locations object that has population data
dplyr::left_join(fiphde:::locations) %>%
## calculate rate per 100k
dplyr::mutate(rate = (value/population)*100000) %>%
## exclude point estimates
dplyr::filter(type == "quantile") %>%
## get columns of interest
dplyr::select(forecast_date, location, quantile, value, rate)
hosp4exp <-
.observed %>%
## find observed data that is prior to the 1 week ahead forecast
dplyr::filter(week_end == min(.forecast$target_end_date) - 7) %>%
## join to internal locations object that has population data
left_join(fiphde:::locations) %>%
## calculate rate per 100k
mutate(lag_rate = (flu.admits/population)*100000) %>%
## get columns of interest
select(location, lag_value = flu.admits, lag_rate)
## get "probability range" from each quantile ...
## for example: 0.99 and 0.01 quantiles have same prob value (0.01)
quants <-
forc4exp %>%
dplyr::filter(quantile < 0.5) %>%
dplyr::pull(quantile) %>%
unique(.)
quant_denom <-
c(quants,quants,0.5) %>%
sum(.)
## join prepped forecast and prepped observed
dplyr::left_join(forc4exp,hosp4exp) %>%
## calculate component indicators
dplyr::mutate(
ind_count = abs(value - lag_value),
ind_rate = abs(rate - lag_rate),
ind_rate2 = ifelse(rate - lag_rate > 0, "positive", "negative")
) %>%
## use component indicators to assess overall type per CDC flowchart
dplyr::mutate(type_id =
dplyr::case_when(
ind_count < 20 | ind_rate < 0.00001 ~ "stable",
(ind_count < 40 | ind_rate < 0.00002) & ind_rate2 == "positive" ~ "increase",
(ind_count < 40 | ind_rate < 0.00002) & ind_rate2 == "negative" ~ "decrease",
(ind_count >= 40 & ind_rate >= 0.00002) & ind_rate2 == "positive" ~ "large_increase",
(ind_count >= 40 & ind_rate >= 0.00002) & ind_rate2 == "negative" ~ "large_decrease"
)) %>%
## convert quantiles to "probability magnitude"
dplyr::mutate(quantile = ifelse(quantile > 0.5, 1-quantile, quantile)) %>%
dplyr::group_by(location,type_id) %>%
## sum up quantiles as probability magnitude over the total sum of quantiles
dplyr::summarise(value = sum(quantile)/ (quant_denom), .groups = "drop") %>%
## fill in any missing type_ids in a given location with 0
tidyr::complete(location,type_id, fill = list(value = 0)) %>%
## prep the submission format
dplyr::mutate(forecast_date = unique(.forecast$forecast_date),
target = "2 wk flu hosp rate change",
type = "category") %>%
dplyr::select(forecast_date, target,location, type, type_id, value)
}
## alternate example just reading a previously generated forecast file
tsens_forc <- read_csv(here::here("submission/SigSci-TSENS/2022-01-10-SigSci-TSENS.candidate.csv"))
prepped_hosp <- get_hdgov_hosp() %>% prep_hdgov_hosp()
forecast_categorical(tsens_forc, prepped_hosp)
a few things you and i need to do before submitting:
lets chat on this before 2022-12-12 submission.
included in v0.3.2 (#137 )
re-opened this to address an issue with rate change indicator ... it's better to do this using the count threshold for each location. addressed in https://github.com/signaturescience/fiphde/commit/caa2e7b3e75529e8592538de3da55c0eb7166ba4
the function that we're going to add to the package (see #139 ) will include this count threshold for rate change indicators.
Objective: The purpose of introducing an additional experimental target is to provide an opportunity for forecasting teams to submit forecasts for increasing and decreasing activity. While forecasting the magnitude of change during periods of rapid changes in hospitalizations is difficult, it may be possible to reliably forecast the direction of change. If categorical forecasts are shown to be reliable, they would provide valuable information for public health stakeholders.
Proposal: For the 2022-23 influenza season, we define categorical rate changes as follows. State-level changes in hospital admission incidence will be considered in terms of differences on a rate scale (counts per 100k people). Thresholds separating categories of change (e.g., separating “stable” forecasts from “increase” forecasts) will be the same across states, but are translatable into counts using the state’s population size (see locations.csv, in the data-locations subfolder of the Flusight-forecast-data repository). The experimental target, named “2 wk flu hosp rate change” will be submitted as estimates of the probability of occurrence for each rate change category in a new subdirectory of the Flusight-forecast data repository: data-experimental.
Observed changes, for the purpose of evaluating experimental target forecasts, will be determined by final reported values for weekly influenza admissions in the HHS Protect dataset (see FluSight Guidelines for the 2022-23 season for additional details). As such, forecasting teams are encouraged to consider uncertainty in values as they are reported in real time.
Note: This season we will solicit these targets for two-week ahead horizons (i.e., rate difference between week t+2 and current week t). The utility of providing categorical forecasts at other time horizons may be considered in future seasons.
The experimental rate change targets will be reported as probabilities of occurrence and binned into the following categories: stable, increase, decrease, large increase, large decrease, and defined as follows. A forecasted change will be defined as the rate change between the 2-week-ahead and finalized hospitalization rates in the preceding week the forecast were made (i.e., rate_change = yt+2 - yt, where yt denotes the rate of weekly hospitalizations at time t in units of counts/100k). Corresponding count changes are based on state-level population sizes (i.e., count_change = rate_change*state_population / 100,000).
Dates: Experimental target submissions can begin on December 12, 2022, and will run until Monday, May 22, 2023. Participants may start submitting these targets with their standard weekly forecasts by 11PM Eastern Time each Monday (herein referred to as the Forecast Due Date) and can choose to start submitting these targets at any time during the 2022-23 season. Submission of these experimental targets is optional.
Forecast formatting:
Subdirectory: Each model that submits experimental target forecasts will have a unique subdirectory within the GitHub repository where forecasts will be submitted. These subdirectories should be located in the data-experimental subdirectory and must be named in accordance with the guidelines for general forecast submissions. In addition to forecast files (see formatting details below), subdirectories may also contain an optional metadata file or license file. These are particularly encouraged if needed to document any differences from the corresponding files associated with a team’s standard forecasts.
Forecasts
Each experimental target forecast file should have the following format:
YYYY-MM-DD-team-model.csv
, whereThe date YYYY-MM-DD is the forecast_date. This should always be the Monday Forecast Due Date. The team and model in this file must match the team and model in the directory this file is in. Both team and model should be less than 15 characters, alpha-numeric and underscores only, with no spaces or hyphens.
The file must be a comma-separated value (csv) file with the following columns (in any order). No additional columns are allowed. Each row in the file is a point forecast for a location on a particular date for the experimental target.
forecast_date
: Values in the forecast_date column must be a date in the ISO format YYYY-MM-DD. This is the Forecast Due Date for the submission and will always match the weekly date of standard forecast submissions. forecast_date should correspond and be redundant with the date in the filename but is included here to facilitate validation and analysis.target
: Values in the target column must be a character (string): “2 wk flu hosp rate change”.location
: Values in the location column must be one of the “locations” in this FIPS numeric code file which includes numeric FIPS codes for U.S. states, territories, and districts, as well as “US” for national forecasts. Please note that when writing FIPS codes, they should be written in as a character string to preserve any leading zeroes.type
: Values in the type column are “category” for this experimental target. This value indicates that the row corresponds to a categorical forecast.type_id
: Values in the type_id column must be a character (string) indicating the experimental rate change category, formatted as follows:stable
,increase
,decrease
,large_increase
,large_decrease
.value
: Values in the value column are probabilities (non-negative numbers that are greater than or equal to 0 and less than or equal to 1) indicating the estimated probability for the category specified in thetype_id
column for this row.Forecast submission: Experimental Target Forecasts can be submitted to the CDC EPI GitHub repository (https://github.com/cdcepi/Flusight-forecast-data), and further submission instructions can be found at https://github.com/cdcepi/Flusight-forecast-data/blob/master/data-experimental/README.md
Supplement 1: A csv file is available on GitHub that, for each location, provides a standard population size (from the U.S. Census 2021 Vintage), as well as count thresholds based on the rate boundaries of 1/100,000 and 2/100,000 population. Difference in weekly rates will not be evaluated for the following territories due to small population sizes: American Samoa, Guam, Northern Mariana Islands, and Virgin Islands. https://github.com/cdcepi/Flusight-forecast-data/tree/master/data-locations
Supplement 2: Diagram for direction classifier. Rate_change = yt+2 - yt, where yt denotes the weekly hospitalization rate at time t in units of counts/100k
count_change = rate_change * state_population / 100,000