Experimental influenza hospitalization target

Objective: The purpose of introducing an additional experimental target is to provide an opportunity for forecasting teams to submit forecasts for increasing and decreasing activity. While forecasting the magnitude of change during periods of rapid changes in hospitalizations is difficult, it may be possible to reliably forecast the direction of change. If categorical forecasts are shown to be reliable, they would provide valuable information for public health stakeholders.

Proposal: For the 2022-23 influenza season, we define categorical rate changes as follows. State-level changes in hospital admission incidence will be considered in terms of differences on a rate scale (counts per 100k people). Thresholds separating categories of change (e.g., separating “stable” forecasts from “increase” forecasts) will be the same across states, but are translatable into counts using the state’s population size (see locations.csv, in the data-locations subfolder of the Flusight-forecast-data repository). The experimental target, named “2 wk flu hosp rate change” will be submitted as estimates of the probability of occurrence for each rate change category in a new subdirectory of the Flusight-forecast data repository: data-experimental.

Observed changes, for the purpose of evaluating experimental target forecasts, will be determined by final reported values for weekly influenza admissions in the HHS Protect dataset (see FluSight Guidelines for the 2022-23 season for additional details). As such, forecasting teams are encouraged to consider uncertainty in values as they are reported in real time.

Note: This season we will solicit these targets for two-week ahead horizons (i.e., rate difference between week t+2 and current week t). The utility of providing categorical forecasts at other time horizons may be considered in future seasons.

The experimental rate change targets will be reported as probabilities of occurrence and binned into the following categories: stable, increase, decrease, large increase, large decrease, and defined as follows. A forecasted change will be defined as the rate change between the 2-week-ahead and finalized hospitalization rates in the preceding week the forecast were made (i.e., rate_change = yt+2 - yt, where yt denotes the rate of weekly hospitalizations at time t in units of counts/100k). Corresponding count changes are based on state-level population sizes (i.e., count_change = rate_change*state_population / 100,000).

Stable: forecasted changes in hospitalizations qualify as stable if either the magnitude of the rate change is less than 1/100k OR the corresponding magnitude of the count change is less than 20.
Increase: positive forecasted changes that do not qualify as stable and for which the forecasted rate change is less than 2/100k OR the magnitude of the count change is less than 40.
Large increase: positive forecasted rate changes that are larger than or equal to 2/100k, AND the count change is larger than or equal to 40.
Decrease : negative forecasted rate changes that do not qualify as stable and for which the forecasted rate change is less than 2/100k OR the magnitude of the count change is less than 40.
Large decrease: negative forecasted rate changes that have a magnitude larger than or equal to 2/100k, AND the magnitude of count change is larger than or equal to 40.

Dates: Experimental target submissions can begin on December 12, 2022, and will run until Monday, May 22, 2023. Participants may start submitting these targets with their standard weekly forecasts by 11PM Eastern Time each Monday (herein referred to as the Forecast Due Date) and can choose to start submitting these targets at any time during the 2022-23 season. Submission of these experimental targets is optional.

Forecast formatting:

Subdirectory: Each model that submits experimental target forecasts will have a unique subdirectory within the GitHub repository where forecasts will be submitted. These subdirectories should be located in the data-experimental subdirectory and must be named in accordance with the guidelines for general forecast submissions. In addition to forecast files (see formatting details below), subdirectories may also contain an optional metadata file or license file. These are particularly encouraged if needed to document any differences from the corresponding files associated with a team’s standard forecasts.

Forecasts

Each experimental target forecast file should have the following format: YYYY-MM-DD-team-model.csv, where

YYYY is the 4 digit year,
MM is the 2 digit month,
DD is the 2 digit day,
team is the teamname, and
model is the name of your model.

The date YYYY-MM-DD is the forecast_date. This should always be the Monday Forecast Due Date. The team and model in this file must match the team and model in the directory this file is in. Both team and model should be less than 15 characters, alpha-numeric and underscores only, with no spaces or hyphens.

The file must be a comma-separated value (csv) file with the following columns (in any order). No additional columns are allowed. Each row in the file is a point forecast for a location on a particular date for the experimental target.

forecast_date: Values in the forecast_date column must be a date in the ISO format YYYY-MM-DD. This is the Forecast Due Date for the submission and will always match the weekly date of standard forecast submissions. forecast_date should correspond and be redundant with the date in the filename but is included here to facilitate validation and analysis.
target: Values in the target column must be a character (string): “2 wk flu hosp rate change”.
location: Values in the location column must be one of the “locations” in this FIPS numeric code file which includes numeric FIPS codes for U.S. states, territories, and districts, as well as “US” for national forecasts. Please note that when writing FIPS codes, they should be written in as a character string to preserve any leading zeroes.
type: Values in the type column are “category” for this experimental target. This value indicates that the row corresponds to a categorical forecast.
type_id: Values in the type_id column must be a character (string) indicating the experimental rate change category, formatted as follows: stable, increase, decrease, large_increase, large_decrease.
value: Values in the value column are probabilities (non-negative numbers that are greater than or equal to 0 and less than or equal to 1) indicating the estimated probability for the category specified in the type_id column for this row.

Forecast submission: Experimental Target Forecasts can be submitted to the CDC EPI GitHub repository (https://github.com/cdcepi/Flusight-forecast-data), and further submission instructions can be found at https://github.com/cdcepi/Flusight-forecast-data/blob/master/data-experimental/README.md

Supplement 1: A csv file is available on GitHub that, for each location, provides a standard population size (from the U.S. Census 2021 Vintage), as well as count thresholds based on the rate boundaries of 1/100,000 and 2/100,000 population. Difference in weekly rates will not be evaluated for the following territories due to small population sizes: American Samoa, Guam, Northern Mariana Islands, and Virgin Islands. https://github.com/cdcepi/Flusight-forecast-data/tree/master/data-locations

Supplement 2: Diagram for direction classifier. Rate_change = yt+2 - yt, where yt denotes the weekly hospitalization rate at time t in units of counts/100k

count_change = rate_change * state_population / 100,000

@stephenturner i think i have this working in #137

if you want a quick look at this without running the entire submission script try this from repo root:

library(tidyverse)
library(fiphde)

forecast_categorical <- function(.forecast,.observed) {

  ## prep the .forecast object for experimental target summary
  forc4exp <-
    .forecast %>%
    dplyr::mutate(value = as.numeric(value)) %>%
    dplyr::mutate(quantile = as.numeric(quantile)) %>%
    ## only looking at 2 week ahead for now
    dplyr::filter(target == "2 wk ahead inc flu hosp") %>%
    ## join to internal locations object that has population data
    dplyr::left_join(fiphde:::locations) %>%
    ## calculate rate per 100k
    dplyr::mutate(rate = (value/population)*100000) %>%
    ## exclude point estimates
    dplyr::filter(type == "quantile") %>%
    ## get columns of interest
    dplyr::select(forecast_date, location, quantile, value, rate)

  hosp4exp <-
    .observed %>%
    ## find observed data that is prior to the 1 week ahead forecast
    dplyr::filter(week_end == min(.forecast$target_end_date) - 7) %>%
    ## join to internal locations object that has population data
    left_join(fiphde:::locations) %>%
    ## calculate rate per 100k
    mutate(lag_rate = (flu.admits/population)*100000) %>%
    ## get columns of interest
    select(location, lag_value = flu.admits, lag_rate)

  ## get "probability range" from each quantile ...
  ## for example: 0.99 and 0.01 quantiles have same prob value (0.01)
  quants <-
    forc4exp %>%
    dplyr::filter(quantile < 0.5) %>%
    dplyr::pull(quantile) %>%
    unique(.)

  quant_denom <-
    c(quants,quants,0.5) %>%
    sum(.)

  ## join prepped forecast and prepped observed
  dplyr::left_join(forc4exp,hosp4exp) %>%
    ## calculate component indicators
    dplyr::mutate(
      ind_count = abs(value - lag_value),
      ind_rate = abs(rate - lag_rate),
      ind_rate2 = ifelse(rate - lag_rate > 0, "positive", "negative")
    ) %>%
    ## use component indicators to assess overall type per CDC flowchart
    dplyr::mutate(type_id =
                    dplyr::case_when(
                      ind_count < 20 | ind_rate < 0.00001 ~ "stable",
                      (ind_count < 40 | ind_rate < 0.00002) & ind_rate2 == "positive" ~ "increase",
                      (ind_count < 40 | ind_rate < 0.00002) & ind_rate2 == "negative" ~ "decrease",
                      (ind_count >= 40 & ind_rate >= 0.00002) & ind_rate2 == "positive" ~ "large_increase",
                      (ind_count >= 40 & ind_rate >= 0.00002) & ind_rate2 == "negative" ~ "large_decrease"
                    )) %>%
    ## convert quantiles to "probability magnitude"
    dplyr::mutate(quantile = ifelse(quantile > 0.5, 1-quantile, quantile)) %>%
    dplyr::group_by(location,type_id) %>%
    ## sum up quantiles as probability magnitude over the total sum of quantiles
    dplyr::summarise(value = sum(quantile)/ (quant_denom), .groups = "drop") %>%
    ## fill in any missing type_ids in a given location with 0
    tidyr::complete(location,type_id, fill = list(value = 0)) %>%
    ## prep the submission format
    dplyr::mutate(forecast_date = unique(.forecast$forecast_date),
                  target = "2 wk flu hosp rate change",
                  type = "category") %>%
    dplyr::select(forecast_date, target,location, type, type_id, value)

}

## alternate example just reading a previously generated forecast file
tsens_forc <- read_csv(here::here("submission/SigSci-TSENS/2022-01-10-SigSci-TSENS.candidate.csv"))
prepped_hosp <- get_hdgov_hosp() %>% prep_hdgov_hosp()
forecast_categorical(tsens_forc, prepped_hosp)

a few things you and i need to do before submitting:

confirm the quantile to probability translation is correct
confirm the translation of flowchart to code is correct
add some code to write out the submission-ready categorical target files

lets chat on this before 2022-12-12 submission.

signaturescience / fiphde

Experimental influenza hospitalization target #136