signaturescience / fiphde

Forecasting Influenza in Support of Public Health Decision Making
https://signaturescience.github.io/fiphde/
GNU General Public License v3.0
3 stars 1 forks source link

Experimental influenza hospitalization target #136

Closed stephenturner closed 1 year ago

stephenturner commented 1 year ago

Objective: The purpose of introducing an additional experimental target is to provide an opportunity for forecasting teams to submit forecasts for increasing and decreasing activity. While forecasting the magnitude of change during periods of rapid changes in hospitalizations is difficult, it may be possible to reliably forecast the direction of change. If categorical forecasts are shown to be reliable, they would provide valuable information for public health stakeholders.

Proposal: For the 2022-23 influenza season, we define categorical rate changes as follows. State-level changes in hospital admission incidence will be considered in terms of differences on a rate scale (counts per 100k people). Thresholds separating categories of change (e.g., separating “stable” forecasts from “increase” forecasts) will be the same across states, but are translatable into counts using the state’s population size (see locations.csv, in the data-locations subfolder of the Flusight-forecast-data repository). The experimental target, named “2 wk flu hosp rate change” will be submitted as estimates of the probability of occurrence for each rate change category in a new subdirectory of the Flusight-forecast data repository: data-experimental.

Observed changes, for the purpose of evaluating experimental target forecasts, will be determined by final reported values for weekly influenza admissions in the HHS Protect dataset (see FluSight Guidelines for the 2022-23 season for additional details). As such, forecasting teams are encouraged to consider uncertainty in values as they are reported in real time.

Note: This season we will solicit these targets for two-week ahead horizons (i.e., rate difference between week t+2 and current week t). The utility of providing categorical forecasts at other time horizons may be considered in future seasons.

The experimental rate change targets will be reported as probabilities of occurrence and binned into the following categories: stable, increase, decrease, large increase, large decrease, and defined as follows. A forecasted change will be defined as the rate change between the 2-week-ahead and finalized hospitalization rates in the preceding week the forecast were made (i.e., rate_change = yt+2 - yt, where yt denotes the rate of weekly hospitalizations at time t in units of counts/100k). Corresponding count changes are based on state-level population sizes (i.e., count_change = rate_change*state_population / 100,000).

Dates: Experimental target submissions can begin on December 12, 2022, and will run until Monday, May 22, 2023. Participants may start submitting these targets with their standard weekly forecasts by 11PM Eastern Time each Monday (herein referred to as the Forecast Due Date) and can choose to start submitting these targets at any time during the 2022-23 season. Submission of these experimental targets is optional.

Forecast formatting:

Subdirectory: Each model that submits experimental target forecasts will have a unique subdirectory within the GitHub repository where forecasts will be submitted. These subdirectories should be located in the data-experimental subdirectory and must be named in accordance with the guidelines for general forecast submissions. In addition to forecast files (see formatting details below), subdirectories may also contain an optional metadata file or license file. These are particularly encouraged if needed to document any differences from the corresponding files associated with a team’s standard forecasts.

Forecasts

Each experimental target forecast file should have the following format: YYYY-MM-DD-team-model.csv, where

The date YYYY-MM-DD is the forecast_date. This should always be the Monday Forecast Due Date. The team and model in this file must match the team and model in the directory this file is in. Both team and model should be less than 15 characters, alpha-numeric and underscores only, with no spaces or hyphens.

The file must be a comma-separated value (csv) file with the following columns (in any order). No additional columns are allowed. Each row in the file is a point forecast for a location on a particular date for the experimental target.

Forecast submission: Experimental Target Forecasts can be submitted to the CDC EPI GitHub repository (https://github.com/cdcepi/Flusight-forecast-data), and further submission instructions can be found at https://github.com/cdcepi/Flusight-forecast-data/blob/master/data-experimental/README.md

Supplement 1: A csv file is available on GitHub that, for each location, provides a standard population size (from the U.S. Census 2021 Vintage), as well as count thresholds based on the rate boundaries of 1/100,000 and 2/100,000 population. Difference in weekly rates will not be evaluated for the following territories due to small population sizes: American Samoa, Guam, Northern Mariana Islands, and Virgin Islands. https://github.com/cdcepi/Flusight-forecast-data/tree/master/data-locations

Supplement 2: Diagram for direction classifier. Rate_change = yt+2 - yt, where yt denotes the weekly hospitalization rate at time t in units of counts/100k

count_change = rate_change * state_population / 100,000

Image

vpnagraj commented 1 year ago

@stephenturner i think i have this working in #137

if you want a quick look at this without running the entire submission script try this from repo root:

library(tidyverse)
library(fiphde)

forecast_categorical <- function(.forecast,.observed) {

  ## prep the .forecast object for experimental target summary
  forc4exp <-
    .forecast %>%
    dplyr::mutate(value = as.numeric(value)) %>%
    dplyr::mutate(quantile = as.numeric(quantile)) %>%
    ## only looking at 2 week ahead for now
    dplyr::filter(target == "2 wk ahead inc flu hosp") %>%
    ## join to internal locations object that has population data
    dplyr::left_join(fiphde:::locations) %>%
    ## calculate rate per 100k
    dplyr::mutate(rate = (value/population)*100000) %>%
    ## exclude point estimates
    dplyr::filter(type == "quantile") %>%
    ## get columns of interest
    dplyr::select(forecast_date, location, quantile, value, rate)

  hosp4exp <-
    .observed %>%
    ## find observed data that is prior to the 1 week ahead forecast
    dplyr::filter(week_end == min(.forecast$target_end_date) - 7) %>%
    ## join to internal locations object that has population data
    left_join(fiphde:::locations) %>%
    ## calculate rate per 100k
    mutate(lag_rate = (flu.admits/population)*100000) %>%
    ## get columns of interest
    select(location, lag_value = flu.admits, lag_rate)

  ## get "probability range" from each quantile ...
  ## for example: 0.99 and 0.01 quantiles have same prob value (0.01)
  quants <-
    forc4exp %>%
    dplyr::filter(quantile < 0.5) %>%
    dplyr::pull(quantile) %>%
    unique(.)

  quant_denom <-
    c(quants,quants,0.5) %>%
    sum(.)

  ## join prepped forecast and prepped observed
  dplyr::left_join(forc4exp,hosp4exp) %>%
    ## calculate component indicators
    dplyr::mutate(
      ind_count = abs(value - lag_value),
      ind_rate = abs(rate - lag_rate),
      ind_rate2 = ifelse(rate - lag_rate > 0, "positive", "negative")
    ) %>%
    ## use component indicators to assess overall type per CDC flowchart
    dplyr::mutate(type_id =
                    dplyr::case_when(
                      ind_count < 20 | ind_rate < 0.00001 ~ "stable",
                      (ind_count < 40 | ind_rate < 0.00002) & ind_rate2 == "positive" ~ "increase",
                      (ind_count < 40 | ind_rate < 0.00002) & ind_rate2 == "negative" ~ "decrease",
                      (ind_count >= 40 & ind_rate >= 0.00002) & ind_rate2 == "positive" ~ "large_increase",
                      (ind_count >= 40 & ind_rate >= 0.00002) & ind_rate2 == "negative" ~ "large_decrease"
                    )) %>%
    ## convert quantiles to "probability magnitude"
    dplyr::mutate(quantile = ifelse(quantile > 0.5, 1-quantile, quantile)) %>%
    dplyr::group_by(location,type_id) %>%
    ## sum up quantiles as probability magnitude over the total sum of quantiles
    dplyr::summarise(value = sum(quantile)/ (quant_denom), .groups = "drop") %>%
    ## fill in any missing type_ids in a given location with 0
    tidyr::complete(location,type_id, fill = list(value = 0)) %>%
    ## prep the submission format
    dplyr::mutate(forecast_date = unique(.forecast$forecast_date),
                  target = "2 wk flu hosp rate change",
                  type = "category") %>%
    dplyr::select(forecast_date, target,location, type, type_id, value)

}

## alternate example just reading a previously generated forecast file
tsens_forc <- read_csv(here::here("submission/SigSci-TSENS/2022-01-10-SigSci-TSENS.candidate.csv"))
prepped_hosp <- get_hdgov_hosp() %>% prep_hdgov_hosp()
forecast_categorical(tsens_forc, prepped_hosp)

a few things you and i need to do before submitting:

  1. confirm the quantile to probability translation is correct
  2. confirm the translation of flowchart to code is correct
  3. add some code to write out the submission-ready categorical target files

lets chat on this before 2022-12-12 submission.

vpnagraj commented 1 year ago

included in v0.3.2 (#137 )

vpnagraj commented 1 year ago

re-opened this to address an issue with rate change indicator ... it's better to do this using the count threshold for each location. addressed in https://github.com/signaturescience/fiphde/commit/caa2e7b3e75529e8592538de3da55c0eb7166ba4

the function that we're going to add to the package (see #139 ) will include this count threshold for rate change indicators.