signaturescience / focustools

Forecasting COVID-19 in the US
https://signaturescience.github.io/focustools/
GNU General Public License v3.0
0 stars 0 forks source link

weighted interval score function #53

Closed vpnagraj closed 3 years ago

vpnagraj commented 3 years ago

@chulmelowe i just pushed a few changes to the branch. see commits enumerated in the thread above

these are pkg dev fixes.

the first one (7f6f7ff) creates the .Rd documentation and adds the function to the NAMESPACE. always run devtools::document() prior to pushing

the second one gets rid of the usethis::use_pipe() . you dont need that in there ... you only need to run that once in your workspace when you're developing the package to add the %>% operator to the NAMESPACE (https://github.com/signaturescience/focustools/blob/master/NAMESPACE#L3). once it's there the pipe is available to any package functions.

the last two were fixes per devtools::check(). make sure you check the package before pushing. that was the first thing i did when i pulled down your code. here's what i saw:

── R CMD check results ────────────────────────────── focustools 0.0.0.9000 ────
Duration: 14.3s

> checking R code for possible problems ... NOTE
  wis: no visible binding for global variable ‘target_variable’
  wis: no visible global function definition for ‘n’
  wis: no visible binding for global variable ‘interval_score’
  Undefined global functions or variables:
    interval_score n target_variable

0 errors ✓ | 0 warnings ✓ | 1 note x

lots of solutions to the "no visible binding" message in google:

https://community.rstudio.com/t/how-to-solve-no-visible-binding-for-global-variable-note/28887

https://community.rstudio.com/t/when-programming-with-dplyr-what-is-the-correct-way-to-avoid-undefined-global-variables/55946

you can check the commit history and look at focustools.R on this branch to see how i resolved. after fixed we're looking good with devtools::check():

── R CMD check results ────────────────────────────── focustools 0.0.0.9000 ────
Duration: 15s

0 errors ✓ | 0 warnings ✓ | 0 notes ✓

R CMD check succeeded

so the package dev stuff looks good. but before we merge it would be helpful to see a reproducible example of how you actually use this function. not necessarily for the package documentation (yet) but at the very least dropped in a comment on this thread so i can understand better what this is supposed to do.

i have some stylistic code comments too but i'll save those for after @stephenturner takes a look.

vpnagraj commented 3 years ago

@chulmelowe when you're ready to share some example code drop it in a comment like so:

library(tidyverse)
library(focustools)

## Get national data
national <- inner_join(
  get_cases( source="jhu", granularity="national"),
  get_deaths(source="jhu", granularity="national"),
  by = c("location", "epiyear", "epiweek"))
## Get state data
state <- inner_join(
  get_cases( source="jhu", granularity="state"),
  get_deaths(source="jhu", granularity="state"),
  by = c("location", "epiyear", "epiweek"))
## combine US and state data
usafull <-
  bind_rows(national, state) %>%
  filter(location %in% c("US", stringr::str_pad(1:56, width=2, pad="0"))) %>%
  make_tsibble() %>%
  filter(monday>"2020-03-09")
chulmelowe commented 3 years ago

Here's some example code using the wis function to calculate the weighted interval and relative weighted interval scores:

library(tidyverse)
library(focustools)
library(covidHubUtils)
library(fabletools)
library(fable)

# Read in data for forecasting
usac <- focustools::get_cases()
usad <- focustools::get_deaths()
usa <- dplyr::inner_join(usac, usad, by = c("location", "epiyear", "epiweek")) %>%
  focustools::make_tsibble(chop = TRUE)

# Chop off the last four weeks to use as a scoring data set
dates <- sort(unique(usa$monday), decreasing = TRUE)
train_set <- usa %>%
  filter(monday %in% dates[5:length(dates)])

# Fit models for incident cases and incident deaths
fit_icases <- train_set %>%
  model(arima = ARIMA(icases, stepwise = F, approximation = F))

fit_ideaths <- train_set %>%
  model(linear_caselag3 = TSLM(ideaths ~ lag(icases, 3)))

# Forecast incident cases, incident deaths, and cumulative deaths
forecast_icases <- ts_forecast(fit_icases, outcome = "icases", horizon = 4)

future_cases <- ts_futurecases(train_set, forecast_icases, horizon = 4)
forecast_ideaths <- ts_forecast(fit_ideaths, outcome = "ideaths", new_data = future_cases)

forecast_cdeaths <- ts_forecast(outcome = "cdeaths", .data = train_set, inc_forecast = forecast_ideaths)

# Format the forecasts for submission to the COVID Hub
submission <- list(
  focustools::format_for_submission(forecast_icases, target_name = "inc case"),
  focustools::format_for_submission(forecast_ideaths, target_name = "inc death"),
  focustools::format_for_submission(forecast_cdeaths, target_name = "cum death")
) %>%
  reduce(bind_rows) %>%
  arrange(target)

# Load the formatted truth data
td_ic <- covidHubUtils::load_truth("JHU", target_variable = "inc case")
td_id <- covidHubUtils::load_truth("JHU", target_variable = "inc death")
td_cd <- covidHubUtils::load_truth("JHU", target_variable = "cum death")

 truth_set <- rbind(td_ic, td_id, td_cd)

# Score the forecasts for the last four weeks.
f_scores <- wis(submission, truth_set)
colnames(f_scores)[ncol(f_scores)] <- "sigsci_score"

# Import the baseline model forecasts
base_forecast <- covidHubUtils::load_forecasts(models = "COVIDhub-baseline", forecast_dates = "2021-02-15", locations = "US", source = "zoltar")

# Compute WIS for the baseline model
bl_scores <- wis(base_forecast, truth_set)
colnames(bl_scores)[ncol(bl_scores)] <- "baseline_score"

# Merge the datasets and compute the relative WIS
rel_wis <- f_scores %>%
  left_join(bl_scores) %>%
  mutate(relative_wis = sigsci_score / baseline_score)
rel_wis
vpnagraj commented 3 years ago

@chulmelowe i was able to run through your example code. by the end i'm seeing rel_wis:

target_variable target_end_date target location sigsci_score baseline_score relative_wis
cum death 2021-02-20 1 wk ahead cum death US 975.3676 5604.321 0.1740385
cum death 2021-02-27 2 wk ahead cum death US 943.7010 11421.868 0.0826223
cum death 2021-03-06 3 wk ahead cum death US 1140.2562 18458.353 0.0617745
cum death 2021-03-13 4 wk ahead cum death US 1485.3530 27528.306 0.0539573
inc case 2021-02-20 1 wk ahead inc case US 9529.0611 67639.508 0.1408801
inc case 2021-02-27 2 wk ahead inc case US 17338.8861 52569.713 0.3298265
inc case 2021-03-06 3 wk ahead inc case US 21317.8083 65661.062 0.3246644
inc case 2021-03-13 4 wk ahead inc case US 23499.7167 82243.543 0.2857333
inc death 2021-02-20 1 wk ahead inc death US 975.3676 6881.281 0.1417422
inc death 2021-02-27 2 wk ahead inc death US 354.9254 5588.525 0.0635097
inc death 2021-03-06 3 wk ahead inc death US 368.0562 6727.787 0.0547069
inc death 2021-03-13 4 wk ahead inc death US 353.4968 8714.365 0.0405648

that relative WIS for the SigSci-TS is really low. i think there might be an issue with how the WIS is being calculated ...

to investigate a little further, i tried calculating the WIS using code based on the C19FH scoring script (https://github.com/reichlab/covid19-forecast-evals/blob/main/reports/score_forecasts_eyc_update.R). here's the comparison of the WIS scores:

c19fh_wis sigsci_wis horizon temporal_resolution target_variable target_end_date
1434.1904 975.3676 1 wk cum death 2021-02-20
21136.0143 9529.0611 1 wk inc case 2021-02-20
1434.1904 975.3676 1 wk inc death 2021-02-20
1770.9587 943.7010 2 wk cum death 2021-02-27
39617.7071 17338.8861 2 wk inc case 2021-02-27
756.9683 354.9254 2 wk inc death 2021-02-27
2354.5570 1140.2562 3 wk cum death 2021-03-06
51375.5071 21317.8083 3 wk inc case 2021-03-06
770.0352 368.0562 3 wk inc death 2021-03-06
3099.5065 1485.3530 4 wk cum death 2021-03-13
59398.2714 23499.7167 4 wk inc case 2021-03-13
754.0800 353.4968 4 wk inc death 2021-03-13

all of the code to reproduce is available below. wrote that fairly quickly, you may want to double check the joins (you too @stephenturner). note that the original code is largely unchanged, but i did add filter(monday>"2020-03-09") when prepping the data. we're doing that in the submission script now because there are gaps in some locations prior to that date. i also noticed that without that filter there were warnings generated with ARIMA modeling.

last things ... i did have to modify the C19FH code to remove an argument for "count_median_twice" (deprecated? removed altogether from scoringutils?). i also changed the na.rm behavior in the WIS calculation because some of the inc case scores were returned as NA

library(tidyverse)
library(focustools)
library(covidHubUtils)
library(fabletools)
library(fable)

# Read in data for forecasting
usac <- focustools::get_cases()
usad <- focustools::get_deaths()
usa <- dplyr::inner_join(usac, usad, by = c("location", "epiyear", "epiweek")) %>%
  focustools::make_tsibble(chop = TRUE) %>%
  filter(monday>"2020-03-09")

# Chop off the last four weeks to use as a scoring data set
dates <- sort(unique(usa$monday), decreasing = TRUE)
train_set <- usa %>%
  filter(monday %in% dates[5:length(dates)])

# Fit models for incident cases and incident deaths
fit_icases <- train_set %>%
  model(arima = ARIMA(icases, stepwise = F, approximation = F))

fit_ideaths <- train_set %>%
  model(linear_caselag3 = TSLM(ideaths ~ lag(icases, 3)))

# Forecast incident cases, incident deaths, and cumulative deaths
forecast_icases <- ts_forecast(fit_icases, outcome = "icases", horizon = 4)

future_cases <- ts_futurecases(train_set, forecast_icases, horizon = 4)
forecast_ideaths <- ts_forecast(fit_ideaths, outcome = "ideaths", new_data = future_cases)

forecast_cdeaths <- ts_forecast(outcome = "cdeaths", .data = train_set, inc_forecast = forecast_ideaths)

# Format the forecasts for submission to the COVID Hub
submission <- list(
  focustools::format_for_submission(forecast_icases, target_name = "inc case"),
  focustools::format_for_submission(forecast_ideaths, target_name = "inc death"),
  focustools::format_for_submission(forecast_cdeaths, target_name = "cum death")
) %>%
  reduce(bind_rows) %>%
  arrange(target)

# Load the formatted truth data
td_ic <- covidHubUtils::load_truth("JHU", target_variable = "inc case")
td_id <- covidHubUtils::load_truth("JHU", target_variable = "inc death")
td_cd <- covidHubUtils::load_truth("JHU", target_variable = "cum death")

truth_set <- rbind(td_ic, td_id, td_cd)

# Score the forecasts for the last four weeks.
f_scores <- wis(submission, truth_set)
colnames(f_scores)[ncol(f_scores)] <- "sigsci_score"

# Import the baseline model forecasts
base_forecast <- covidHubUtils::load_forecasts(models = "COVIDhub-baseline", forecast_dates = "2021-02-15", locations = "US", source = "zoltar")

# Compute WIS for the baseline model
bl_scores <- wis(base_forecast, truth_set)
colnames(bl_scores)[ncol(bl_scores)] <- "baseline_score"

# Merge the datasets and compute the relative WIS
rel_wis <- f_scores %>%
  left_join(bl_scores) %>%
  mutate(relative_wis = sigsci_score / baseline_score)
rel_wis

##################################################
library(scoringutils)

## now try WIS calculation per C19FH methods
## use same forecast / truth_set inputs
submission <-
  submission %>%
  ## prep the submission object per the scoringutils specs
  ## grab the target name (i.e. "inc case")
  dplyr::mutate(target_variable = gsub(pattern = "[[:digit:]] wk ahead ", replacement = "", x = target)) %>%
  ## add a column for model name
  mutate(model = "SigSci-TS") %>%
  ## get horizon and "wk" separated out
  separate(target, into = c("horizon", "temporal_resolution", "tmp1", "tmp2", "tmp3"), sep = " ") %>%
  select(-tmp1,-tmp2, -tmp3)

## join the submisison to truth_set (defined above)
## clean up columns afterwards
# get dataframe into scoringutil format
joint_df <- dplyr::left_join(x = submission, y = truth_set,
                             by = c("location", "target_variable", "target_end_date")) %>%
  dplyr::select(-c("model.y")) %>%
  dplyr::rename(model = model.x, prediction = value.x, true_value = value.y) %>%
  dplyr::filter(!is.na(true_value))

# score using scoringutil
observation_cols <- c(
  "model",
  "location",
  "horizon", "temporal_resolution", "target_variable",
  "forecast_date", "target_end_date"
)

# creates placeholder variables to store the name of the column from scoringutils::eval_forecasts() to
# take values from (`abs_var`) and the column name to rename as "abs_error" (`abs_var_rename`)
abs_var <- "aem"
abs_var_rename <- "aem_0"

c19fh_scores <- tibble::tibble(scoringutils::eval_forecasts(data = joint_df,
                                                      by = observation_cols,
                                                      summarise_by = c(observation_cols, "range"),
                                                      ## the below interval_score_arguments should ensure that WIS is computed correctly
                                                      interval_score_arguments = list(weigh = TRUE))) %>%
  tidyr::pivot_wider(id_cols = observation_cols,
                     names_from = c("range"),
                     values_from = c("coverage", "interval_score", abs_var, "sharpness", "overprediction", "underprediction")) %>%
  purrr::set_names(~sub(abs_var_rename, "abs_error", .x)) %>%
  ## need to remove all columns ending with NA to not affect WIS calculations
  dplyr::select(
    -dplyr::ends_with("_NA")
  ) %>%
  ## before next lines: do we need to check to ensure interval_score columns exist?
  ## the following lines ensure that we use denominator for the wis of
  ## (# of interval_scores)-0.5
  ## which is written in the paper and elsewhere as
  ## (# of interval_scores at level >0 ) + 0.5 or (K + 1/2)
  ## to make sure that the median only gets half the weight of the other
  ## intervals, multiply its value by 0.5
  dplyr::mutate(
    n_interval_scores = rowSums(!is.na(dplyr::select(., dplyr::starts_with("interval_score")))),
    exists_interval_score_0 = "interval_score_0" %in% names(.),
    interval_score_0 = ifelse(exists_interval_score_0, 0.5 * interval_score_0, NA_real_),
    sharpness_0 = ifelse(exists_interval_score_0, 0.5 * sharpness_0, NA_real_),
    underprediction_0 = ifelse(exists_interval_score_0, 0.5 * underprediction_0, NA_real_),
    overprediction_0 = ifelse(exists_interval_score_0, 0.5 * overprediction_0, NA_real_)) %>%
  dplyr::mutate(
    wis = rowSums(dplyr::select(., dplyr::starts_with("interval_score")), na.rm = TRUE)/(n_interval_scores-0.5*(exists_interval_score_0)),
  ) %>%
  dplyr::mutate(
    sharpness = rowSums(dplyr::select(., dplyr::starts_with("sharpness")), na.rm = FALSE)/(n_interval_scores-0.5*(exists_interval_score_0)),
    overprediction = rowSums(dplyr::select(., dplyr::starts_with("overprediction")), na.rm = FALSE)/(n_interval_scores-0.5*(exists_interval_score_0)),
    underprediction = rowSums(dplyr::select(., dplyr::starts_with("underprediction")), na.rm = FALSE)/(n_interval_scores-0.5*(exists_interval_score_0))
  ) %>%
  dplyr::select(
    -dplyr::starts_with("aem_"),
    -dplyr::starts_with("ae_point_"),
    -dplyr::starts_with("interval_score"),
    -dplyr::starts_with("sharpness_"),
    -dplyr::starts_with("underprediction_"),
    -dplyr::starts_with("overprediction_")
  ) 

## compare the wis computed with wis() vs c19fh wis ...

## first some clean up to f_scores above
f_scores_clean <-
  f_scores %>%
  mutate(target = as.character(target)) %>%
  separate(target, into = c("horizon", "temporal_resolution", "tmp1", "tmp2", "tmp3"), sep = " ") %>%
  select(sigsci_wis = sigsci_score, horizon, temporal_resolution, target_variable, target_end_date)

## now join to cleaned up c19fh scores and situate side-by-side 
c19fh_scores %>%
  select(c19fh_wis = wis, horizon, temporal_resolution, target_variable, target_end_date) %>%
  left_join(f_scores_clean) %>%
  select(c19fh_wis, sigsci_wis, everything())
vpnagraj commented 3 years ago

@chulmelowe given that we're seeing some discrepancies in the our WIS versus the C19FH im not sure it makes sense to maintain our wis() in the package.

i'm going to close this PR without merging ... but the branch will still be up in case we want to revisit wis() later