weighted interval score function

vpnagraj commented 3 years ago

@chulmelowe i just pushed a few changes to the branch. see commits enumerated in the thread above

these are pkg dev fixes.

the first one (7f6f7ff) creates the .Rd documentation and adds the function to the NAMESPACE. always run devtools::document() prior to pushing

the second one gets rid of the usethis::use_pipe() . you dont need that in there ... you only need to run that once in your workspace when you're developing the package to add the %>% operator to the NAMESPACE (https://github.com/signaturescience/focustools/blob/master/NAMESPACE#L3). once it's there the pipe is available to any package functions.

the last two were fixes per devtools::check(). make sure you check the package before pushing. that was the first thing i did when i pulled down your code. here's what i saw:

── R CMD check results ────────────────────────────── focustools 0.0.0.9000 ────
Duration: 14.3s

> checking R code for possible problems ... NOTE
  wis: no visible binding for global variable ‘target_variable’
  wis: no visible global function definition for ‘n’
  wis: no visible binding for global variable ‘interval_score’
  Undefined global functions or variables:
    interval_score n target_variable

0 errors ✓ | 0 warnings ✓ | 1 note x

lots of solutions to the "no visible binding" message in google:

https://community.rstudio.com/t/how-to-solve-no-visible-binding-for-global-variable-note/28887

https://community.rstudio.com/t/when-programming-with-dplyr-what-is-the-correct-way-to-avoid-undefined-global-variables/55946

you can check the commit history and look at focustools.R on this branch to see how i resolved. after fixed we're looking good with devtools::check():

── R CMD check results ────────────────────────────── focustools 0.0.0.9000 ────
Duration: 15s

0 errors ✓ | 0 warnings ✓ | 0 notes ✓

R CMD check succeeded

so the package dev stuff looks good. but before we merge it would be helpful to see a reproducible example of how you actually use this function. not necessarily for the package documentation (yet) but at the very least dropped in a comment on this thread so i can understand better what this is supposed to do.

i have some stylistic code comments too but i'll save those for after @stephenturner takes a look.

vpnagraj commented 3 years ago

@chulmelowe when you're ready to share some example code drop it in a comment like so:

library(tidyverse)
library(focustools)

## Get national data
national <- inner_join(
  get_cases( source="jhu", granularity="national"),
  get_deaths(source="jhu", granularity="national"),
  by = c("location", "epiyear", "epiweek"))
## Get state data
state <- inner_join(
  get_cases( source="jhu", granularity="state"),
  get_deaths(source="jhu", granularity="state"),
  by = c("location", "epiyear", "epiweek"))
## combine US and state data
usafull <-
  bind_rows(national, state) %>%
  filter(location %in% c("US", stringr::str_pad(1:56, width=2, pad="0"))) %>%
  make_tsibble() %>%
  filter(monday>"2020-03-09")

chulmelowe commented 3 years ago

Here's some example code using the wis function to calculate the weighted interval and relative weighted interval scores:

library(tidyverse)
library(focustools)
library(covidHubUtils)
library(fabletools)
library(fable)

# Read in data for forecasting
usac <- focustools::get_cases()
usad <- focustools::get_deaths()
usa <- dplyr::inner_join(usac, usad, by = c("location", "epiyear", "epiweek")) %>%
  focustools::make_tsibble(chop = TRUE)

# Chop off the last four weeks to use as a scoring data set
dates <- sort(unique(usa$monday), decreasing = TRUE)
train_set <- usa %>%
  filter(monday %in% dates[5:length(dates)])

# Fit models for incident cases and incident deaths
fit_icases <- train_set %>%
  model(arima = ARIMA(icases, stepwise = F, approximation = F))

fit_ideaths <- train_set %>%
  model(linear_caselag3 = TSLM(ideaths ~ lag(icases, 3)))

# Forecast incident cases, incident deaths, and cumulative deaths
forecast_icases <- ts_forecast(fit_icases, outcome = "icases", horizon = 4)

future_cases <- ts_futurecases(train_set, forecast_icases, horizon = 4)
forecast_ideaths <- ts_forecast(fit_ideaths, outcome = "ideaths", new_data = future_cases)

forecast_cdeaths <- ts_forecast(outcome = "cdeaths", .data = train_set, inc_forecast = forecast_ideaths)

# Format the forecasts for submission to the COVID Hub
submission <- list(
  focustools::format_for_submission(forecast_icases, target_name = "inc case"),
  focustools::format_for_submission(forecast_ideaths, target_name = "inc death"),
  focustools::format_for_submission(forecast_cdeaths, target_name = "cum death")
) %>%
  reduce(bind_rows) %>%
  arrange(target)

# Load the formatted truth data
td_ic <- covidHubUtils::load_truth("JHU", target_variable = "inc case")
td_id <- covidHubUtils::load_truth("JHU", target_variable = "inc death")
td_cd <- covidHubUtils::load_truth("JHU", target_variable = "cum death")

 truth_set <- rbind(td_ic, td_id, td_cd)

# Score the forecasts for the last four weeks.
f_scores <- wis(submission, truth_set)
colnames(f_scores)[ncol(f_scores)] <- "sigsci_score"

# Import the baseline model forecasts
base_forecast <- covidHubUtils::load_forecasts(models = "COVIDhub-baseline", forecast_dates = "2021-02-15", locations = "US", source = "zoltar")

# Compute WIS for the baseline model
bl_scores <- wis(base_forecast, truth_set)
colnames(bl_scores)[ncol(bl_scores)] <- "baseline_score"

# Merge the datasets and compute the relative WIS
rel_wis <- f_scores %>%
  left_join(bl_scores) %>%
  mutate(relative_wis = sigsci_score / baseline_score)
rel_wis

vpnagraj commented 3 years ago

@chulmelowe i was able to run through your example code. by the end i'm seeing rel_wis:

target_variable	target_end_date	target	location	sigsci_score	baseline_score	relative_wis
cum death	2021-02-20	1 wk ahead cum death	US	975.3676	5604.321	0.1740385
cum death	2021-02-27	2 wk ahead cum death	US	943.7010	11421.868	0.0826223
cum death	2021-03-06	3 wk ahead cum death	US	1140.2562	18458.353	0.0617745
cum death	2021-03-13	4 wk ahead cum death	US	1485.3530	27528.306	0.0539573
inc case	2021-02-20	1 wk ahead inc case	US	9529.0611	67639.508	0.1408801
inc case	2021-02-27	2 wk ahead inc case	US	17338.8861	52569.713	0.3298265
inc case	2021-03-06	3 wk ahead inc case	US	21317.8083	65661.062	0.3246644
inc case	2021-03-13	4 wk ahead inc case	US	23499.7167	82243.543	0.2857333
inc death	2021-02-20	1 wk ahead inc death	US	975.3676	6881.281	0.1417422
inc death	2021-02-27	2 wk ahead inc death	US	354.9254	5588.525	0.0635097
inc death	2021-03-06	3 wk ahead inc death	US	368.0562	6727.787	0.0547069
inc death	2021-03-13	4 wk ahead inc death	US	353.4968	8714.365	0.0405648

that relative WIS for the SigSci-TS is really low. i think there might be an issue with how the WIS is being calculated ...

to investigate a little further, i tried calculating the WIS using code based on the C19FH scoring script (https://github.com/reichlab/covid19-forecast-evals/blob/main/reports/score_forecasts_eyc_update.R). here's the comparison of the WIS scores:

c19fh_wis	sigsci_wis	horizon	temporal_resolution	target_variable	target_end_date
1434.1904	975.3676	1	wk	cum death	2021-02-20
21136.0143	9529.0611	1	wk	inc case	2021-02-20
1434.1904	975.3676	1	wk	inc death	2021-02-20
1770.9587	943.7010	2	wk	cum death	2021-02-27
39617.7071	17338.8861	2	wk	inc case	2021-02-27
756.9683	354.9254	2	wk	inc death	2021-02-27
2354.5570	1140.2562	3	wk	cum death	2021-03-06
51375.5071	21317.8083	3	wk	inc case	2021-03-06
770.0352	368.0562	3	wk	inc death	2021-03-06
3099.5065	1485.3530	4	wk	cum death	2021-03-13
59398.2714	23499.7167	4	wk	inc case	2021-03-13
754.0800	353.4968	4	wk	inc death	2021-03-13

all of the code to reproduce is available below. wrote that fairly quickly, you may want to double check the joins (you too @stephenturner). note that the original code is largely unchanged, but i did add filter(monday>"2020-03-09") when prepping the data. we're doing that in the submission script now because there are gaps in some locations prior to that date. i also noticed that without that filter there were warnings generated with ARIMA modeling.

last things ... i did have to modify the C19FH code to remove an argument for "count_median_twice" (deprecated? removed altogether from scoringutils?). i also changed the na.rm behavior in the WIS calculation because some of the inc case scores were returned as NA

library(tidyverse)
library(focustools)
library(covidHubUtils)
library(fabletools)
library(fable)

# Read in data for forecasting
usac <- focustools::get_cases()
usad <- focustools::get_deaths()
usa <- dplyr::inner_join(usac, usad, by = c("location", "epiyear", "epiweek")) %>%
  focustools::make_tsibble(chop = TRUE) %>%
  filter(monday>"2020-03-09")

# Chop off the last four weeks to use as a scoring data set
dates <- sort(unique(usa$monday), decreasing = TRUE)
train_set <- usa %>%
  filter(monday %in% dates[5:length(dates)])

# Fit models for incident cases and incident deaths
fit_icases <- train_set %>%
  model(arima = ARIMA(icases, stepwise = F, approximation = F))

fit_ideaths <- train_set %>%
  model(linear_caselag3 = TSLM(ideaths ~ lag(icases, 3)))

# Forecast incident cases, incident deaths, and cumulative deaths
forecast_icases <- ts_forecast(fit_icases, outcome = "icases", horizon = 4)

future_cases <- ts_futurecases(train_set, forecast_icases, horizon = 4)
forecast_ideaths <- ts_forecast(fit_ideaths, outcome = "ideaths", new_data = future_cases)

forecast_cdeaths <- ts_forecast(outcome = "cdeaths", .data = train_set, inc_forecast = forecast_ideaths)

# Format the forecasts for submission to the COVID Hub
submission <- list(
  focustools::format_for_submission(forecast_icases, target_name = "inc case"),
  focustools::format_for_submission(forecast_ideaths, target_name = "inc death"),
  focustools::format_for_submission(forecast_cdeaths, target_name = "cum death")
) %>%
  reduce(bind_rows) %>%
  arrange(target)

# Load the formatted truth data
td_ic <- covidHubUtils::load_truth("JHU", target_variable = "inc case")
td_id <- covidHubUtils::load_truth("JHU", target_variable = "inc death")
td_cd <- covidHubUtils::load_truth("JHU", target_variable = "cum death")

truth_set <- rbind(td_ic, td_id, td_cd)

# Score the forecasts for the last four weeks.
f_scores <- wis(submission, truth_set)
colnames(f_scores)[ncol(f_scores)] <- "sigsci_score"

# Import the baseline model forecasts
base_forecast <- covidHubUtils::load_forecasts(models = "COVIDhub-baseline", forecast_dates = "2021-02-15", locations = "US", source = "zoltar")

# Compute WIS for the baseline model
bl_scores <- wis(base_forecast, truth_set)
colnames(bl_scores)[ncol(bl_scores)] <- "baseline_score"

# Merge the datasets and compute the relative WIS
rel_wis <- f_scores %>%
  left_join(bl_scores) %>%
  mutate(relative_wis = sigsci_score / baseline_score)
rel_wis

##################################################
library(scoringutils)

## now try WIS calculation per C19FH methods
## use same forecast / truth_set inputs
submission <-
  submission %>%
  ## prep the submission object per the scoringutils specs
  ## grab the target name (i.e. "inc case")
  dplyr::mutate(target_variable = gsub(pattern = "[[:digit:]] wk ahead ", replacement = "", x = target)) %>%
  ## add a column for model name
  mutate(model = "SigSci-TS") %>%
  ## get horizon and "wk" separated out
  separate(target, into = c("horizon", "temporal_resolution", "tmp1", "tmp2", "tmp3"), sep = " ") %>%
  select(-tmp1,-tmp2, -tmp3)

## join the submisison to truth_set (defined above)
## clean up columns afterwards
# get dataframe into scoringutil format
joint_df <- dplyr::left_join(x = submission, y = truth_set,
                             by = c("location", "target_variable", "target_end_date")) %>%
  dplyr::select(-c("model.y")) %>%
  dplyr::rename(model = model.x, prediction = value.x, true_value = value.y) %>%
  dplyr::filter(!is.na(true_value))

# score using scoringutil
observation_cols <- c(
  "model",
  "location",
  "horizon", "temporal_resolution", "target_variable",
  "forecast_date", "target_end_date"
)

# creates placeholder variables to store the name of the column from scoringutils::eval_forecasts() to
# take values from (`abs_var`) and the column name to rename as "abs_error" (`abs_var_rename`)
abs_var <- "aem"
abs_var_rename <- "aem_0"

c19fh_scores <- tibble::tibble(scoringutils::eval_forecasts(data = joint_df,
                                                      by = observation_cols,
                                                      summarise_by = c(observation_cols, "range"),
                                                      ## the below interval_score_arguments should ensure that WIS is computed correctly
                                                      interval_score_arguments = list(weigh = TRUE))) %>%
  tidyr::pivot_wider(id_cols = observation_cols,
                     names_from = c("range"),
                     values_from = c("coverage", "interval_score", abs_var, "sharpness", "overprediction", "underprediction")) %>%
  purrr::set_names(~sub(abs_var_rename, "abs_error", .x)) %>%
  ## need to remove all columns ending with NA to not affect WIS calculations
  dplyr::select(
    -dplyr::ends_with("_NA")
  ) %>%
  ## before next lines: do we need to check to ensure interval_score columns exist?
  ## the following lines ensure that we use denominator for the wis of
  ## (# of interval_scores)-0.5
  ## which is written in the paper and elsewhere as
  ## (# of interval_scores at level >0 ) + 0.5 or (K + 1/2)
  ## to make sure that the median only gets half the weight of the other
  ## intervals, multiply its value by 0.5
  dplyr::mutate(
    n_interval_scores = rowSums(!is.na(dplyr::select(., dplyr::starts_with("interval_score")))),
    exists_interval_score_0 = "interval_score_0" %in% names(.),
    interval_score_0 = ifelse(exists_interval_score_0, 0.5 * interval_score_0, NA_real_),
    sharpness_0 = ifelse(exists_interval_score_0, 0.5 * sharpness_0, NA_real_),
    underprediction_0 = ifelse(exists_interval_score_0, 0.5 * underprediction_0, NA_real_),
    overprediction_0 = ifelse(exists_interval_score_0, 0.5 * overprediction_0, NA_real_)) %>%
  dplyr::mutate(
    wis = rowSums(dplyr::select(., dplyr::starts_with("interval_score")), na.rm = TRUE)/(n_interval_scores-0.5*(exists_interval_score_0)),
  ) %>%
  dplyr::mutate(
    sharpness = rowSums(dplyr::select(., dplyr::starts_with("sharpness")), na.rm = FALSE)/(n_interval_scores-0.5*(exists_interval_score_0)),
    overprediction = rowSums(dplyr::select(., dplyr::starts_with("overprediction")), na.rm = FALSE)/(n_interval_scores-0.5*(exists_interval_score_0)),
    underprediction = rowSums(dplyr::select(., dplyr::starts_with("underprediction")), na.rm = FALSE)/(n_interval_scores-0.5*(exists_interval_score_0))
  ) %>%
  dplyr::select(
    -dplyr::starts_with("aem_"),
    -dplyr::starts_with("ae_point_"),
    -dplyr::starts_with("interval_score"),
    -dplyr::starts_with("sharpness_"),
    -dplyr::starts_with("underprediction_"),
    -dplyr::starts_with("overprediction_")
  ) 

## compare the wis computed with wis() vs c19fh wis ...

## first some clean up to f_scores above
f_scores_clean <-
  f_scores %>%
  mutate(target = as.character(target)) %>%
  separate(target, into = c("horizon", "temporal_resolution", "tmp1", "tmp2", "tmp3"), sep = " ") %>%
  select(sigsci_wis = sigsci_score, horizon, temporal_resolution, target_variable, target_end_date)

## now join to cleaned up c19fh scores and situate side-by-side 
c19fh_scores %>%
  select(c19fh_wis = wis, horizon, temporal_resolution, target_variable, target_end_date) %>%
  left_join(f_scores_clean) %>%
  select(c19fh_wis, sigsci_wis, everything())

vpnagraj commented 3 years ago

@chulmelowe given that we're seeing some discrepancies in the our WIS versus the C19FH im not sure it makes sense to maintain our wis() in the package.

i'm going to close this PR without merging ... but the branch will still be up in case we want to revisit wis() later

signaturescience / focustools

weighted interval score function #53