Forecast evaluation: WIS, MAE, coverage

stephenturner commented 2 years ago

We brought over WIS functions in #16 but it might help to have a wrapper to make this easier. Given a formatted forecast (do we need a read_forecast function?), let's either document this or make it easier to use.

I.e.:

Get real data
Read forecast
evaluate(real, forecast, metric=c("wis", "mae", "coverage"))
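
A minimal base R sketch of what such a wrapper could look like. Everything here is an assumption for illustration: the `evaluate()` function itself, the `level` argument, the column names (`location`, `target_end_date`, `quantile`, `value`, `actual`), and the toy numbers are hypothetical, and none of this is an existing fiphde or evalcast API. WIS is computed as twice the mean pinball loss, which matches the interval-based definition only when the quantile levels are symmetric around the median and include it.

```r
# Hypothetical wrapper; not part of fiphde or evalcast.
# real:     data frame with columns location, target_end_date, actual
# forecast: long-format quantile forecast with columns
#           location, target_end_date, quantile, value
evaluate <- function(real, forecast,
                     metric = c("wis", "mae", "coverage"),
                     level = 0.95) {
  metric <- match.arg(metric, several.ok = TRUE)
  alpha <- 1 - level

  # One scorer per metric; each takes quantile levels, quantile values, truth
  scorers <- list(
    # Twice the mean pinball loss (assumes symmetric quantiles incl. median)
    wis = function(tau, q, y) 2 * mean(pmax(tau * (y - q), (1 - tau) * (q - y))),
    # Absolute error of the median; averaging these over rows gives MAE
    mae = function(tau, q, y) abs(y - q[which.min(abs(tau - 0.5))]),
    # 1 if truth falls inside the central interval, using the quantile
    # levels nearest to alpha/2 and 1 - alpha/2
    coverage = function(tau, q, y) {
      lower <- q[which.min(abs(tau - alpha / 2))]
      upper <- q[which.min(abs(tau - (1 - alpha / 2)))]
      as.numeric(y >= lower && y <= upper)
    }
  )

  joined <- merge(forecast, real, by = c("location", "target_end_date"))
  groups <- split(joined, list(joined$location, joined$target_end_date),
                  drop = TRUE)

  rows <- lapply(groups, function(g) {
    scores <- lapply(scorers[metric],
                     function(f) f(g$quantile, g$value, g$actual[1]))
    cbind(g[1, c("location", "target_end_date")], as.data.frame(scores))
  })
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

# Toy data, made up for illustration only
real <- data.frame(location = "01", target_end_date = "2022-02-12",
                   actual = 20)
forecast <- data.frame(
  location = "01", target_end_date = "2022-02-12",
  quantile = c(0.025, 0.25, 0.5, 0.75, 0.975),
  value    = c(5, 10, 14, 18, 30)
)

scores <- evaluate(real, forecast)
mae_only <- evaluate(real, forecast, metric = "mae")
```

Keeping the scorers in a named list means a new metric is one more list entry, and `match.arg(..., several.ok = TRUE)` rejects misspelled metric names early.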

vpnagraj commented 2 years ago

this intersects with what we need to do to move the forecast communication tool forward.

but hold on ...

for WIS we re-implemented evalcast::weighted_interval_score() (https://cmu-delphi.github.io/covidcast/evalcastR/reference/weighted_interval_score.html) as fiphde::weighted_interval_score(), and then wrote our own wis_score(), which uses that function and also wraps some data manipulation for convenience
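
For reference, the quantity these functions are meant to compute, assuming the standard definition of the weighted interval score. The notation is chosen for this note and does not come from evalcast or fiphde: F is a forecast with median m and K central prediction intervals [l_k, u_k] at nominal levels 1 - alpha_k, and y is the observed value.

```latex
\mathrm{IS}_{\alpha}(F, y) = (u - l)
  + \frac{2}{\alpha}\,(l - y)\,\mathbf{1}\{y < l\}
  + \frac{2}{\alpha}\,(y - u)\,\mathbf{1}\{y > u\}

\mathrm{WIS}(F, y) = \frac{1}{K + 1/2}
  \left( \frac{1}{2}\,\lvert y - m \rvert
  + \sum_{k=1}^{K} \frac{\alpha_k}{2}\,\mathrm{IS}_{\alpha_k}(F, y) \right)
```

Lower is better. The first term of the interval score penalizes wide intervals, and the other two penalize observations that fall below or above the interval.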

@stephenturner im looking back at evalcast and i think we can just use that directly for what we need to do with the forecast communication tool.

in fact, it looks like the package has functionality to do exactly what you're proposing.

i have it working with flu hosp forecasts:

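library(evalcast)
library(fiphde)
library(dplyr)  # %>%, filter(), rename(), mutate(), select(), left_join()
library(readr)  # read_csv()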

## get truth data
prepped_hosp <-
  get_hdgov_hosp(limitcols = TRUE) %>%
  prep_hdgov_hosp(statesonly=TRUE, min_per_week = 0, remove_incomplete = TRUE) %>%
  dplyr::filter(abbreviation != "DC") %>%
  left_join(., fiphde:::locations %>% filter(location_name %in% state.name) %>% select(abbreviation, name = location_name))

## need to coerce to make it work with eval function
td <-
  prepped_hosp %>%
  rename(geo_value = location) %>%
  rename(actual = flu.admits)

## get forecast data
tmp_forc <- 
  read_csv("https://raw.githubusercontent.com/cdcepi/Flusight-forecast-data/master/data-forecasts/SigSci-CREG/2022-02-07-SigSci-CREG.csv") %>%
  rename(geo_value = location) %>%
  mutate(forecaster = "SigSci-CREG") %>%
  filter(type == "quantile") %>%
  select(-type) %>%
  mutate(epiweek = lubridate::epiweek(target_end_date),
         epiyear = lubridate::epiyear(target_end_date)) %>%
  select(geo_value, quantile, value, forecaster, forecast_date, target, target_end_date, epiweek, epiyear)

## run evaluate_predictions with a list of error measures
evals <-
  evaluate_predictions(tmp_forc, 
                       truth_data = td, 
                       err_measures = list(wis = weighted_interval_score, 
                                           ae = absolute_error, 
                                           coverage_50 = interval_coverage(0.5), 
                                           coverage_95 = interval_coverage(0.95)), 
                       grp_vars = c("geo_value", "forecaster","forecast_date","target"))

## take a look ...
evals %>%
  select(forecaster, forecast_date, geo_value, target, wis, ae, coverage_50, coverage_95) %>%
  head(10) %>%
  knitr::kable()

| forecaster  | forecast_date | geo_value | target                  |        wis | ae | coverage_50 | coverage_95 |
|:------------|:--------------|:----------|:------------------------|-----------:|---:|------------:|------------:|
| SigSci-CREG | 2022-02-07    | 01        | 1 wk ahead inc flu hosp | 10.1391304 | 15 |           0 |           0 |
| SigSci-CREG | 2022-02-07    | 01        | 2 wk ahead inc flu hosp |  4.8239130 |  8 |           0 |           1 |
| SigSci-CREG | 2022-02-07    | 01        | 3 wk ahead inc flu hosp | 10.1673913 | 15 |           0 |           0 |
| SigSci-CREG | 2022-02-07    | 01        | 4 wk ahead inc flu hosp |  2.9026087 |  6 |           0 |           1 |
| SigSci-CREG | 2022-02-07    | 02        | 1 wk ahead inc flu hosp |  1.0808696 |  2 |           0 |           1 |
| SigSci-CREG | 2022-02-07    | 02        | 2 wk ahead inc flu hosp |  1.3352174 |  2 |           0 |           1 |
| SigSci-CREG | 2022-02-07    | 02        | 3 wk ahead inc flu hosp |  0.9556522 |  2 |           1 |           1 |
| SigSci-CREG | 2022-02-07    | 02        | 4 wk ahead inc flu hosp |  1.7608696 |  3 |           0 |           1 |
| SigSci-CREG | 2022-02-07    | 04        | 1 wk ahead inc flu hosp | 15.5278261 | 30 |           0 |           1 |
| SigSci-CREG | 2022-02-07    | 04        | 2 wk ahead inc flu hosp | 18.0804348 | 34 |           0 |           1 |
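
Each row above is a single forecast, so `ae` is one absolute error and the coverage columns are 0/1 indicators. MAE and empirical coverage rates come from averaging over rows. A small base R sketch of that step, using only the ten rows printed above (so the numbers are illustrative, not a full evaluation); the object names `evals_head`, `overall`, and `by_geo` and the numeric `horizon` column are assumptions made for this example:

```r
# The ten rows shown above, re-entered by hand; horizon is parsed from `target`
evals_head <- data.frame(
  geo_value   = c("01", "01", "01", "01", "02", "02", "02", "02", "04", "04"),
  horizon     = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2),
  wis         = c(10.1391304, 4.8239130, 10.1673913, 2.9026087,
                  1.0808696, 1.3352174, 0.9556522, 1.7608696,
                  15.5278261, 18.0804348),
  ae          = c(15, 8, 15, 6, 2, 2, 2, 3, 30, 34),
  coverage_50 = c(0, 0, 0, 0, 0, 0, 1, 0, 0, 0),
  coverage_95 = c(0, 1, 0, 1, 1, 1, 1, 1, 1, 1)
)

score_cols <- c("wis", "ae", "coverage_50", "coverage_95")

# Averages over all ten rows: mean WIS, MAE, and empirical coverage rates
overall <- colMeans(evals_head[, score_cols])

# The same averages within each location
by_geo <- aggregate(cbind(wis, ae, coverage_50, coverage_95) ~ geo_value,
                    data = evals_head, FUN = mean)
```

On these ten rows the mean of `coverage_95` is 0.8 and the mean of `coverage_50` is 0.1, compared with nominal levels of 0.95 and 0.5, though ten rows are far too few to say anything about calibration.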

with all this in mind im not sure if it makes sense to re-implement in fiphde

other thoughts ??

stephenturner commented 2 years ago

uh, yeah, let's use that.

Looking back at #30 and #25 we had good reasons for getting rid of the evalcast dependency. I'm browsing this code now, I'll try to copy in the relevant stuff from evalcast we'll need for this functionality.

vpnagraj commented 2 years ago

cool. well i guess the question is ... do we need to copy anything into fiphde at all? what's the advantage? i mean, we can add evalcast as a dependency for the FCT app (which is likely to be too complex to be implemented through the fiphde package the way the explorer app is).

and if we want to do any standalone analysis using these metrics we could just load evalcast for evaluation metrics and fiphde for other data prep / modeling functions?

stephenturner commented 2 years ago

Good points. I'd still like to keep some interactive code loading fiphde and evalcast under version control. Where's that best placed? Here in the scratch dir? In submission? In a new ignored evaluation dir? In a separate repo?

vpnagraj commented 2 years ago

nah lets not create a new repo for this. and probably best to keep submission clean.

scratch dir here is fine. or fiphde-auto might be better? i just pushed the code from this thread into fiphde-auto: https://github.com/signaturescience/fiphde-auto/commit/61a928f48ae4cb1cd7f28230bb42a736a26d7bd1

for now the fiphde-auto repo is meant for prototyping / docs. we'll clean it up later. but feel free to push to main or work on a branch if you want to do anything over there

stephenturner commented 2 years ago

Tracking at signaturescience/fiphde-auto#15, see also https://github.com/signaturescience/fiphde-auto/commit/647d4fdde4fd6f5768514bbfa558b9c596456e8a

signaturescience / fiphde

Forecast evaluation: WIS, MAE, coverage #109