signaturescience / focustools

Forecasting COVID-19 in the US
https://signaturescience.github.io/focustools/
GNU General Public License v3.0

Simple function for sanity check on submission forecast #21

Closed: stephenturner closed this issue 3 years ago

stephenturner commented 3 years ago

There's good documentation in the README about creating a submission-ready forecast, and #20 adds a function that does this in one step, closing #16.

It would be nice to have another quick sanity-check function that takes some of the objects returned by the function added in #20 and produces a couple of plots or metrics before submission, just to check that the results look sane. Something like what @vpnagraj showed at https://signaturescience.slack.com/archives/C01FX27J273/p1610050926104400 would do.

This is related to #5, since methods used in model evaluation could also be used here to sanity-check a forecast.
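As a rough illustration of the "metrics" half of this idea, here is a minimal sketch of checks that need no observed data. `check_submission()` is a hypothetical helper, not part of focustools. It assumes the submission uses the columns seen in `myforecast$submission` later in this thread (`target`, `target_end_date`, `type`, `quantile`, `value`) plus a `location` column, with targets named like "1 wk ahead cum death".

```r
# Hypothetical helper, not part of focustools. Assumes a long-format submission with
# columns location, target ("1 wk ahead cum death"), target_end_date, type, quantile, value.
check_submission <- function(submission) {
  # Target without its horizon, e.g. "2 wk ahead cum death" -> "cum death"
  submission$measure <- sub("^[0-9]+ wk ahead ", "", submission$target)

  # 1. Counts should never be negative
  no_negatives <- all(submission$value >= 0, na.rm = TRUE)

  # 2. Within each location/target, values should not decrease as the quantile level rises
  q <- submission[submission$type == "quantile", ]
  q <- q[order(q$quantile), ]
  q_groups <- split(q$value, list(q$location, q$target), drop = TRUE)
  quantiles_monotone <- all(vapply(q_groups, function(v) all(diff(v) >= 0), logical(1)))

  # 3. Cumulative point forecasts should not decrease as the horizon grows
  cum <- submission[submission$type == "point" & startsWith(submission$measure, "cum"), ]
  cum <- cum[order(cum$target_end_date), ]
  cum_groups <- split(cum$value, list(cum$location, cum$measure), drop = TRUE)
  cumulative_monotone <- all(vapply(cum_groups, function(v) all(diff(v) >= 0), logical(1)))

  c(no_negatives = no_negatives,
    quantiles_monotone = quantiles_monotone,
    cumulative_monotone = cumulative_monotone)
}

# Made-up toy submission for one location
toy <- data.frame(
  location = "US",
  target = c("1 wk ahead inc death", "1 wk ahead inc death", "1 wk ahead inc death",
             "1 wk ahead cum death", "2 wk ahead cum death"),
  target_end_date = as.Date(c("2021-01-16", "2021-01-16", "2021-01-16",
                              "2021-01-16", "2021-01-23")),
  type = c("quantile", "quantile", "point", "point", "point"),
  quantile = c(0.25, 0.75, NA, NA, NA),
  value = c(18000, 22000, 20000, 400000, 420000)
)
check_submission(toy)  # all three checks should be TRUE for this toy input
```

Each check returns a single flag, so the output is easy to eyeball or to turn into a `stop()` before submitting.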

stephenturner commented 3 years ago

Some example code, hacked together:

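```r
library(tidyverse)
library(focustools)

# Run the full forecasting pipeline (#20) on JHU data
myforecast <- forecast_pipeline(source = "jhu")

# Recorded data: one row per date/target, with targets renamed to match the
# submission ("icases" -> "inc case", "cdeaths" -> "cum death", ...)
real <- myforecast$data %>%
  tibble::as_tibble() %>%
  gather(target, value, icases, ccases, ideaths, cdeaths) %>%
  mutate(target = target %>% str_remove_all("s$") %>% str_replace_all(c("^i" = "inc ", "^c" = "cum "))) %>%
  select(date = monday, target, point = value) %>%
  mutate(type = "recorded") %>%
  # Drop cumulative cases (filter on target; filtering on type would drop nothing)
  filter(target != "cum case")

# Forecasts: keep the point estimate plus the 0.25 and 0.75 quantiles, one column each
forecasted <- myforecast$submission %>%
  filter(type == "point" | quantile == 0.25 | quantile == 0.75) %>%
  # quantile is numeric (NA for point rows), so convert to character before labelling
  mutate(quantile = replace_na(as.character(quantile), "point")) %>%
  select(-type) %>%
  separate(target, into = c("nwk", "target"), sep = " wk ahead ") %>%
  select(date = target_end_date, target, quantile, value) %>%
  spread(quantile, value) %>%
  mutate(type = "forecast")

# Overlay recorded and forecast point values, one panel per target
bind_rows(real, forecasted) %>%
  arrange(date) %>%
  ggplot(aes(date, point)) +
  geom_line(aes(col = type)) +
  facet_wrap(~target, scales = "free")
```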

[image: output of the plot code above]
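The example code pulls the 0.25 and 0.75 quantiles into the forecast rows but only draws lines for `point`. A minimal sketch of one way to use them is to shade the 50% interval with `geom_ribbon()`. The data below is made up and only mimics the shape of the combined `real`/`forecasted` table; it is not output from focustools.

```r
library(ggplot2)

# Toy data shaped like bind_rows(real, forecasted): recorded rows carry only `point`,
# forecast rows also carry the 0.25 and 0.75 quantiles. Values are invented.
plot_data <- data.frame(
  date = seq(as.Date("2021-01-02"), by = "week", length.out = 6),
  target = "inc death",
  point = c(15000, 16000, 18000, 19000, 20000, 21000),
  `0.25` = c(NA, NA, NA, NA, 18500, 19000),
  `0.75` = c(NA, NA, NA, NA, 21500, 23000),
  type = rep(c("recorded", "forecast"), times = c(4, 2)),
  check.names = FALSE
)

p_interval <- ggplot(plot_data, aes(date, point)) +
  # Shade the 50% interval (0.25 to 0.75 quantiles) for forecast rows only
  geom_ribbon(aes(ymin = `0.25`, ymax = `0.75`),
              data = subset(plot_data, type == "forecast"), alpha = 0.3) +
  geom_line(aes(col = type)) +
  facet_wrap(~target, scales = "free")
print(p_interval)
```

Restricting the ribbon layer to forecast rows avoids passing the all-`NA` bounds of the recorded rows to `geom_ribbon()`.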

stephenturner commented 3 years ago

From discussion in https://github.com/signaturescience/focustools/issues/26#issuecomment-763922097

On the plot function: I think it's fine for a sanity check, as originally intended in #21. But if we were to release it or start using it elsewhere (e.g., #22), we might do things like join against location data to translate the FIPS codes to location names, and let the user pass arguments or otherwise adjust the scale. It makes sense to have free y scales for the targets, but would it make sense to have the same scale for each target across locations, as defined by the max on that particular scale for one particular location? I don't even know how to do this in ggplot2, so it may not be worth it.
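For what it's worth, ggplot2 can do something close to this with an invisible `geom_blank()` layer that carries each target's upper limit. Below is a minimal sketch under several assumptions: `plot_location()` is a hypothetical helper, the forecast data is a made-up long-format table with a `location` FIPS column, and `locations` is an illustrative lookup table standing in for real location data.

```r
library(ggplot2)

# Toy long-format data; FIPS codes, values, and the lookup table are made up
fc <- data.frame(
  location = rep(c("US", "51"), each = 4),
  target   = rep(rep(c("inc case", "inc death"), each = 2), times = 2),
  date     = rep(as.Date(c("2021-01-16", "2021-01-23")), times = 4),
  point    = c(1.5e6, 1.4e6, 20000, 21000, 30000, 28000, 300, 350)
)
locations <- data.frame(location = c("US", "51"),
                        location_name = c("United States", "Virginia"))

# Translate FIPS codes to readable location names
fc <- merge(fc, locations, by = "location")

# Hypothetical helper: free y scales per target, but each target's upper limit
# is pinned to that target's maximum in a reference location
plot_location <- function(data, loc, ref_loc = "US") {
  ref <- data[data$location == ref_loc, ]
  limits <- aggregate(point ~ target, data = ref, FUN = max)
  # Any x value works; it only lets the blank layer reuse aes(date, point)
  limits$date <- min(data$date)
  ggplot(data[data$location == loc, ], aes(date, point)) +
    geom_line() +
    # Invisible layer: expands each target's panel to include the reference maximum
    geom_blank(data = limits) +
    expand_limits(y = 0) +
    facet_wrap(~target, scales = "free_y") +
    labs(title = unique(data$location_name[data$location == loc]))
}

p_va <- plot_location(fc, loc = "51")
print(p_va)
```

If all locations go into a single figure instead, `facet_grid(target ~ location, scales = "free_y")` gives each target row its own y scale shared across the location columns, though that scale is set by the maximum over all locations shown rather than by one chosen location.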