State level forecasts - Githubissues

stephenturner commented 3 years ago

Add specificity to this memo later. For now, some high level thoughts:

Develop code to get data (restrict to VA and TX at first for speed of dev/testing?)
See how current functionality would work using state as a key in the tsibble
May need to modify make_tsibble function to accept a column as a key
How does a key column work with group_by?

stephenturner commented 3 years ago

Taking a look at this reveals lots of inflexibility throughout. This is going to reopen at least a few issues. Starting with #16, #6, #3 cc @vpnagraj

vpnagraj commented 3 years ago

fair enough. stating the obvious, but make sure any edits go in a new branch so that we keep main clean for next weeks submission

vpnagraj commented 3 years ago

i have the common "location" column in get_* data functions now. seeing a couple of other places i need to make changes for to actually use that in the ts pipeline. will be pushing changes to https://github.com/signaturescience/focustools/tree/state-level-ts

vpnagraj commented 3 years ago

so the "key" concept in fable is 🔮

pull the state-level-ts branch and try this:

## get data at the national scale from jhu source
# usac <-  get_cases(source="jhu",  granularity = "national")
# usad <- get_deaths(source="jhu",  granularity = "national")

## get data at the state scale from jhu source
usac <-  get_cases(source="jhu",  granularity = "state")
usad <- get_deaths(source="jhu",  granularity = "state")

## use the focustools helper to prep the tsibble format
usa <-  
  dplyr::inner_join(usac, usad, by = c("epiyear", "epiweek", "location")) %>% 
  make_tsibble(chop=TRUE)

fit.icases <- usa %>% model(arima = ARIMA(icases, stepwise=FALSE, approximation=FALSE))
fit.ideaths <- usa %>% model(linear_caselag3 = TSLM(ideaths ~ lag(icases, 3)))

## generate incident case forecast
icases_forecast <- ts_forecast(fit.icases, outcome = "icases", horizon = 4)
icases_forecast

## need to get future cases to pass to ideaths forecast
future_cases <- ts_futurecases(usa, icases_forecast, horizon = 4)

ideaths_forecast <- ts_forecast(fit.ideaths,  outcome = "ideaths", new_data = future_cases)
ideaths_forecast

cdeaths_forecast <- ts_forecast(outcome = "cdeaths", .data = usa, inc_forecast = ideaths_forecast)
cdeaths_forecast

submission <-
  list(format_for_submission(icases_forecast, target_name = "inc case"),
       format_for_submission(ideaths_forecast, target_name = "inc death"),
       format_for_submission(cdeaths_forecast, target_name = "cum death")) %>%
  reduce(bind_rows) %>%
  arrange(target)

submission

forecast_date	target	target_end_date	location	type	quantile	value
2021-01-18	1 wk ahead cum death	2021-01-23	Alabama	point	NA	6346
2021-01-18	1 wk ahead cum death	2021-01-23	Alaska	point	NA	319
2021-01-18	1 wk ahead cum death	2021-01-23	American Samoa	point	NA	0
2021-01-18	1 wk ahead cum death	2021-01-23	Arizona	point	NA	11139
2021-01-18	1 wk ahead cum death	2021-01-23	Arkansas	point	NA	4503
2021-01-18	1 wk ahead cum death	2021-01-23	California	point	NA	31579
2021-01-18	1 wk ahead cum death	2021-01-23	Colorado	point	NA	5365
2021-01-18	1 wk ahead cum death	2021-01-23	Connecticut	point	NA	6803
2021-01-18	1 wk ahead cum death	2021-01-23	Delaware	point	NA	1154
2021-01-18	1 wk ahead cum death	2021-01-23	District of Columbia	point	NA	1396
2021-01-18	1 wk ahead cum death	2021-01-23	Florida	point	NA	23745
2021-01-18	1 wk ahead cum death	2021-01-23	Georgia	point	NA	12389
2021-01-18	1 wk ahead cum death	2021-01-23	Guam	point	NA	420
2021-01-18	1 wk ahead cum death	2021-01-23	Hawaii	point	NA	640
2021-01-18	1 wk ahead cum death	2021-01-23	Idaho	point	NA	1788
2021-01-18	1 wk ahead cum death	2021-01-23	Illinois	point	NA	19533
2021-01-18	1 wk ahead cum death	2021-01-23	Indiana	point	NA	9044
2021-01-18	1 wk ahead cum death	2021-01-23	Iowa	point	NA	4463

^ same exact API as our national forecast. this just works with state data now. thanks to a little elbow grease in some of the internals of our functions of course. haven't tried it with counties (i don't think we should ... ) but theoretically it would work for that scale too

still more to do there in terms of prepping the submission format. i think the "location" column needs to be the state codes instead of names? and i'm not sure which territories are supposed to be included. but this puts us pretty far down the path.

stephenturner commented 3 years ago

This is awesome. Running it for the first time now. After a little impromptu hacking to set the date to monday and the negatives to zero (#31) the thing actually validates!

library(tidyverse)
library(focustools)

# Run the forecast pipeline. See ?forecast_pipeline
myforecast <- forecast_pipeline(granularity="state", force=TRUE)

sub <- myforecast$submission 
sub <- 
  sub %>% 
  mutate(value=ifelse(value<0, 0L, value)) %>% 
  mutate(forecast_date=this_monday())

fn <- myforecast$submission_filename

# Look at the submission and the suggested filename.
sub
fn

# Write submission to file
readr::write_csv(sub, file=fn)

# Validate submission
validate_forecast(fn)

* validating quantile_csv_file '/Users/sturner/sigsci/irad/focustools/submission/SigSci-TS/2021-01-20-SigSci-TS.csv'...
$valid
[1] TRUE

$message
[1] "no errors"

Looks like fable threw a few warnings. We should probably look at these sometime. The first one is from the pipeline script. A rank-deficient fit sounds like an insult.

Warning messages:
1: In forecast_pipeline(granularity = "state", force = TRUE) :
  Forcing forecast on a non-Monday. This may not validate. Proceeding...
2: Problem with `mutate()` input `linear_caselag3`.
ℹ prediction from a rank-deficient fit may be misleading
ℹ Input `linear_caselag3` is `(function (object, ...) ...`. 
3: prediction from a rank-deficient fit may be misleading

stephenturner commented 3 years ago

48 is TX, 51 is VA

stephenturner commented 3 years ago

are we good to close this issue out? The plot function - I think it's fine for a sanity check as originally intended in #21 but if we were to release or start to use elsewhere (eg #22) we might do some things like joining against location data to translate the fips to text, and allow the user to pass arguments or somehow adjust for scale. It makes sense to have free y scales for the targets, but would it make sense to have the same scale for each target as defined by the max on that particular scale for one particular location? I don't even know how to do this in ggplot2, may not be worth it.

vpnagraj commented 3 years ago

@stephenturner fyi i realized when i merged the latest PR (#37) i forgot to restore the state-level-ts branch. it's back now for you to commit the hilo stuff:

https://github.com/signaturescience/focustools/tree/state-level-ts

signaturescience / focustools

State level forecasts #26