signaturescience / focustools

Forecasting COVID-19 in the US
https://signaturescience.github.io/focustools/
GNU General Public License v3.0
0 stars 0 forks source link

State level forecasts #26

Closed stephenturner closed 3 years ago

stephenturner commented 3 years ago

Add specificity to this memo later. For now, some high level thoughts:

stephenturner commented 3 years ago

Taking a look at this reveals lots of inflexibility throughout. This is going to reopen at least a few issues. Starting with #16, #6, #3 cc @vpnagraj

vpnagraj commented 3 years ago

fair enough. stating the obvious, but make sure any edits go in a new branch so that we keep main clean for next weeks submission

vpnagraj commented 3 years ago

i have the common "location" column in get_* data functions now. seeing a couple of other places i need to make changes for to actually use that in the ts pipeline. will be pushing changes to https://github.com/signaturescience/focustools/tree/state-level-ts

vpnagraj commented 3 years ago

so the "key" concept in fable is 🔮

pull the state-level-ts branch and try this:

## get data at the national scale from jhu source
# usac <-  get_cases(source="jhu",  granularity = "national")
# usad <- get_deaths(source="jhu",  granularity = "national")

## get data at the state scale from jhu source
usac <-  get_cases(source="jhu",  granularity = "state")
usad <- get_deaths(source="jhu",  granularity = "state")

## use the focustools helper to prep the tsibble format
usa <-  
  dplyr::inner_join(usac, usad, by = c("epiyear", "epiweek", "location")) %>% 
  make_tsibble(chop=TRUE)

fit.icases <- usa %>% model(arima = ARIMA(icases, stepwise=FALSE, approximation=FALSE))
fit.ideaths <- usa %>% model(linear_caselag3 = TSLM(ideaths ~ lag(icases, 3)))

## generate incident case forecast
icases_forecast <- ts_forecast(fit.icases, outcome = "icases", horizon = 4)
icases_forecast

## need to get future cases to pass to ideaths forecast
future_cases <- ts_futurecases(usa, icases_forecast, horizon = 4)

ideaths_forecast <- ts_forecast(fit.ideaths,  outcome = "ideaths", new_data = future_cases)
ideaths_forecast

cdeaths_forecast <- ts_forecast(outcome = "cdeaths", .data = usa, inc_forecast = ideaths_forecast)
cdeaths_forecast

submission <-
  list(format_for_submission(icases_forecast, target_name = "inc case"),
       format_for_submission(ideaths_forecast, target_name = "inc death"),
       format_for_submission(cdeaths_forecast, target_name = "cum death")) %>%
  reduce(bind_rows) %>%
  arrange(target)

submission
forecast_date target target_end_date location type quantile value
2021-01-18 1 wk ahead cum death 2021-01-23 Alabama point NA 6346
2021-01-18 1 wk ahead cum death 2021-01-23 Alaska point NA 319
2021-01-18 1 wk ahead cum death 2021-01-23 American Samoa point NA 0
2021-01-18 1 wk ahead cum death 2021-01-23 Arizona point NA 11139
2021-01-18 1 wk ahead cum death 2021-01-23 Arkansas point NA 4503
2021-01-18 1 wk ahead cum death 2021-01-23 California point NA 31579
2021-01-18 1 wk ahead cum death 2021-01-23 Colorado point NA 5365
2021-01-18 1 wk ahead cum death 2021-01-23 Connecticut point NA 6803
2021-01-18 1 wk ahead cum death 2021-01-23 Delaware point NA 1154
2021-01-18 1 wk ahead cum death 2021-01-23 District of Columbia point NA 1396
2021-01-18 1 wk ahead cum death 2021-01-23 Florida point NA 23745
2021-01-18 1 wk ahead cum death 2021-01-23 Georgia point NA 12389
2021-01-18 1 wk ahead cum death 2021-01-23 Guam point NA 420
2021-01-18 1 wk ahead cum death 2021-01-23 Hawaii point NA 640
2021-01-18 1 wk ahead cum death 2021-01-23 Idaho point NA 1788
2021-01-18 1 wk ahead cum death 2021-01-23 Illinois point NA 19533
2021-01-18 1 wk ahead cum death 2021-01-23 Indiana point NA 9044
2021-01-18 1 wk ahead cum death 2021-01-23 Iowa point NA 4463

^ same exact API as our national forecast. this just works with state data now. thanks to a little elbow grease in some of the internals of our functions of course. haven't tried it with counties (i don't think we should ... ) but theoretically it would work for that scale too

still more to do there in terms of prepping the submission format. i think the "location" column needs to be the state codes instead of names? and i'm not sure which territories are supposed to be included. but this puts us pretty far down the path.

stephenturner commented 3 years ago

This is awesome. Running it for the first time now. After a little impromptu hacking to set the date to monday and the negatives to zero (#31) the thing actually validates!

library(tidyverse)
library(focustools)

# Run the forecast pipeline. See ?forecast_pipeline
myforecast <- forecast_pipeline(granularity="state", force=TRUE)

sub <- myforecast$submission 
sub <- 
  sub %>% 
  mutate(value=ifelse(value<0, 0L, value)) %>% 
  mutate(forecast_date=this_monday())

fn <- myforecast$submission_filename

# Look at the submission and the suggested filename.
sub
fn

# Write submission to file
readr::write_csv(sub, file=fn)

# Validate submission
validate_forecast(fn)
* validating quantile_csv_file '/Users/sturner/sigsci/irad/focustools/submission/SigSci-TS/2021-01-20-SigSci-TS.csv'...
$valid
[1] TRUE

$message
[1] "no errors"

Looks like fable threw a few warnings. We should probably look at these sometime. The first one is from the pipeline script. A rank-deficient fit sounds like an insult.

Warning messages:
1: In forecast_pipeline(granularity = "state", force = TRUE) :
  Forcing forecast on a non-Monday. This may not validate. Proceeding...
2: Problem with `mutate()` input `linear_caselag3`.
ℹ prediction from a rank-deficient fit may be misleading
ℹ Input `linear_caselag3` is `(function (object, ...) ...`. 
3: prediction from a rank-deficient fit may be misleading 
stephenturner commented 3 years ago

48 is TX, 51 is VA

image

stephenturner commented 3 years ago

are we good to close this issue out? The plot function - I think it's fine for a sanity check as originally intended in #21 but if we were to release or start to use elsewhere (eg #22) we might do some things like joining against location data to translate the fips to text, and allow the user to pass arguments or somehow adjust for scale. It makes sense to have free y scales for the targets, but would it make sense to have the same scale for each target as defined by the max on that particular scale for one particular location? I don't even know how to do this in ggplot2, may not be worth it.

vpnagraj commented 3 years ago

@stephenturner fyi i realized when i merged the latest PR (#37) i forgot to restore the state-level-ts branch. it's back now for you to commit the hilo stuff:

https://github.com/signaturescience/focustools/tree/state-level-ts