signaturescience / focustools

Forecasting COVID-19 in the US
https://signaturescience.github.io/focustools/
GNU General Public License v3.0
0 stars 0 forks source link

translate point estimates to quantile #4

Closed vpnagraj closed 3 years ago

vpnagraj commented 3 years ago

the covid 19 forecast hub requires point estimates and pre-determined quantile values for each target.

the quantile values required are enumerated in the technical readme:

https://github.com/reichlab/covid19-forecast-hub/blob/master/data-processed/README.md#quantile

once we decide on an initial model / targets to pursue, we will need establish how to go about translating a point estimate to quantiles.

vpnagraj commented 3 years ago

a little more detail on one possible approach here ...

may be a naive way of thinking about it, but what if we used the point estimate from a model prediction along with its standard error to seed a simulated distribution of predicted values? to do so, we would have to assume the predicted values follow a certain distribution ... but from there we could cut the simulated values at relevant quantiles

below is some code to do that with a linear model of faithful data under the assumption that the predicted value follows a normal distribution:

## fit the model
eruptfit <- lm(eruptions ~ waiting, data = faithful)

## get the prediction interval
## from prediction interval calculate standard error
pred <- 
  predict(eruptfit, data.frame(waiting = 80), level = 0.95, interval = "predict") %>%
  as.data.frame() %>%
  dplyr::mutate(se = (upr - fit) / qt(0.975, nrow(faithful)))

## sidebar ... is this standard error formula correct?
## see https://stackoverflow.com/questions/38109501/how-does-predict-lm-compute-confidence-interval-and-prediction-interval

## what is standard error for CI?
tmp <- predict(eruptfit, data.frame(waiting = 80), level = 0.95, interval = "predict", se.fit = TRUE)
tmp$se.fit
[1] 0.03625177
## use standard error for CI to calculate standard error for PI
sqrt(tmp$se.fit ^ 2 + tmp$residual.scale ^ 2)
[1] 0.4978346
## what whas the se for the PI per our calculation?
pred$se
[1] 0.4978511
## assume distribution of forecast values follows normal distribution ...
## mean = prediction
## sd = standard error of prediction
## "simulate" values by generate random normal distribution
sims <- rnorm(1000, mean = pred$fit, sd = pred$se)

## cut the simulated values into the quantiles of interest
quantile(sims, probs = c(0.01, 0.025, seq(0.05, 0.95, by = 0.05), 0.975, 0.99))
      1%     2.5%       5%      10%      15%      20%      25%      30%      35%      40%      45%      50%      55%      60%      65% 
3.001737 3.149908 3.346475 3.560254 3.672461 3.768598 3.843439 3.925468 3.986631 4.069986 4.132211 4.175931 4.251500 4.288355 4.344637 
     70%      75%      80%      85%      90%      95%    97.5%      99% 
4.407399 4.499772 4.603745 4.719678 4.833324 5.030727 5.146453 5.357499 
stephenturner commented 3 years ago

So the 50th percentile here, the median -- how does that relate to the point estimate?

stephenturner commented 3 years ago

Few other approaches in the fable framework:

  1. See fable bookdown section on simulation: https://tidyverts.github.io/tidy-forecasting-principles/methods.html#simulation
  2. See about halfway down, the hilo() function.

The resulting forecasts are contained in a “fable” (forecast table), and both point forecasts and forecast distributions are available in the table for the next five years. Confidence intervals can be extracted from the distribution using the hilo() function.

The hilo() function can also be used on fable objects, which allows you to extract multiple intervals at once.

vpnagraj commented 3 years ago

So the 50th percentile here, the median -- how does that relate to the point estimate?

good point. but in this case the 50th percentile from quantile is generated from a random normal distribution that is centered on the point estimate. rnorm(size=1000, ...) above so i would think that the median would be pretty close to the mean

looks like it is (using objects created from example code above):

pred$fit
[1] 4.17622
stephenturner commented 3 years ago

This section of the FPP3 book is a good primer on forecast distributions, and later on shows how to use bootstrapping to get prediction intervals. https://otexts.com/fpp3/prediction-intervals.html

Also, from: https://otexts.com/fpp3/forecasting-using-transformations.html

If a transformation has been used, then the prediction interval is first computed on the transformed scale, then the end points are back-transformed to give a prediction interval on the original scale.

stephenturner commented 3 years ago

This is demonstrated around https://github.com/signaturescience/focustools/commit/f4a58da43e7876f289cc79229666fe7dc946cef3#diff-da4c2dfe4ffa338da3407b8db34eac89ce01bcfc58032a6688085c6452350d0fR108-R115

vpnagraj commented 3 years ago

we have this working (see https://github.com/signaturescience/focustools/blob/master/scratch/fable-submission-mockup-allmetrics.R#L28-L38)

the quantiles meet the valid entry file format once bound, formatted with target names, etc.

closing for now, though we may need to revisit if/when we implement another kind of model outside of the fable framework