tidyverts / fasster

Forecasting with Additive Switching of Seasonality, Trend and Exogenous Regressors
https://fasster.tidyverts.org/
150 stars 15 forks source link

Confidence intervals for log() transformed time series #38

Closed datawookie closed 6 years ago

datawookie commented 6 years ago

Hi!

I'm having some trouble figuring out how to formulate confidence intervals for a model where I am applying a log() transform to the outcome. The forecast plot looks perfectly reasonable but I need to get my hands on the numerical values of the confidence intervals.

This is what the model looks like:

fit <- data %>%
  fasster(
    log(y) ~ poly(1) + seas(24)
  )

I'm extracting data for the forecast as follows:

fit %>% 
  forecast(h=24) %>%
  pull(forecast) %>%
  .[[1]]

and the output looks like this:

   time                  mean    se distribution       
   <dttm>               <dbl> <dbl> <dist>             
 1 2018-09-25 08:00:00  887.   1.31 t(N(6.2, sd = 1.3))
 2 2018-09-25 09:00:00  339.   1.34 t(N(5.2, sd = 1.3))
 3 2018-09-25 10:00:00  399.   1.38 t(N(5.3, sd = 1.4))
 4 2018-09-25 11:00:00  367.   1.43 t(N(5.2, sd = 1.4))
 5 2018-09-25 12:00:00   40.4  1.47 t(N(3, sd = 1.5))  
 6 2018-09-25 13:00:00  103.   1.51 t(N(3.9, sd = 1.5))
 7 2018-09-25 14:00:00  580.   1.55 t(N(5.6, sd = 1.5))
 8 2018-09-25 15:00:00  858.   1.59 t(N(5.9, sd = 1.6))
 9 2018-09-25 16:00:00 2981.   1.62 t(N(7.2, sd = 1.6))
10 2018-09-25 17:00:00 3459.   1.66 t(N(7.3, sd = 1.7))

Am I right in assuming that the 95% confidence interval for the first forecast point would be

qlnorm(c(0.025, 0.975), 6.2, 1.3)

If there's a function for extracting the confidence intervals that would be great. However, I have no problem extracting them by hand. Just want to know that I'm doing the right thing!

Thanks, Andrew.

mitchelloharawild commented 6 years ago

Hi Andrew,

We're still thinking of an appropriate verb for extracting forecasts from a fable, and so functionality for this is limited.

Two main methods currently exist for extracting forecasts:

fortify(level = 95): Converts the forecasts into a long form suitable for plotting (function name is fixed, but other more memorable/informative verbs will be added) summary(level = 95): Converts the forecasts into a familiar form, similar to the forecast package (function name will change in the future as summary should be a print only method)

fable differs from forecast because the forecast uncertainty is stored as distributions rather than intervals. One thing that this allows us to do is modify the level of the intervals without re-estimating the forecasts, although more verbs are needed to make this process easier.

Any thoughts or suggestions is welcomed :smile:

mitchelloharawild commented 6 years ago

An example of the format each function returns:

library(fable)
#> Loading required package: fablelite
fc <- USAccDeaths %>% ETS(log(value)) %>% forecast
ggplot2::fortify(fc, level = 95)
#> # A tsibble: 24 x 6 [1M]
#> # Key:       level [1]
#>       index   mean     se level lower  upper
#>       <mth>  <dbl>  <dbl> <dbl> <dbl>  <dbl>
#>  1 1979 Jan  8297. 0.0326    95 7779.  8840.
#>  2 1979 Feb  7509. 0.0366    95 6985.  8061.
#>  3 1979 Mar  8319. 0.0409    95 7672.  9006.
#>  4 1979 Apr  8513. 0.0446    95 7793.  9281.
#>  5 1979 May  9381. 0.0484    95 8522. 10303.
#>  6 1979 Jun  9882. 0.0518    95 8915. 10924.
#>  7 1979 Jul 10750. 0.0553    95 9631. 11962.
#>  8 1979 Aug 10083. 0.0578    95 8989. 11273.
#>  9 1979 Sep  9020. 0.0597    95 8009. 10121.
#> 10 1979 Oct  9363. 0.0625    95 8267. 10563.
#> # ... with 14 more rows
summary(fc, level = 95)
#> # A tsibble: 24 x 4 [1M]
#>       index   mean     se                   `95%`
#>       <mth>  <dbl>  <dbl>                  <hilo>
#>  1 1979 Jan  8297. 0.0326 [7778.889,  8839.673]95
#>  2 1979 Feb  7509. 0.0366 [6984.585,  8061.472]95
#>  3 1979 Mar  8319. 0.0409 [7671.749,  9006.239]95
#>  4 1979 Apr  8513. 0.0446 [7792.557,  9281.075]95
#>  5 1979 May  9381. 0.0484 [8521.509, 10302.613]95
#>  6 1979 Jun  9882. 0.0518 [8914.810, 10923.716]95
#>  7 1979 Jul 10750. 0.0553 [9630.596, 11962.361]95
#>  8 1979 Aug 10083. 0.0578 [8989.110, 11272.829]95
#>  9 1979 Sep  9020. 0.0597 [8009.118, 10121.258]95
#> 10 1979 Oct  9363. 0.0625 [8267.183, 10562.612]95
#> # ... with 14 more rows

Created on 2018-09-25 by the reprex package (v0.2.0).

datawookie commented 6 years ago

Great! Thanks, Mitchell. The ggplot2::fortify() route will work well for my application. I think that storing the uncertainties as distributions makes a lot of sense. Perhaps add this somewhere accessible in the README?

mitchelloharawild commented 6 years ago

Great, glad that this functionality is suitable. Once we've got a suitable set of generic names I'll add this to the README. The package is also almost at the stage where a complete workflow can be made, which I will use to build a vignette that goes through the key functionality of the package.