Issue manually validating TSLM/trend prediction intervals

USMortality commented 1 year ago

When manually calculating the 95%PI with the formulas described here: https://otexts.com/fpp3/prediction-intervals.html

For the TSLM/trend method - I am seeing a slight discrepancy between my excel calculations and the R fable package.

Here's a minimal reproducible example:

data.frame(
  year = seq(1, 5),
  asmr = c(882, 876, 879, 871, 867)
) |>
  as_tsibble(index = year) |>
  model(TSLM(asmr ~ trend())) |>
  forecast(h = 5) |>
  hilo(level = 95)

Output:

# A tsibble: 5 x 5 [1]
# Key:       .model [1]
  .model                year       asmr .mean                  `95%`
  <chr>                <dbl>     <dist> <dbl>                 <hilo>
1 TSLM(asmr ~ trend())     6 N(864, 16)  864. [856.5507, 872.4493]95
2 TSLM(asmr ~ trend())     7 N(861, 22)  861  [851.8209, 870.1791]95
3 TSLM(asmr ~ trend())     8 N(858, 29)  858. [846.9483, 868.0517]95
4 TSLM(asmr ~ trend())     9 N(854, 38)  854  [841.9817, 866.0183]95
5 TSLM(asmr ~ trend())    10 N(850, 48)  850. [836.9517, 864.0483]95

Screenshot 2023-09-30 at 6 57 08 PM Sheet: https://docs.google.com/spreadsheets/d/1hbzpwZH8ctNWZ0PSQIZ8bUBlbJLC3DhQGne8UImOZEQ/edit#gid=409932088

Note, that I have similarly verified the MEAN and NAIVE methods without problems. Both the residuals and o value match, but the resulting Prediction Intervals are slightly different.

I would appreciate it if anyone could please verify this. Thanks

robjhyndman commented 1 year ago

Most likely you are rounding before calculating. Here is the calculation using R:

library(fpp3)

fc <- data.frame(
  year = seq(1, 5),
  asmr = c(882, 876, 879, 871, 867)
) |>
  as_tsibble(index = year) |>
  model(TSLM(asmr ~ trend())) |>
  forecast(h = 5) 

fc |> 
  mutate(
    .var = distributional::variance(asmr),
    .lo = .mean - 1.96*sqrt(.var),
    .hi = .mean + 1.96*sqrt(.var)
  ) |> 
  hilo(level = 95)
#> # A tsibble: 5 x 8 [1]
#> # Key:       .model [1]
#>   .model          year       asmr .mean  .var   .lo   .hi                  `95%`
#>   <chr>          <dbl>     <dist> <dbl> <dbl> <dbl> <dbl>                 <hilo>
#> 1 TSLM(asmr ~ t…     6 N(864, 16)  864.  16.4  857.  872. [856.5507, 872.4493]95
#> 2 TSLM(asmr ~ t…     7 N(861, 22)  861   21.9  852.  870. [851.8209, 870.1791]95
#> 3 TSLM(asmr ~ t…     8 N(858, 29)  858.  29.0  847.  868. [846.9483, 868.0517]95
#> 4 TSLM(asmr ~ t…     9 N(854, 38)  854   37.6  842.  866. [841.9817, 866.0183]95
#> 5 TSLM(asmr ~ t…    10 N(850, 48)  850.  47.8  837.  864. [836.9517, 864.0483]95

^{Created on 2023-10-07 with reprex v2.0.2}

USMortality commented 1 year ago

Thanks for your reply. Your code is fine - however, the discrepancy seems to be in the variance/.var value.

Here's the R code that calculates the variance/.var manually, and compares it to the fabletools calculations. Only the MEAN approach matches exactly. I think it all comes down to the formulas used in the .var = ... step, which I took from table 5.2 (chapter 5.5) of your book. What might I be missing? Thanks!

library(fpp3)

# Data
ts <- data.frame(
    year = seq(1, 5),
    asmr = c(882, 876, 879, 871, 867)
) |> as_tsibble(index = year)
T_len <- nrow(ts)

# MEAN
model <- ts |>
    mutate(.fitted = mean(asmr)) |>
    mutate(.resid = asmr - .fitted)
o <- sd(model$.resid)
fc <-
    data.frame(year = seq(6, 10)) |>
    mutate(
        .fitted = model$.fitted[1],
        h = year - T_len
    ) |>
    mutate(.var = (o * sqrt(1 + 1 / T_len))^2)

fc_fabletools <- ts |>
    model(MEAN(asmr)) |>
    forecast(h = 5) |>
    mutate(.var = distributional::variance(asmr))

identical(fc$.var, fc_fabletools$.var) # TRUE

# NAIVE
model <- ts |>
    mutate(.fitted = lag(asmr, 1)) |>
    mutate(.resid = asmr - .fitted)
o <- sd(model$.resid, na.rm = TRUE)
fc <-
    data.frame(year = seq(6, 10)) |>
    mutate(
        .fitted = model$asmr[T_len],
        h = year - T_len
    ) |>
    mutate(.var = (o * sqrt(h))^2)

fc_fabletools <- ts |>
    model(MEAN(asmr)) |>
    forecast(h = 5) |>
    mutate(.var = distributional::variance(asmr))

identical(fc$.var, fc_fabletools$.var) # FALSE

# TSLM/trend
fit <- lm(asmr ~ year, data = ts)
model <- ts |>
    mutate(.fitted = fit$coefficients[2] * year + fit$coefficients[1]) |>
    mutate(.resid = asmr - .fitted)
o <- sd(model$.resid)
fc <-
    data.frame(year = seq(6, 10)) |>
    mutate(
        .fitted = fit$coefficients[2] * year + fit$coefficients[1],
        h = year - T_len
    ) |>
    mutate(.var = (o * sqrt(h * (1 + h / (T_len - 1))))^2)

fc_fabletools <- ts |>
    model(TSLM(asmr ~ trend())) |>
    forecast(h = 5) |>
    mutate(.var = distributional::variance(asmr))

identical(fc$.var, fc_fabletools$.var) # FALSE

robjhyndman commented 1 year ago

There are several issues here.

When comparing real values, it is better to use all.equal() rather than identical() as it allows for floating point error.
In your NAIVE code, you fit a MEAN model.
In your TSLM code, you compare against the drift variance, which is not the same thing.
The variance of residuals should be estimated using mean(model$.resid^2, na.rm=TRUE) because we know the mean is zero (assuming unbiasedness), so there is no need to estimate it. This is correctly implemented for NAIVE, but not for MEAN. I'll get it fixed.
If you use o <- sqrt(mean(model$.resid^2, na.rm=TRUE)) in the NAIVE example, fit a NAIVE model, and replace identical() with all.equal(), the results match.

USMortality commented 1 year ago

Ah ok, that makes sense. Thanks for pointing out the errors.

So, if I understand your reply correctly, fable::MEAN implementation should also be using the mean of the squared residuals.

Could it be, that this is also the case for the fable::TSLM function - because even omitting the + trend() results in a variance of 43.8, identical to the current MEAN variance.

model <- ts |>
    mutate(.fitted = mean(asmr)) |>
    mutate(.resid = asmr - .fitted)
o <- sqrt(mean(model$.resid^2, na.rm = TRUE))
mean1 <-
    data.frame(year = seq(6, 10)) |>
    mutate(
        .fitted = model$.fitted[1],
        h = year - T_len
    ) |>
    mutate(.var = (o * sqrt(1 + 1 / T_len))^2)
mean2 <- fc_fabletools <- ts |>
    model(MEAN(asmr)) |>
    forecast(h = 5) |>
    mutate(.var = distributional::variance(asmr))
tslm <- fc_fabletools <- ts |>
    model(TSLM(asmr)) |>
    forecast(h = 5) |>
    mutate(.var = distributional::variance(asmr))

Shouldn't these three all produce identical results?

robjhyndman commented 1 year ago

Ah, sorry. My mistake. We do need to account for the number of parameters estimated when estimating the residual variance. So my point 4 above is only true for models with no parameters, such as NAIVE. It is not true for the MEAN model which has one parameter.

For the MEAN model, because there is one parameter, the residual variance should be sum(res^2)/(T-1). Because the sample mean of the residuals is identical to zero, this turns out to be equal to var(res), and the fable code is correct.

For the NAIVE model, there are no parameters, so the residual variance should be sum(res^2)/T, as I calculated above.

For the TSLM model, where just a constant is fitted, it is identical to the MEAN model, and the results should be the same. In your last block of code, if you revert to o <- sd(model$.resid), the three models should be the same.

If you do go ahead and check the results for the drift method, that also has one parameter, so the residual variance should be sum(res^2)/(T-1).

In general, the residual variance should be sum(res^2)/(T-k) for models with k estimated parameters. That's true even if the sample mean of the residuals is different from zero, because the population mean is zero by the unbiasedness assumption.

USMortality commented 1 year ago

Great, that makes a lot of sense and is inline with what I've read from your book/website! Still doesn't match all, but I think we're almost there, please take a look!

library(fpp3)

# Data
ts <- data.frame(
    year = seq(1, 5),
    asmr = c(882, 876, 879, 871, 867)
) |> as_tsibble(index = year)
T_len <- nrow(ts)

# A) MEAN
model <- ts |>
    mutate(.fitted = mean(asmr)) |>
    mutate(.resid = asmr - .fitted)
k <- 1
o <- sqrt(sum(model$.resid^2, na.rm = TRUE) / (T_len - k))
fc <-
    data.frame(year = seq(6, 10)) |>
    mutate(
        .fitted = model$.fitted[1],
        h = year - T_len
    ) |>
    mutate(.var = (o * sqrt(1 + 1 / T_len))^2)

fc_fabletools <- ts |>
    model(MEAN(asmr)) |>
    forecast(h = 5) |>
    mutate(.var = distributional::variance(asmr))

all.equal(fc$.var, fc_fabletools$.var) # TRUE

# B) NAIVE
model <- ts |>
    mutate(.fitted = lag(asmr, 1)) |>
    mutate(.resid = asmr - .fitted)
k <- 1 # Shouldn't this be zero?
o <- sqrt(sum(model$.resid^2, na.rm = TRUE) / (T_len - k))
fc <-
    data.frame(year = seq(6, 10)) |>
    mutate(
        .fitted = model$asmr[T_len],
        h = year - T_len
    ) |>
    mutate(.var = (o * sqrt(h))^2)
fc_fabletools <- ts |>
    model(NAIVE(asmr)) |>
    forecast(h = 5) |>
    mutate(.var = distributional::variance(asmr))
all.equal(fc$.var, fc_fabletools$.var) # TRUE

# C) TSLM
model <- ts |>
    mutate(.fitted = mean(asmr)) |>
    mutate(.resid = asmr - .fitted)
k <- 1
o <- sqrt(sum(model$.resid^2, na.rm = TRUE) / (T_len - k))
fc <-
    data.frame(year = seq(6, 10)) |>
    mutate(
        .fitted = model$.fitted[1],
        h = year - T_len
    ) |>
    mutate(.var = (o * sqrt(1 + 1 / T_len))^2)
fc_fabletools <- ts |>
    model(TSLM(asmr)) |>
    forecast(h = 5) |>
    mutate(.var = distributional::variance(asmr))
all.equal(fc$.var, fc_fabletools$.var) # TRUE

# D) TSLM/trend
fit <- lm(asmr ~ year, data = ts)
model <- ts |>
    mutate(.fitted = fit$coefficients[2] * year + fit$coefficients[1]) |>
    mutate(.resid = asmr - .fitted)
k <- 1
o <- sqrt(sum(model$.resid^2, na.rm = TRUE) / (T_len - k))
fc <-
    data.frame(year = seq(6, 10)) |>
    mutate(
        .fitted = fit$coefficients[2] * year + fit$coefficients[1],
        h = year - T_len
    ) |>
    mutate(.var = (o * sqrt(h * (1 + h / (T_len - 1))))^2)
fc_fabletools <- ts |>
    model(TSLM(asmr ~ trend())) |>
    forecast(h = 5) |>
    mutate(.var = distributional::variance(asmr))
all.equal(fc$.var, fc_fabletools$.var) # FALSE

Four cases: A) MEAN PASS B) NAIVE PASS - but only with k=1, shouldn't it be 0, as you wrote above? C) TSLM PASS D) TSLM/trend FAIL - tested k=1 and 0, both fail... not sure why?

I think the issue might be here: mutate(.var = (o * sqrt(h * (1 + h / (T_len - 1))))^2) - the implementation of the 'h-step forecast standard deviation' - it should match table 5.2 in your book?

robjhyndman commented 1 year ago

B) T is the number of non-missing values. With NAIVE, there is one non-missing value due to the first observation not having an available lag.

D) You are using the variance for a drift model, which is not the same as the variance of a linear time trend. The expression for a trend variance is given at the bottom of https://otexts.com/fpp3/regression-exercises.html

USMortality commented 1 year ago

Ok, thanks. Makes sense.

I've looked at the formula, and it seems that D) is not as straight forward to manually verify then...

Feel free to close this issue. Thanks again!

robjhyndman / fpp3

Issue manually validating TSLM/trend prediction intervals #12