tidyverts / fabletools

General fable features useful for extension packages
http://fabletools.tidyverts.org/

Possible bug: incorrect degrees of freedom in `accuracy()` when computing the MAE and RMSE on the training dataset? #355

Closed gcoder1990 closed 2 years ago

gcoder1990 commented 2 years ago

I am working on the following example from the fpp3 book, which looks at recent beer production in Australia. This is the code I use to split the data into training and test sets and to fit the model:

recent_production <- aus_production %>%
  filter(year(Quarter) >= 1992)

beer_train <- recent_production %>%
  filter(year(Quarter) <= 2007)

beer_fit <- beer_train %>%
  model(
    Drift = RW(Beer ~ drift())
  )

I then compute the accuracy metrics on the training dataset (the residual errors) as follows:

beer_fit %>%
    accuracy()

This yields an MAE of 54.76795 for the Drift model. I then perform the same computation manually, to understand how the metric is computed:

resid <- beer_fit %>% 
         augment() %>% 
         pull(.innov) %>%
         na.omit() # Key: remove first NA!

# Compute the MAE
T <- length(resid)
K <- nrow(tidy(beer_fit))

MAE = (1/(T-K))*sum(abs(resid))

This results in an MAE of 55.65.

If I increase the denominator (the degrees of freedom) by 1, I reproduce the value reported by `accuracy()`. That would mean the first NA value that I removed is still counted in the degrees of freedom. Is this correct, @mitchelloharawild?

If so, could you briefly explain why? When computing the variance of the residuals, NAs are removed. I would expect the same to apply when averaging the residuals to compute these error metrics.

gcoder1990 commented 2 years ago

I think the problem may have to do with K.

When computing the mean that gives the MAE, isn't it necessary to reduce the degrees of freedom of the residuals by the number of model parameters?

gcoder1990 commented 2 years ago

OK, I will close the issue. I believe the point is that, when computing error metrics, we are simply taking averages, not trying to obtain unbiased estimators. That is why we do not need to correct the degrees of freedom by the number of parameters in the model.
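
As a rough check of this reading, the two figures quoted above can be reconciled with plain arithmetic. The sketch below assumes 63 non-missing residuals (64 quarters from 1992 Q1 to 2007 Q4, minus the leading NA) and K = 1 (the drift term). Neither count is stated in the thread, so both are assumptions.

```r
# Assumed counts (not stated in the thread):
# 63 non-missing residuals and 1 estimated parameter (the drift).
n_resid <- 63
k <- 1

mae_reported <- 54.76795             # value quoted from accuracy()
abs_sum <- mae_reported * n_resid    # implied sum of absolute residuals

mae_plain    <- abs_sum / n_resid        # plain average of absolute residuals
mae_adjusted <- abs_sum / (n_resid - k)  # divided by T - K, as in the manual code

round(c(plain = mae_plain, adjusted = mae_adjusted), 2)
```

Under these assumptions the adjusted figure comes out near 55.65, the manual result above, so the whole gap would be explained by subtracting K rather than by how the NA is handled.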

robjhyndman commented 2 years ago

Yes, you're right. The MAE is not meant to be an unbiased estimate of anything, so we don't correct the degrees of freedom.
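
For anyone wanting to reproduce this, a minimal sketch of the uncorrected calculation follows. It assumes the fpp3 package is installed and that `accuracy()` averages the response residuals (`.resid`) with missing values dropped. For this model, which has no transformation, those should match the `.innov` values used in the original post.

```r
library(fpp3)  # attaches tsibble, fable, fabletools, dplyr, lubridate, tsibbledata

# Same training window and model as in the original post
beer_fit <- aus_production %>%
  filter(year(Quarter) >= 1992, year(Quarter) <= 2007) %>%
  model(Drift = RW(Beer ~ drift()))

# Response residuals; the first is NA because the random walk
# has no previous observation to forecast from
e <- beer_fit %>%
  augment() %>%
  pull(.resid)

# Plain averages over the non-missing residuals, with no T - K correction
mae_manual  <- mean(abs(e), na.rm = TRUE)
rmse_manual <- sqrt(mean(e^2, na.rm = TRUE))

# Values reported by accuracy() on the training data
mae_reported  <- accuracy(beer_fit)$MAE
rmse_reported <- accuracy(beer_fit)$RMSE

all.equal(mae_manual, mae_reported)
all.equal(rmse_manual, rmse_reported)
```

If the assumption about `accuracy()` holds, both comparisons should agree, with the denominator being the number of non-missing residuals and no subtraction of K.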