Mean Absolute Scaled Error (MASE)

DavisVaughan commented 5 years ago

Definition: https://en.wikipedia.org/wiki/Mean_absolute_scaled_error

I would use mape() as a general guide.
There is a seasonal and a non-seasonal version. The non-seasonal version is just the seasonal version with m = 1, so we can implement a single version that handles seasonality and defaults to the non-seasonal case.
To see how extra arguments can be passed through, use ccc() as a guide. Specifically, you need to pass the argument through in both metric_summarizer() and metric_vec_template().
[ ] Call it mase()
[ ] Because of the seasonality, it will need a parameter to control that. I would call it period = 1 and document clearly that this controls the seasonal period in the calculation. (this is m in the wiki documentation)
[ ] Pay close attention to how we generate documentation and examples automatically
[ ] Try and understand how metric_summarizer() works, along with the rationale behind metric_vec_template(), then use them in the implementation.
[ ] Add a few tests. Small examples you can match from academic papers are best, but otherwise any online example is okay. If all else fails, create an example "by hand" that is easy to manually compute the result for.
[ ] Use the file naming scheme num-<metric>.R
[ ] I would add a documentation reference to the paper by Hyndman, since it is a metric he helped develop: http://datascienceassn.org/sites/default/files/Another%20Look%20at%20Measures%20of%20Forecast%20Accuracy.pdf
[ ] In the docs, mention that this is a metric generally used in time series forecasting.

The Custom Metrics vignette will probably be helpful.
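
For orientation, here is a minimal base R sketch of the calculation described above. It is not the yardstick implementation: the name mase_sketch() is hypothetical, and it assumes the naive-forecast scale is computed from truth itself rather than from a separate training series. The period argument plays the role of m in the Wikipedia definition.

```r
# Hypothetical helper for illustration only; not part of yardstick.
# Assumption: the scaling term uses `truth` itself, not a separate training set.
mase_sketch <- function(truth, estimate, period = 1) {
  stopifnot(length(truth) == length(estimate), length(truth) > period)
  n <- length(truth)

  # Mean absolute error of the forecasts being evaluated
  mae <- mean(abs(truth - estimate))

  # Mean absolute error of the (seasonal) naive forecast:
  # each value is "predicted" by the value `period` steps earlier
  naive_mae <- mean(abs(truth[(period + 1):n] - truth[1:(n - period)]))

  mae / naive_mae
}

truth    <- c(1, 2, 4, 7)
estimate <- c(1, 3, 4, 5)

mase_sketch(truth, estimate)              # period = 1: 0.75 / 2 = 0.375
mase_sketch(truth, estimate, period = 2)  # period = 2: 0.75 / 4 = 0.1875
```

With period = 1 the denominator is the mean absolute first difference of the series, which is the non-seasonal case; a larger period only changes how far back the naive forecast looks.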

ClaytonJY commented 5 years ago

claimed!

alexhallam commented 5 years ago

If it is useful for testing, here is the data from Hyndman's paper.

library(tidyverse)
sales <- tibble(g = c(rep("insample", 23), rep("outsample", 12)),
       y = c(0,2,0,1,0,10,0,0,0,2,0,6,3,0,0,0,0,0,7,0,0,0,0, 0, 0, 0, 3, 1, 0, 0, 1, 0, 1, 0, 0),
       y_hat = ifelse(g == "insample", lag(y), last(y)),
       error = abs(y - y_hat))

sales

ClaytonJY commented 5 years ago

@alexhallam thank you! I've been so busy chatting today I haven't made much progress implementing this; I might get to it once I'm back home but if you're geeked up about this right now, please feel free to take it on!

alexhallam commented 5 years ago

@ClaytonJY This is a first whack at the mase function, along with a test. It needs some cleaning. Let me know what you think. Feel free to take over the problem from here.

suppressMessages(library(tidyverse))
suppressMessages(library(yardstick))
suppressMessages(library(tibble))

mase_impl <- function(truth, estimate, is_insample, m = 1) {

    # Bind the inputs into one tibble so they can be filtered together
    mase_df <- cbind(truth, estimate, is_insample) %>% tibble::as_tibble()

    # MAE of the forecasts on the out-of-sample rows
    mae <- mase_df %>% filter(is_insample == FALSE) %>% yardstick::mae(truth, estimate) %>% pull(".estimate")

    # MAE of the naive forecast (lag m) on the in-sample rows; this is the scaling term
    naive_mae <- mase_df %>% filter(is_insample == TRUE) %>% yardstick::mae(truth, lag(truth, m)) %>% pull(".estimate")

    mae / naive_mae
}

# Use Hyndman's example
# data
sales <- tibble(g = c(rep("insample", 23), rep("outsample", 12)),
       y = c(0,2,0,1,0,10,0,0,0,2,0,6,3,0,0,0,0,0,7,0,0,0,0, 0, 0, 0, 3, 1, 0, 0, 1, 0, 1, 0, 0),
       y_hat = ifelse(g == "insample", lag(y), last(y)),
       error = abs(y - y_hat)) %>%
  ungroup() %>%
  rowwise() %>%
  mutate(is_insample = ifelse(g == "insample", 1, 0))

# test
isTRUE(round(mase_impl(sales$y, sales$y_hat, sales$is_insample), 2) == 0.2)
#> [1] TRUE

Created on 2019-01-22 by the reprex package (v0.2.1)
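
The 0.2 checked in the test can also be reproduced by hand in base R, which may be handy as a dependency-free unit test. This is only a sketch of the arithmetic using the same sales series; the variable names are made up for illustration, and it assumes the out-of-sample forecast is the last in-sample value repeated (which is 0 here, the same value the reprex uses).

```r
# Same series as in the reprex: 23 in-sample values, then 12 out-of-sample values
y <- c(0, 2, 0, 1, 0, 10, 0, 0, 0, 2, 0, 6, 3, 0, 0, 0, 0, 0, 7, 0, 0, 0, 0,
       0, 0, 0, 3, 1, 0, 0, 1, 0, 1, 0, 0)

insample  <- y[1:23]
outsample <- y[24:35]

# Scaling term: MAE of the one-step naive forecast on the in-sample data
naive_mae <- mean(abs(diff(insample, lag = 1)))  # 56 / 22

# Assumption: forecast every out-of-sample point with the last in-sample value
forecast <- rep(insample[length(insample)], length(outsample))
out_mae  <- mean(abs(outsample - forecast))      # 6 / 12 = 0.5

mase_value <- out_mae / naive_mae
round(mase_value, 2)
#> [1] 0.2
```

The unrounded value is 11/56, roughly 0.196, so a test against 0.2 needs either the rounding shown here or a tolerance.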

alexhallam commented 5 years ago

I was hoping I could get some more experienced eyes on this.

This is a link to my solution for the MASE metric. I am having difficulty passing a vector as an additional argument to the function.

https://github.com/alexhallam/yardstick/blob/mase_error/R/num-mase.R

To test the function, I have been using the following data:

  sales <- data.frame(list(y = c(0, 2, 0, 1, 0, 10, 0, 0, 0, 2, 0, 6, 3, 0, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 3, 1, 0, 0, 1, 0, 1, 0, 0),
       y_hat = c(NA, 0, 2, 0, 1, 0, 10, 0, 0, 0, 2, 0, 6, 3, 0, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
       insample = c(T, T, T, T, T, T, T, T, T, T, T, T, T, T, T, T,T, T, T, T, T, T, T, F, F, F, F, F, F, F, F, F, F, F, F)))

With the above data, the function call is:

mase(sales, truth = y, estimate = y_hat, is_insample = insample , m = 1)

tidymodels / yardstick

Mean Absolute Scaled Error (MASE) #68