tidymodels / yardstick

Tidy methods for measuring model performance
https://yardstick.tidymodels.org/

Mean Absolute Scaled Error (MASE) #68

Closed DavisVaughan closed 5 years ago

DavisVaughan commented 5 years ago

Definition: https://en.wikipedia.org/wiki/Mean_absolute_scaled_error

See also the Scale Free Errors section of: http://datascienceassn.org/sites/default/files/Another%20Look%20at%20Measures%20of%20Forecast%20Accuracy.pdf

The Custom Metrics vignette will probably be helpful.
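For reference, the definition linked above can be written out as follows. The symbols are assumptions chosen for this note, not taken from the thread: $n$ is the number of in-sample observations, $h$ the number of out-of-sample forecasts, and $m$ the lag of the naive forecast ($m = 1$ for non-seasonal data).

```latex
\mathrm{MASE}
  = \frac{\dfrac{1}{h} \sum_{t=n+1}^{n+h} \left| y_t - \hat{y}_t \right|}
         {\dfrac{1}{n-m} \sum_{t=m+1}^{n} \left| y_t - y_{t-m} \right|}
```

The numerator is the ordinary MAE of the forecasts; the denominator is the in-sample MAE of the naive forecast, so a value below 1 means the forecasts beat the in-sample naive method on average.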

ClaytonJY commented 5 years ago

claimed!

alexhallam commented 5 years ago

If it is useful for testing, here is the data from Hyndman's paper.

library(tidyverse)
sales <- tibble(g = c(rep("insample", 23), rep("outsample", 12)),
       y = c(0,2,0,1,0,10,0,0,0,2,0,6,3,0,0,0,0,0,7,0,0,0,0, 0, 0, 0, 3, 1, 0, 0, 1, 0, 1, 0, 0),
       # naive forecast: lagged value in sample, last in-sample value out of sample
       y_hat = ifelse(g == "insample", lag(y), last(y[g == "insample"])),
       error = abs(y - y_hat))

sales

ClaytonJY commented 5 years ago

@alexhallam thank you! I've been so busy chatting today I haven't made much progress implementing this; I might get to it once I'm back home but if you're geeked up about this right now, please feel free to take it on!

alexhallam commented 5 years ago

@ClaytonJY This is a first whack at the mase function, along with a test. It needs some cleaning. Let me know what you think, and feel free to take over the problem from here.

suppressMessages(library(tidyverse))
suppressMessages(library(yardstick))
suppressMessages(library(tibble))

mase_impl <- function(truth, estimate, is_insample, m = 1) {
    # combine the inputs so rows can be filtered by sample membership
    mase_df <- cbind(truth, estimate, is_insample) %>% tibble::as_tibble()

    # numerator: MAE of the forecasts over the out-of-sample rows
    mae <- mase_df %>%
        filter(is_insample == FALSE) %>%
        yardstick::mae(truth, estimate) %>%
        pull(".estimate")

    # denominator: in-sample MAE of the naive forecast at lag m
    naive_mae <- mase_df %>%
        filter(is_insample == TRUE) %>%
        yardstick::mae(truth, lag(truth, m)) %>%
        pull(".estimate")

    mae / naive_mae
}

# Use Hyndman's example
# data
sales <- tibble(g = c(rep("insample", 23), rep("outsample", 12)),
       y = c(0,2,0,1,0,10,0,0,0,2,0,6,3,0,0,0,0,0,7,0,0,0,0, 0, 0, 0, 3, 1, 0, 0, 1, 0, 1, 0, 0),
       # naive forecast: lagged value in sample, last in-sample value out of sample
       y_hat = ifelse(g == "insample", lag(y), last(y[g == "insample"])),
       error = abs(y - y_hat),
       # 1/0 flag for in-sample rows; ifelse() is vectorized, so rowwise() is not needed
       is_insample = ifelse(g == "insample", 1, 0))

# test
isTRUE(round(mase_impl(sales$y, sales$y_hat, sales$is_insample), 2) == 0.2)
#> [1] TRUE

Created on 2019-01-22 by the reprex package (v0.2.1)
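As a cross-check on the 0.2 result above, the same number can be reproduced by hand in base R without yardstick. This is only a sketch of the arithmetic: the variable names (`n_train`, `naive_mae`, `test_mae`, `mase_value`) are made up for illustration, and the out-of-sample forecast is assumed to be the last in-sample value, as in the data above.

```r
# Hyndman's example series: 23 in-sample values followed by 12 out-of-sample values
y <- c(0, 2, 0, 1, 0, 10, 0, 0, 0, 2, 0, 6, 3, 0, 0, 0, 0, 0, 7, 0, 0, 0, 0,
       0, 0, 0, 3, 1, 0, 0, 1, 0, 1, 0, 0)
n_train <- 23
m <- 1  # lag of the naive forecast (1 = non-seasonal)

train <- y[seq_len(n_train)]
test <- y[-seq_len(n_train)]

# Denominator: in-sample MAE of the naive forecast y[t - m]
naive_mae <- mean(abs(diff(train, lag = m)))  # 56 / 22

# Numerator: out-of-sample MAE, forecasting the last in-sample value throughout
forecast <- rep(train[n_train], length(test))
test_mae <- mean(abs(test - forecast))        # 6 / 12

mase_value <- test_mae / naive_mae            # 11 / 56
round(mase_value, 2)
#> [1] 0.2
```

The in-sample absolute differences sum to 56 over 22 pairs, and the out-of-sample absolute errors sum to 6 over 12 forecasts, which gives 11/56, or about 0.196.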

alexhallam commented 5 years ago

I was hoping I could get some more experienced eyes on this.

Here is a link to my implementation of the MASE metric. I am having difficulty passing a vector as an additional argument to the function.

https://github.com/alexhallam/yardstick/blob/mase_error/R/num-mase.R

To test the function I have been using the following data

  sales <- data.frame(list(y = c(0, 2, 0, 1, 0, 10, 0, 0, 0, 2, 0, 6, 3, 0, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 3, 1, 0, 0, 1, 0, 1, 0, 0),
       y_hat = c(NA, 0, 2, 0, 1, 0, 10, 0, 0, 0, 2, 0, 6, 3, 0, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
       insample = c(rep(TRUE, 23), rep(FALSE, 12))))

With the above data, the function call is:

mase(sales, truth = y, estimate = y_hat, is_insample = insample , m = 1)
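One possible way around passing a per-row vector such as `is_insample` is to compute the in-sample scaling factor once and hand it to the metric as a single number. The sketch below is base R only and is not the code in the linked branch; `mase_sketch`, `naive_train_mae`, and the `train_mae` argument are hypothetical names chosen for illustration.

```r
# Hypothetical helper: in-sample MAE of the naive forecast at lag m
naive_train_mae <- function(train, m = 1) {
  mean(abs(diff(train, lag = m)))
}

# Hypothetical metric: MAE of the forecasts scaled by a precomputed training MAE
mase_sketch <- function(truth, estimate, train_mae, na_rm = TRUE) {
  stopifnot(length(truth) == length(estimate),
            is.numeric(train_mae), length(train_mae) == 1, train_mae > 0)
  mean(abs(truth - estimate), na.rm = na_rm) / train_mae
}

# Same data as above: 23 in-sample rows, then 12 out-of-sample rows
y <- c(0, 2, 0, 1, 0, 10, 0, 0, 0, 2, 0, 6, 3, 0, 0, 0, 0, 0, 7, 0, 0, 0, 0,
       0, 0, 0, 3, 1, 0, 0, 1, 0, 1, 0, 0)
insample <- rep(c(TRUE, FALSE), times = c(23, 12))
y_hat <- c(NA, y[1:22], rep(0, 12))

# The scale is computed from the training rows only, then passed as a scalar
scale <- naive_train_mae(y[insample], m = 1)
result <- mase_sketch(y[!insample], y_hat[!insample], train_mae = scale)
```

With this split, the metric itself only ever sees `truth`, `estimate`, and one extra scalar, so the extra argument does not need to line up row by row with the data.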