DavisVaughan closed this issue 5 years ago.
claimed!
If it is useful for testing, here is the data from Hyndman's paper.
library(tidyverse)
sales <- tibble(g = c(rep("insample", 23), rep("outsample", 12)),
                y = c(0,2,0,1,0,10,0,0,0,2,0,6,3,0,0,0,0,0,7,0,0,0,0, 0, 0, 0, 3, 1, 0, 0, 1, 0, 1, 0, 0),
                # in-sample: naive forecast y[t - 1]; out-of-sample: flat forecast at the last in-sample value
                y_hat = ifelse(g == "insample", lag(y), last(y[g == "insample"])),
                error = abs(y - y_hat))
sales
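As a sanity check that does not depend on yardstick, the numbers in Hyndman's example can be reproduced in base R. This is a minimal sketch; it assumes, as in the tibble above, that the first 23 observations are in-sample and that every out-of-sample forecast equals the last in-sample value (0).

```r
# Hyndman's example series: 23 in-sample points followed by 12 out-of-sample points
y <- c(0, 2, 0, 1, 0, 10, 0, 0, 0, 2, 0, 6, 3, 0, 0, 0, 0, 0, 7, 0, 0, 0, 0,
       0, 0, 0, 3, 1, 0, 0, 1, 0, 1, 0, 0)
is_insample <- seq_along(y) <= 23

train <- y[is_insample]
test  <- y[!is_insample]

# Denominator: in-sample MAE of the one-step naive forecast y[t - 1] (56 / 22)
naive_mae <- mean(abs(diff(train)))

# Numerator: out-of-sample MAE of the flat forecast at the last in-sample value (6 / 12)
forecast <- rep(train[length(train)], length(test))
mae <- mean(abs(test - forecast))

mase <- mae / naive_mae
round(mase, 2)
#> [1] 0.2
```

The unrounded value is 11/56 (about 0.196), so a test on the rounded value of 0.2 is a fairly loose check.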
@alexhallam thank you! I've been so busy chatting today that I haven't made much progress implementing this; I might get to it once I'm back home, but if you're geeked up about this right now, please feel free to take it on!
@ClaytonJY This is a first whack at the `mase` function, along with a test. It needs some cleaning up. Let me know what you think. Feel free to take over the problem from here.
suppressMessages(library(tidyverse))  # dplyr (lag, last) and tibble
suppressMessages(library(yardstick))  # mae_vec()
# MASE: out-of-sample MAE of the forecasts, scaled by the in-sample MAE of the
# (seasonal) naive forecast y[t - m]; m = 1 gives the non-seasonal version
mase_impl <- function(truth, estimate, is_insample, m = 1) {
  is_insample <- as.logical(is_insample)
  # Numerator: MAE of the forecasts on the out-of-sample rows
  mae <- yardstick::mae_vec(truth[!is_insample], estimate[!is_insample])
  # Denominator: MAE of the naive forecast on the in-sample rows
  # (the first m lagged values are NA and are dropped by mae_vec()'s default NA removal)
  train <- truth[is_insample]
  naive_mae <- yardstick::mae_vec(train, dplyr::lag(train, m))
  mae / naive_mae
}
# Use Hyndman's example data
sales <- tibble(g = c(rep("insample", 23), rep("outsample", 12)),
                y = c(0,2,0,1,0,10,0,0,0,2,0,6,3,0,0,0,0,0,7,0,0,0,0, 0, 0, 0, 3, 1, 0, 0, 1, 0, 1, 0, 0),
                # in-sample: naive forecast; out-of-sample: flat forecast at the last in-sample value
                y_hat = ifelse(g == "insample", lag(y), last(y[g == "insample"])),
                error = abs(y - y_hat),
                is_insample = g == "insample")
# test
isTRUE(round(mase_impl(sales$y, sales$y_hat, sales$is_insample), 2) == 0.2)
#> [1] TRUE
Created on 2019-01-22 by the reprex package (v0.2.1)
I was hoping I could get some more experienced eyes on this.
Here is a link to my solution for the MASE error metric. I am having difficulty passing a vector as an additional argument to the function.
https://github.com/alexhallam/yardstick/blob/mase_error/R/num-mase.R
To test the function I have been using the following data:
sales <- data.frame(
  y = c(0, 2, 0, 1, 0, 10, 0, 0, 0, 2, 0, 6, 3, 0, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 3, 1, 0, 0, 1, 0, 1, 0, 0),
  y_hat = c(NA, 0, 2, 0, 1, 0, 10, 0, 0, 0, 2, 0, 6, 3, 0, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
  insample = rep(c(TRUE, FALSE), times = c(23, 12))  # 23 in-sample rows, then 12 out-of-sample rows
)
With the above data, the function call is:
mase(sales, truth = y, estimate = y_hat, is_insample = insample, m = 1)
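The crux of the problem is that `insample` in that call is a bare column name, so the data-frame method has to evaluate it inside `sales` before handing a plain logical vector to the vector-level function, exactly as it already does for `truth` and `estimate`. A minimal sketch of that idea, assuming base R's `substitute()`/`eval()` in place of the rlang tidy evaluation yardstick actually uses; `mase_vec_sketch()` and `mase_df_sketch()` are hypothetical names, not yardstick functions.

```r
# Hypothetical vector-level function: every input is already a plain vector
mase_vec_sketch <- function(truth, estimate, is_insample, m = 1) {
  is_insample <- as.logical(is_insample)
  train <- truth[is_insample]
  # In-sample MAE of the naive forecast y[t - m]
  naive_mae <- mean(abs(diff(train, lag = m)))
  # Out-of-sample MAE of the forecasts being evaluated
  mae <- mean(abs(truth[!is_insample] - estimate[!is_insample]))
  mae / naive_mae
}

# Hypothetical data-frame method: `truth`, `estimate` and `is_insample` are all
# bare column names, so each one is captured unevaluated and looked up in `data`
# (falling back to the caller's environment for anything that is not a column)
mase_df_sketch <- function(data, truth, estimate, is_insample, m = 1) {
  env <- parent.frame()
  truth_vec    <- eval(substitute(truth), data, env)
  estimate_vec <- eval(substitute(estimate), data, env)
  insample_vec <- eval(substitute(is_insample), data, env)
  mase_vec_sketch(truth_vec, estimate_vec, insample_vec, m = m)
}

sales <- data.frame(
  y = c(0, 2, 0, 1, 0, 10, 0, 0, 0, 2, 0, 6, 3, 0, 0, 0, 0, 0, 7, 0, 0, 0, 0,
        0, 0, 0, 3, 1, 0, 0, 1, 0, 1, 0, 0),
  y_hat = c(NA, 0, 2, 0, 1, 0, 10, 0, 0, 0, 2, 0, 6, 3, 0, 0, 0, 0, 0, 7, 0, 0, 0,
            rep(0, 12)),
  insample = rep(c(TRUE, FALSE), times = c(23, 12))
)

mase_df_sketch(sales, truth = y, estimate = y_hat, is_insample = insample, m = 1)
#> [1] 0.1964286
```

Keeping the vector function free of data-frame concerns mirrors the split between a data-frame method and a `_vec` function; in yardstick itself the capture step would go through tidy evaluation (e.g. `rlang::enquo()`) rather than `eval()`.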
This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.
Definition: https://en.wikipedia.org/wiki/Mean_absolute_scaled_error
See also the Scale Free Errors section of: http://datascienceassn.org/sites/default/files/Another%20Look%20at%20Measures%20of%20Forecast%20Accuracy.pdf
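For reference, the definition from the Wikipedia article, written with the seasonal period $m$ ($m = 1$ is the non-seasonal case). The notation here is chosen for this note: $Y_1, \dots, Y_n$ is the in-sample series and $e_j = Y_{n+j} - F_{n+j}$, for $j = 1, \dots, h$, are the out-of-sample forecast errors.

```math
\mathrm{MASE} = \frac{\frac{1}{h}\sum_{j=1}^{h} \lvert e_j \rvert}{\frac{1}{n-m}\sum_{t=m+1}^{n} \lvert Y_t - Y_{t-m} \rvert}
```

Multiplying the data and the forecasts by a constant $c > 0$ scales the numerator and the denominator by $c$, which is what makes the measure scale-free. For Hyndman's example ($n = 23$, $h = 12$, $m = 1$), the numerator is $6/12 = 0.5$ and the denominator is $56/22 \approx 2.545$, giving $\mathrm{MASE} = 11/56 \approx 0.196$.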
I would use `mape()` as a general guide. There is a seasonal and a non-seasonal version. The non-seasonal version is just the seasonal version with `m = 1`, so we can implement one version that handles seasonality and defaults to the non-seasonal version. To see how extra arguments can be passed through, use `ccc()` as a guide. Specifically, you need to pass them through in `metric_summarizer()` and in `metric_vec_template()`.
[ ] Call it `mase()`.
[ ] Because of the seasonality, it will need a parameter to control that. I would call it `period = 1` and document clearly that this controls the seasonal period in the calculation (this is `m` in the Wikipedia article).
[ ] Pay close attention to how we generate documentation and examples automatically.
[ ] Try to understand how `metric_summarizer()` works, along with the rationale behind `metric_vec_template()`, then use them in the implementation.
[ ] Add a few tests. Small examples you can match from academic papers are best, but otherwise any online example is okay. If all else fails, create an example "by hand" whose result is easy to compute manually.
[ ] Use the file naming scheme `num-<metric>.R`.
[ ] I would add a documentation reference to the paper by Hyndman, since it is a metric he helped develop: http://datascienceassn.org/sites/default/files/Another%20Look%20at%20Measures%20of%20Forecast%20Accuracy.pdf
[ ] In the docs, mention that this is a metric generally used in time series forecasting.
The Custom Metrics vignette will probably be helpful.
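The checklist asks for a seasonal `period` argument and for small examples whose result can be computed by hand. A minimal base-R sketch of both, assuming the in-sample/out-of-sample split is passed as a logical vector as in the reprexes above; `mase_period_sketch()` is a hypothetical name, not the eventual yardstick API.

```r
# Hypothetical helper: MASE with a seasonal period (period = 1 is non-seasonal)
mase_period_sketch <- function(truth, estimate, is_insample, period = 1) {
  train <- truth[is_insample]
  # Scale: in-sample MAE of the seasonal naive forecast y[t - period]
  scale <- mean(abs(diff(train, lag = period)))
  # Out-of-sample MAE of the forecasts being evaluated, divided by the scale
  mean(abs(truth[!is_insample] - estimate[!is_insample])) / scale
}

# Hand-checkable series with an obvious period-2 pattern
truth       <- c(10, 20, 12, 22, 14, 24, 16, 26)
estimate    <- c(NA, NA, NA, NA, NA, NA, 15, 28)  # only out-of-sample forecasts are used
is_insample <- c(rep(TRUE, 6), rep(FALSE, 2))

# period = 2: seasonal naive errors are 2, 2, 2, 2, so scale = 2; MAE = (1 + 2) / 2 = 1.5
mase_period_sketch(truth, estimate, is_insample, period = 2)
#> [1] 0.75

# period = 1: naive errors are 10, 8, 10, 8, 10, so scale = 9.2
mase_period_sketch(truth, estimate, is_insample, period = 1)
#> [1] 0.1630435
```

With a strongly seasonal series the non-seasonal naive benchmark is poor, so the same forecasts score very differently (0.75 vs about 0.16) depending on `period`; this is one reason to document clearly which period the metric uses.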