tidyverts / feasts

Feature Extraction And Statistics for Time Series
https://feasts.tidyverts.org/

add dynamic dof selection for ljung_box feature for both single and multiple models #143

Open ghost opened 3 years ago

ghost commented 3 years ago

Unless I am mistaken, the ljung_box feature requires the dof and lag arguments to be specified manually whenever the defaults (0 and 1, respectively) are not appropriate. This is a problem when a mable contains models with different numbers of estimated parameters, because dof should then differ from model to model. In that case, you would want the ljung_box feature to calculate the statistic and p-value using the appropriate dof for each model. For example:

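# load the packages used below
library(dplyr)
library(tsibble)
library(tsibbledata)  # provides aus_production
library(fable)
library(feasts)

# subset data for training
train <- aus_production %>%
  filter_index("1992 Q1" ~ "2006 Q4")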

# Create models 
beer_fit <- train %>%
  model(
    Mean = MEAN(Beer),
    `Seasonal naïve` = SNAIVE(Beer)
  )

# check how many estimated parameters each model has, if any. Only `Mean` will show
# as having at least 1 parameter
beer_fit %>%
  tidy() %>%
  group_by(.model) %>%
  count()

# get ljung box information
beer_fit %>%
  augment() %>%
  features(.innov, ljung_box)

Note that the last command produces Ljung-Box results for both models, but both are computed with dof = 0, whereas dof should be 1 for Mean and 0 for Seasonal naïve.
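To illustrate why the dof value matters, here is a minimal base R sketch (not taken from feasts) using stats::Box.test(), whose fitdf argument plays an analogous role to dof. The simulated series `x` and the choice of lag = 10 are assumptions made purely for illustration. The statistic is the same in both calls; only the chi-squared degrees of freedom (lag - fitdf), and therefore the p-value, change.

```r
# Simulated stand-in for model residuals (hypothetical data, not from the issue)
set.seed(1)
x <- rnorm(60)

# Same series and lag, with and without accounting for one estimated parameter
lb_dof0 <- Box.test(x, lag = 10, type = "Ljung-Box", fitdf = 0)
lb_dof1 <- Box.test(x, lag = 10, type = "Ljung-Box", fitdf = 1)

# Side-by-side comparison: identical statistic, different df and p-value
data.frame(
  dof       = c(0, 1),
  statistic = unname(c(lb_dof0$statistic, lb_dof1$statistic)),
  df        = unname(c(lb_dof0$parameter, lb_dof1$parameter)),
  p_value   = c(lb_dof0$p.value, lb_dof1$p.value)
)
```

Because the reference distribution has fewer degrees of freedom when fitdf = 1, the same statistic gives a smaller p-value, so using dof = 0 for a model with estimated parameters makes the test too lenient.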

I believe this can be fixed with a relatively simple mapply() call. The following is just a rough draft and could obviously be improved on:

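ljung_box_mult <- function(dat, lag = 10) {

  # number of estimated parameters per model, used as dof
  # (rough draft: assumes a single series, i.e. no key variables in the mable)
  n_params <- dat %>%
    tidy() %>%
    as_tibble() %>%
    count(.model)

  # one row per model; models with no estimated parameters get dof = 0
  input <- dat %>%
    augment() %>%
    as_tibble() %>%
    distinct(.model) %>%
    left_join(n_params, by = ".model") %>%
    mutate(n = if_else(is.na(n), 0L, n))

  # compute the ljung_box feature separately for each model with its own dof
  output <- mapply(function(x, y) {

    dat %>%
      select(all_of(x)) %>%
      augment() %>%
      features(.innov, ljung_box, lag = lag, dof = y)

  }, input$.model, input$n, SIMPLIFY = FALSE)

  # stack the per-model results into a single table
  bind_rows(output)

}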

beer_fit %>%
  ljung_box_mult()

If I am a bonehead and there is a way to do this already please let me know. If not, then I am open to suggestions on how this can be implemented/improved upon.