Add check_residuals() function

robjhyndman commented 5 years ago

Essentially a wrapper around:

  augment(model) %>%
  features(.resid, ljung_box, lag=<10 or 2*period>, dof=<from model>)
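The proposed wrapper can be sketched in base R to make the lag rule and the role of dof concrete. This is a minimal sketch, not the tidyverts code: `check_residuals_sketch()` is a hypothetical name, and it calls `stats::Box.test()` on a plain residual vector instead of `feasts::ljung_box()` on an augmented mable.

```r
# Hypothetical sketch of the proposed check_residuals(); not a fable/feasts function.
# Uses base R's stats::Box.test() instead of feasts::ljung_box().
check_residuals_sketch <- function(resid, period = 1, dof = 0) {
  # Lag rule from the proposal: 10 for non-seasonal data, 2 * period for seasonal data
  lag <- if (period > 1) 2 * period else 10
  # fitdf = dof reduces the chi-squared degrees of freedom to lag - dof
  test <- stats::Box.test(resid, lag = lag, type = "Ljung-Box", fitdf = dof)
  data.frame(lb_stat = unname(test$statistic),
             lag = lag,
             dof = dof,
             lb_pvalue = test$p.value)
}

# Example: an AR(1) model (one estimated ARMA coefficient) fitted to the built-in lh series
fit <- arima(lh, order = c(1, 0, 0))
check_residuals_sketch(residuals(fit), period = frequency(lh), dof = 1)
```

The caller still has to supply `period` and `dof`, which is exactly the information the thread below discusses getting from the tsibble and the model.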

robjhyndman commented 5 years ago

Or maybe this should be called test_residuals().

mitchelloharawild commented 5 years ago

This is certainly possible, but it would also require models to add the dof to the glance() output or similar (much like forecast:::modeldf()). The period can be determined from the tsibble.
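As a rough illustration of where the dof and period would come from, here is a base-R sketch for a `stats::arima()` fit. The helper `arima_dof_sketch()` is hypothetical; it counts only the estimated ARMA coefficients, which is one common convention and not necessarily how `forecast:::modeldf()` or fable count them. For a base `ts`, `frequency()` stands in for the period that would be read from the tsibble.

```r
# Hypothetical helper: counts estimated ARMA coefficients in a stats::arima() fit.
# fit$arma is c(p, q, P, Q, period, d, D), so the first four entries sum to p + q + P + Q.
arima_dof_sketch <- function(fit) {
  sum(fit$arma[1:4])
}

fit_ar1  <- arima(lh, order = c(1, 0, 0))
fit_arma <- arima(lh, order = c(1, 0, 1))
arima_dof_sketch(fit_ar1)   # 1
arima_dof_sketch(fit_arma)  # 2
frequency(lh)               # 1: lh is a non-seasonal series
```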

I think the interface needs more thought to ensure that a consistent and general interface is preserved throughout the package.

Checking/testing the residuals would often involve more than just the Ljung-Box test. Should there be a tag which is used for model testing?
How would users specify the tests they are interested in?
What other parameters from the model fit may be useful when computing a feature?
Are there other functions which should wrap features()? Should this be common practice?
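One possible answer to "how would you specify the tests", sketched in base R: pass a named list of test functions, each receiving the residuals and the model's dof, and collect the results into one table. The names `run_residual_tests_sketch()` and `example_tests` are hypothetical and this is not the fabletools interface; it only shows that tests which need the dof (portmanteau tests) and tests which ignore it (a normality test) can share one calling convention.

```r
# Hypothetical sketch of one possible interface (not the fabletools API):
# tests are passed as a named list of functions, each receiving the residuals
# and the model's degrees of freedom, and returning an "htest" object.
run_residual_tests_sketch <- function(resid, dof = 0, tests) {
  rows <- lapply(names(tests), function(nm) {
    out <- tests[[nm]](resid, dof)
    data.frame(test = nm,
               statistic = unname(out$statistic),
               p.value = out$p.value,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Two portmanteau tests that use dof, and a normality test that ignores it
example_tests <- list(
  ljung_box  = function(e, dof) Box.test(e, lag = 10, type = "Ljung-Box", fitdf = dof),
  box_pierce = function(e, dof) Box.test(e, lag = 10, type = "Box-Pierce", fitdf = dof),
  shapiro    = function(e, dof) shapiro.test(e)
)

fit <- arima(lh, order = c(1, 0, 0))
run_residual_tests_sketch(residuals(fit), dof = 1, tests = example_tests)
```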

robjhyndman commented 5 years ago

Adding dof to the glance() output seems like a good idea in any case. Yes, making it more general might be a good idea, although the textbook almost always uses the Ljung-Box (LB) test, apart from regression models, where LB is controversial and the Breusch-Godfrey test is sometimes preferred.
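For comparison with LB, a minimal base-R sketch of what the Breusch-Godfrey test computes for an `lm()` fit: the residuals are regressed on the original regressors plus `order` of their own lags, and the statistic n R² from that auxiliary regression is compared with a χ² distribution with `order` degrees of freedom. `breusch_godfrey_sketch()` is a hypothetical name; it is not fable's `breusch_godfrey()`, and filling pre-sample lagged residuals with zeros is an assumption (one common choice).

```r
# Hypothetical helper (not fable's breusch_godfrey()): Breusch-Godfrey LM test for
# serial correlation up to lag `order` in the residuals of an lm() fit.
breusch_godfrey_sketch <- function(fit, order = 1) {
  X <- model.matrix(fit)            # original regressors, including the intercept column
  e <- residuals(fit)
  n <- length(e)
  # Matrix of lagged residuals; pre-sample values are filled with 0
  lags <- matrix(0, nrow = n, ncol = order)
  for (k in seq_len(order)) {
    lags[(k + 1):n, k] <- e[1:(n - k)]
  }
  # Auxiliary regression of the residuals on the regressors and their own lags
  aux <- lm.fit(cbind(X, lags), e)
  r2 <- 1 - sum(aux$residuals^2) / sum((e - mean(e))^2)
  statistic <- n * r2
  data.frame(statistic = statistic,
             df = order,
             p.value = pchisq(statistic, df = order, lower.tail = FALSE))
}

breusch_godfrey_sketch(lm(dist ~ speed, data = cars), order = 2)
```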

mbg-unsw commented 3 years ago

This would be great for mables where each model has a different dof.

mitchelloharawild commented 3 years ago

I've now added (experimentally) hypothesize() methods in fabletools (0f3c42f6c6e1aa837de3ca5385447387bbdc1f48) for running statistical tests on fitted models. It is very similar to features(), but it is oriented toward computing tests on fitted models rather than features of data. Tests can be features, but not the other way round. hypothesise() will be available once https://github.com/r-lib/generics/issues/55 is resolved. As an example of how this function works, I have also added breusch_godfrey() in https://github.com/tidyverts/fable/commit/0f3c42f6c6e1aa837de3ca5385447387bbdc1f48, which can be used as follows:

library(fpp3)
tourism %>% 
  model(TSLM(Trips ~ trend() + season())) %>% 
  hypothesize(tests = lst(breusch_godfrey), order = 24)
#> # A tibble: 304 x 9
#>    Region   State    Purpose .model     .test  statistic order null_dist p.value
#>    <chr>    <chr>    <chr>   <chr>      <chr>      <dbl> <int>    <dist>   <dbl>
#>  1 Adelaide South A… Busine… TSLM(Trip… breus…      23.3    24    ᵪ²(24)  0.500 
#>  2 Adelaide South A… Holiday TSLM(Trip… breus…      26.1    24    ᵪ²(24)  0.346 
#>  3 Adelaide South A… Other   TSLM(Trip… breus…      34.7    24    ᵪ²(24)  0.0732
#>  4 Adelaide South A… Visiti… TSLM(Trip… breus…      29.7    24    ᵪ²(24)  0.194 
#>  5 Adelaid… South A… Busine… TSLM(Trip… breus…      24.8    24    ᵪ²(24)  0.414 
#>  6 Adelaid… South A… Holiday TSLM(Trip… breus…      24.1    24    ᵪ²(24)  0.458 
#>  7 Adelaid… South A… Other   TSLM(Trip… breus…      25.5    24    ᵪ²(24)  0.377 
#>  8 Adelaid… South A… Visiti… TSLM(Trip… breus…      12.1    24    ᵪ²(24)  0.979 
#>  9 Alice S… Norther… Busine… TSLM(Trip… breus…      26.5    24    ᵪ²(24)  0.327 
#> 10 Alice S… Norther… Holiday TSLM(Trip… breus…      30.7    24    ᵪ²(24)  0.163 
#> # … with 294 more rows

Created on 2021-04-08 by the reprex package (v1.0.0)

I do think it should be easy to compute both Ljung-Box and Breusch-Godfrey tests on regression models; at most, the documentation should hint toward Breusch-Godfrey for regression models.

mbg-unsw commented 3 years ago

Thanks, that looks great.

I assume we'll also need new Ljung-Box and Breusch-Godfrey methods for ARIMA that can pick up the dof from each model?

mitchelloharawild commented 3 years ago

Ljung-Box and Box-Pierce tests will be written to work with any model that makes its degrees of freedom available. These will be the next tests to add; however, this will require migrating feasts::ljung_box() to fabletools::ljung_box().
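For reference, the standard definitions show why these tests can work with any model that reports its degrees of freedom: the model enters only through K, the number of estimated parameters, which shifts the reference distribution. With residuals e_1, ..., e_T, sample autocorrelations r_k and ℓ lags:

```latex
r_k = \frac{\sum_{t=k+1}^{T} (e_t - \bar{e})(e_{t-k} - \bar{e})}{\sum_{t=1}^{T} (e_t - \bar{e})^2},
\qquad
Q_{\text{BP}} = T \sum_{k=1}^{\ell} r_k^2,
\qquad
Q_{\text{LB}} = T(T+2) \sum_{k=1}^{\ell} \frac{r_k^2}{T-k},

\text{under } H_0 \text{ (no autocorrelation up to lag } \ell\text{):}\quad
Q_{\text{BP}},\; Q_{\text{LB}} \;\overset{\text{approx.}}{\sim}\; \chi^2_{\ell - K}.
```

Box-Pierce and Ljung-Box differ only in the weighting of the r_k², with Ljung-Box behaving better in small samples.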

baumstan commented 3 years ago

@mitchelloharawild I've updated my fabletools package but am unable to run breusch_godfrey() on my TSLM model.

remotes::install_github("tidyverts/fabletools")

fit_trend <- q1_ts %>%
  mutate(surfing_festival = ifelse(month(month) == 3 & year(month) > 1987, 1, 0)) %>%
  model(exponential = TSLM(log(sales) ~ trend() + season() + surfing_festival))
report(fit_trend)

I've tried:

fit_trend %>% hypothesise(tests = lst(breusch_godfrey), order = 24)
#> Error in hypothesise(., tests = lst(breusch_godfrey), order = 24) :
#>   could not find function "hypothesise"

and:

fable::breusch_godfrey(fit_trend)
#> Error in UseMethod("breusch_godfrey") :
#>   no applicable method for 'breusch_godfrey' applied to an object of class "c('mdl_df', 'tbl_df', 'tbl', 'data.frame')"

Any guidance would be appreciated.

mitchelloharawild commented 3 years ago

Looks like you haven't loaded the development version yet. Try restarting R to unload the CRAN version of fabletools, so that the next time you load fabletools you get the development version and access to these new functions.

baumstan commented 3 years ago

Thank you. I had loaded the package but not restarted R. This code works:

fit_trend %>%
  hypothesize(tests = lst(breusch_godfrey), order = 1)

But this one doesn't...

fable::breusch_godfrey(fit_trend, order =1)

Could you confirm that I've used the hypothesize() interface correctly, given that my model is a regression, not an ARIMA?

mitchelloharawild commented 3 years ago

Yes, the first code snippet is the current interface for running the test.

mitchelloharawild commented 3 years ago

An alternative generic function is needed for computing values from distributions, such as Newey-West (https://github.com/tidyverts/fable/issues/332). This function would act very similarly to what we have described here.

tidyverts / fabletools

Add check_residuals() function #105