tidyverts / fable

Tidy time series forecasting
https://fable.tidyverts.org
GNU General Public License v3.0

Feature Request: Automatic K optimization for Fourier Terms #207

Open AshwinPuri13 opened 4 years ago

AshwinPuri13 commented 4 years ago

If I wish to fit a regression with Fourier terms then to find the optimal K I need to do something like this:

library(fable)
library(dplyr)
library(tidyr)

# Fit one ARIMA model per candidate K to every Airports/Class series
mbl = tsibbledata::ansett %>%
  tsibble::fill_gaps() %>%
  model(arima1 = ARIMA(Passengers ~ fourier(K = 1) + PDQ(0,0,0)),
        arima2 = ARIMA(Passengers ~ fourier(K = 2) + PDQ(0,0,0)),
        arima3 = ARIMA(Passengers ~ fourier(K = 3) + PDQ(0,0,0)))

# One row of fit statistics (including AICc) per series and model
metrics = mbl %>%
  glance()

# Keep the lowest-AICc model for each series, then rebuild a mable from it
mbl_best = metrics %>%
  select(Airports, Class, .model, AICc) %>%
  group_by(Airports, Class) %>%
  slice(which.min(AICc)) %>%
  left_join(mbl %>%
              gather('.model', 'model', -Airports, -Class),
            by = c('.model', 'Airports', 'Class')) %>%
  as_mable(key = c('Airports', 'Class'), models = 'model')

It would be more convenient for K to be automatically determined through something like this:

model(arima = ARIMA(Passengers ~ fourier(K = 1:3) + PDQ(0,0,0)))

On that note, when I look at the source code for ARIMA, it appears that when fitting a regression with ARIMA errors, the order of differencing is determined after the regression. Because of this, it seems entirely possible that the arima1, arima2 and arima3 models I fit could end up with different orders of differencing, in which case their AICc values would not be directly comparable. If this is indeed the case, perhaps determining K through cross-validation is better?

Thanks!

robjhyndman commented 4 years ago

Automating the choice of K could be a feature we look at in a future release. It is very unlikely to affect the order of differencing, so I think using AICc for selection is safe enough.

mitchelloharawild commented 4 years ago

This is something that will need to be added on a model-by-model basis, as each model has its own method of model selection.

JaySumners commented 3 years ago

Could we iteratively select the best K based on whatever criterion is used in the base model? My idea is to fit Fourier series with different K linearly to the response and select the one with the best value of the criterion passed by the base model. Is there a case where we wouldn't want to fit it linearly? I'll admit that re-estimation after fitting the rest of the model would be good, but this might provide some direction for a user who doesn't know which K to select.
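A minimal base R sketch of the idea in this comment, under loud assumptions: the series is simulated, `fourier_terms()` and `aicc()` are hypothetical helpers written for this illustration (they are not fable functions), and the selection criterion is fixed to AICc on a plain linear fit. It ignores any ARIMA error structure, so it only gives a first guess at K.

```r
# Simulated series with seasonal period 12; the data are made up for illustration.
set.seed(1)
n <- 240
period <- 12
t <- seq_len(n)
y <- 10 + 3 * sin(2 * pi * t / period) + 2 * cos(4 * pi * t / period) +
  rnorm(n, sd = 0.5)

# Hypothetical helper: the 2K sine/cosine regressors for a given K.
fourier_terms <- function(t, period, K) {
  cols <- lapply(seq_len(K), function(k) {
    cbind(sin(2 * pi * k * t / period), cos(2 * pi * k * t / period))
  })
  X <- do.call(cbind, cols)
  colnames(X) <- paste0(rep(c("S", "C"), times = K), rep(seq_len(K), each = 2))
  X
}

# Hypothetical helper: small-sample corrected AIC for a linear model.
aicc <- function(fit) {
  n_obs <- length(residuals(fit))
  p <- length(coef(fit)) + 1  # regression coefficients plus the error variance
  AIC(fit) + 2 * p * (p + 1) / (n_obs - p - 1)
}

# Fit one linear model per candidate K and keep the K with the lowest AICc.
candidates <- 1:5
scores <- sapply(candidates, function(K) {
  aicc(lm(y ~ fourier_terms(t, period, K)))
})
best_K <- candidates[which.min(scores)]
```

The chosen `best_K` could then be passed to `fourier(K = ...)` in the full model, with the caveat raised above that re-estimating alongside the rest of the model may give a different answer.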

juan-g-p commented 10 months ago

As an interim solution, I am trying to fit multiple models in a loop so that I do not have to repeat the formula so many times.

Yet I am struggling (I do not have that much of a background in tidy R).

Could you help?
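One possible way to avoid repeating the formula is to build the model specifications in a loop and splice them into `model()`. This is only a sketch: it assumes that `ARIMA()` honours rlang's `!!` injection inside its formula argument and that `model()` accepts a named list spliced with `!!!`. Neither behaviour is confirmed in this thread, so treat both as assumptions to check against your installed versions of fable and fabletools.

```r
library(fable)
library(dplyr)

# Candidate values of K; the range 1:3 follows the example earlier in the thread.
ks <- 1:3

# One ARIMA specification per K. `!!k` is assumed to inject the current value
# of k into the formula when ARIMA() captures it.
specs <- lapply(ks, function(k) {
  ARIMA(Passengers ~ fourier(K = !!k) + PDQ(0, 0, 0))
})
names(specs) <- paste0("arima", ks)

# Splice the named list into model(): one mable column per specification
# (assumes model() supports `!!!`).
mbl <- tsibbledata::ansett %>%
  tsibble::fill_gaps() %>%
  model(!!!specs)
```

If this works as assumed, `mbl` has the same arima1, arima2 and arima3 columns as the hand-written version in the original post, so the `glance()` and AICc selection steps shown there should apply unchanged.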