Revised modelling API - Githubissues

earowang commented 5 years ago

Batch forecasting needs to feature (1) parallel computing, (2) error handling (https://github.com/tidyverts/fable/issues/74) and (3) multiple model types/columns. Under the current framework, these would likely be added as new arguments to ETS(), ARIMA(), etc. For a more flexible framework and an easier-to-maintain package, these features would be better kept separate from the univariate model methods.

Univariate models would focus only on a univariate tsibble and would no longer carry out batch modelling for a multivariate tsibble. Instead, a new function (batch()?) handles batch modelling:

batch(.data, .models = NULL, .formulas, ..., 
  .parallel = FALSE, .safely = FALSE, .quietly = FALSE)

.data: A tsibble.
.models: univariate model functions (a function or a list of functions).
.formulas: a formula or a list of formulas.
...: arguments passed to each function in .models, except for the first (data) and second (formula) arguments.

For example,

pedestrian %>%
  batch(ETS, log(Count) ~ Time)

returns the same output as current ETS(), i.e. columns Sensor and .model.

To build different model classes, do

my_model <- function(data, formula, approx = FALSE) {
  # algorithm
}
pedestrian %>%
  batch(
    .models = list(ets = ETS, arima = ARIMA, new_mdl = my_model),
    .formulas = log(Count) ~ Time
  )

returns a mable consisting of columns Sensor, ets, arima, and new_mdl.
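
A rough base R sketch of how such a batch() could work, assuming a plain data frame in place of a tsibble and list columns in place of a mable. batch_sketch(), its key argument and the two stand-in model functions are hypothetical; only the argument names .models, .formulas and .safely come from the proposal above.

```r
# Hypothetical sketch of the proposed batch(); not part of fable.
batch_sketch <- function(.data, .models, .formulas, key, .safely = FALSE) {
  # Accept a single function as well as a named list of functions
  if (is.function(.models)) .models <- list(.model = .models)
  groups <- split(.data, .data[[key]])

  fit_one <- function(f, d) {
    if (!.safely) return(f(d, .formulas))
    # With .safely = TRUE a failed fit becomes NULL instead of aborting the batch
    tryCatch(f(d, .formulas), error = function(e) NULL)
  }

  out <- data.frame(names(groups), stringsAsFactors = FALSE)
  names(out) <- key
  for (nm in names(.models)) {
    # One list column per model, one row per key
    out[[nm]] <- lapply(groups, function(d) fit_one(.models[[nm]], d))
  }
  out
}

# Stand-in univariate models taking (data, formula), as in the proposal
lm_model <- function(data, formula) lm(formula, data = data)
broken_model <- function(data, formula) stop("cannot fit")

toy <- data.frame(
  Sensor = rep(c("A", "B"), each = 10),
  Time = rep(1:10, times = 2),
  Count = c(11:20, 31:40)
)
fits <- batch_sketch(toy, list(lm = lm_model, bad = broken_model),
                     log(Count) ~ Time, key = "Sensor", .safely = TRUE)
names(fits)
```

Keeping error handling in the batch layer, as here, is what lets the model functions themselves stay free of .safely-style arguments.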

The rest of the workflow remains as is.

This top-level function batch() always returns a mable. It could be powered by a low-level function, namely key_map() (mapping arbitrary functions to each key), which will be useful for other tasks, for example computing time series features.
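
As an illustration of the low-level idea, here is a minimal base R sketch of a key_map()-style helper used to compute simple features per series. key_map_sketch() and its arguments are assumptions for illustration; the thread does not define the real signature.

```r
# Hypothetical low-level helper: apply .f to the rows belonging to each key
key_map_sketch <- function(.data, key, .f, ...) {
  lapply(split(.data, .data[[key]]), .f, ...)
}

toy <- data.frame(
  Sensor = rep(c("A", "B"), each = 4),
  Count = c(1, 2, 3, 4, 10, 20, 30, 40)
)

# Simple per-series features: mean and range of Count
features <- key_map_sketch(toy, "Sensor", function(d) {
  c(mean = mean(d$Count), range = diff(range(d$Count)))
})
features$B
```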

We can think about the representation of model messages/warnings/errors later.

mitchelloharawild commented 5 years ago

tbl_ts %>%
  model(ets = safely(ETS)(log(Count) ~ Time))

earowang commented 5 years ago

tbl_ts %>%
  model(ets = safely(ETS)(log(Count) ~ Time), ets2 = ETS(log(Count) ~ season("A")))
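
For readers unfamiliar with the safely() adverb, a minimal base R sketch of the idea follows. It mimics the result/error pair returned by purrr::safely(); safely_sketch() itself is a hypothetical name, and log() stands in for a model function.

```r
# Wrap a function so that errors are captured instead of thrown
safely_sketch <- function(.f) {
  function(...) {
    tryCatch(
      list(result = .f(...), error = NULL),
      error = function(e) list(result = NULL, error = e)
    )
  }
}

safe_log <- safely_sketch(log)
ok <- safe_log(10)     # result holds log(10), error is NULL
bad <- safe_log("a")   # result is NULL, error holds the condition object
```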

earowang commented 5 years ago

A quick summary of what we discussed.

estimate(.data, ...) (future_estimate(.data, ...) for a parallel version?):

.data: A tsibble.
...: a set of name-value pairs

pedestrian %>%
  estimate(ets = ETS(log(Count) ~ season("A")), arima = safely(ARIMA)(log(Count) ~ Time))

earowang commented 5 years ago

Can you also increment the fable/fablelite version to 0.0.9100 to reflect the breaking change?

mitchelloharawild commented 5 years ago

I think your interface is a little off.

pedestrian %>%
  model(ets = ETS(log(Count) ~ season("A")), arima = safely(ARIMA)(log(Count) ~ Time))

my_ets <- ETS(log(Count) ~ season("A"))
train(my_ets, pedestrian)

Internally, model() will call train() (which may call prepare()). It is recommended that users only use the model() approach to get results; however, the lower-level techniques are still available for transparency.
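
The split between a model definition and its training can be sketched with base R S3 dispatch. Everything below (new_definition(), mean_def(), lm_def(), model_sketch(), and this train() generic) is a hypothetical stand-in operating on a plain data frame, not the fable implementation.

```r
# A definition only records the formula and a class; nothing is fitted yet
new_definition <- function(formula, class) {
  structure(list(formula = formula), class = c(class, "mdl_defn"))
}
mean_def <- function(formula) new_definition(formula, "mean_mdl")
lm_def <- function(formula) new_definition(formula, "lm_mdl")

# Low-level step: fit one definition to one series
train <- function(model, data, ...) UseMethod("train")
train.mean_mdl <- function(model, data, ...) {
  y <- eval(model$formula[[2]], data)  # evaluate the left-hand side in the data
  list(mean = mean(y))
}
train.lm_mdl <- function(model, data, ...) lm(model$formula, data = data)

# High-level step: train every named definition on every key
model_sketch <- function(.data, key, ...) {
  definitions <- list(...)
  groups <- split(.data, .data[[key]])
  out <- data.frame(names(groups), stringsAsFactors = FALSE)
  names(out) <- key
  for (nm in names(definitions)) {
    out[[nm]] <- lapply(groups, function(d) train(definitions[[nm]], d))
  }
  out
}

toy <- data.frame(
  Sensor = rep(c("A", "B"), each = 5),
  Time = rep(1:5, times = 2),
  Count = c(1:5, 11:15)
)
my_mean <- mean_def(log(Count) ~ Time)
direct <- train(my_mean, toy[toy$Sensor == "A", ])
fits <- model_sketch(toy, "Sensor", avg = my_mean, trend = lm_def(Count ~ Time))
```

Because a definition is an ordinary object, it can be created once, stored in a variable, and then either trained directly or passed by name to the batch function.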

earowang commented 5 years ago

K.

mitchelloharawild commented 5 years ago

pedestrian %>% 
  model(
    ets = ETS(Count ~ season("A")),
    stlm = dcmp_mdl(
      decomposition = STL(Count ~ season(window = 10)), 
      ETS(seasadj), # model dots 1
      SNAIVE(seasonal) # model dots 2
    )
  )

mitchelloharawild commented 5 years ago

fable possible approach

decomposition <- function(.f, ...){
  function(data){
    .f(data, ...)
  }
}

pedestrian %>% 
  model(
    ets = ETS(Count ~ season("A")),
    stlm = decomposition_model(
      decomposition = decomposition(STL, Count ~ season(window = 10)), 
      ..1 = ETS(seasadj ~ seasonal("N")),
      ..2 = SNAIVE(seasonal)
    )
  )

This also works by defining the model earlier

my_stl_model <- decomposition(STL, Count ~ season(window = 10)) %>%
  decomposition_model(
    ..1 = ETS(seasadj ~ seasonal("N")),
    ..2 = SNAIVE(seasonal)
  )

pedestrian %>% 
  model(
    ets = ETS(Count ~ season("A")),
    stlm = my_stl_model
  )

Forecast package approach

library(forecast)
ts(pedestrian$Count, frequency = 24) %>% 
  {list(
    ets = ets(., "ZZA"),
    stlm = stlm(., method = "ets")
  )}

mitchelloharawild commented 5 years ago

# Pseudocode for decomposition modelling
decomposition_model <- function(decomposition, ...){
  new_model_definition(
    decomposition = decomposition,
    models = list_dots(...),
    class = "dcmp_mdl"
  )
}

train.dcmp_mdl <- function(model, data, ...){
  # Decompose the data, then train each component model on the decomposition
  dcmp <- model$decomposition(data)
  models <- map(model$models, function(mdl){
    train(mdl, dcmp)
  })

  combination_method(dcmp)(models)
}

Alternatively...

# Pseudocode for decomposition modelling
decomposition <- function(.f, formula, ..., .f_args = list()){
  dcmp_fn <- function(data){
    # Splice the extra arguments into the call to the decomposition function
    do.call(.f, c(list(data, formula), .f_args))
  }
  new_model_definition(
    decomposition = dcmp_fn,
    models = list_dots(...),
    class = "dcmp_mdl"
  )
}

train.dcmp_mdl <- function(model, data, ...){
  # Decompose the data, then train each component model on the decomposition
  dcmp <- model$decomposition(data)
  models <- map(model$models, function(mdl){
    train(mdl, dcmp)
  })

  combination_method(dcmp)(models)
}

my_stl_model <- decomposition(STL, list(Count ~ season(window = 10)),
                              ..1 = ETS(seasadj ~ seasonal("N")),
                              ..2 = SNAIVE(seasonal)
                              )

pedestrian %>% 
  model(
    ets = ETS(Count ~ season("A")),
    stlm = my_stl_model
  )
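
To make the decompose, model, then recombine flow concrete, here is a small runnable base R sketch. It assumes stats::stl() in place of STL(), a naive forecast in place of ETS on the seasonally adjusted series, a seasonal naive forecast for the seasonal component, and additive recombination. decomposition_sketch() and fit_dcmp() are hypothetical names.

```r
# Partial application: fix the decomposition's arguments, supply the data later
decomposition_sketch <- function(.f, ...) {
  function(data) .f(data, ...)
}

fit_dcmp <- function(y, decomposition, h) {
  dcmp <- decomposition(y)
  seasonal <- dcmp$time.series[, "seasonal"]
  seasadj <- y - seasonal
  m <- frequency(y)

  # Seasonal component: seasonal naive, i.e. repeat the last full period
  seasonal_fc <- rep(tail(as.numeric(seasonal), m), length.out = h)
  # Seasonally adjusted series: naive, i.e. carry the last value forward
  seasadj_fc <- rep(tail(as.numeric(seasadj), 1), h)

  # Additive recombination of the component forecasts
  seasadj_fc + seasonal_fc
}

stl_periodic <- decomposition_sketch(stl, s.window = "periodic")
fc <- fit_dcmp(nottem, stl_periodic, h = 12)
```

The closure returned by decomposition_sketch() plays the role of the decomposition argument in the pseudocode: it only needs the data at training time.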

mitchelloharawild commented 5 years ago

On an unrelated topic... How do we define the response?

Issue:

ETS(log(GDP/CPI) ~ Time)

What is the response? GDP? CPI? Currently defaults to the first variable (GDP).

Proposed solution: vars()

ETS(log(GDP/vars(CPI)) ~ Time)
ETS(log(vars(GDP/CPI)) ~ Time)
ETS(vars(log(GDP/CPI)) ~ Time)

This can also be used to specify multivariate models.

VAR(log(vars(GDP, CPI)) ~ Time)
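
One way the response could be extracted under this proposal is sketched below in base R: walk the left-hand side of the formula, return the contents of the first vars() call if there is one, and otherwise fall back to the first variable, as described above. find_response() is a hypothetical helper, and vars() is never evaluated here; it is only recognised by name.

```r
find_response <- function(formula) {
  lhs <- formula[[2]]
  found <- list()

  # Recursively search the expression for calls to vars()
  walk <- function(expr) {
    if (is.call(expr)) {
      if (identical(expr[[1]], as.name("vars"))) {
        found[[length(found) + 1]] <<- expr
      } else {
        for (arg in as.list(expr)[-1]) walk(arg)
      }
    }
  }
  walk(lhs)

  # No vars(): default to the first variable on the left-hand side
  if (length(found) == 0) return(all.vars(lhs)[1])
  # Otherwise return each argument of the first vars() call as text
  vapply(as.list(found[[1]])[-1], deparse, character(1))
}

find_response(log(GDP/CPI) ~ Time)             # "GDP"
find_response(log(GDP/vars(CPI)) ~ Time)       # "CPI"
find_response(log(vars(GDP/CPI)) ~ Time)       # "GDP/CPI"
find_response(log(vars(GDP, CPI)) ~ Time)      # "GDP" "CPI"
```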

mitchelloharawild commented 5 years ago

model() should only allow models with the same response variable.
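
A minimal sketch of such a check, assuming the response is taken to be the first variable on the left-hand side of each formula. check_same_response() is a hypothetical helper, not an existing fable function.

```r
check_same_response <- function(...) {
  formulas <- list(...)
  # First variable on each left-hand side is treated as the response
  responses <- vapply(formulas, function(f) all.vars(f[[2]])[1], character(1))
  if (length(unique(responses)) > 1) {
    stop("Models must share a response variable; found: ",
         paste(unique(responses), collapse = ", "))
  }
  invisible(responses[[1]])
}

shared <- check_same_response(ets = log(Count) ~ Time, arima = Count ~ Time)
mismatch <- tryCatch(
  check_same_response(ets = Count ~ Time, arima = Temperature ~ Time),
  error = function(e) conditionMessage(e)
)
```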

mitchelloharawild commented 5 years ago

I'm now thinking that these changes should also apply to decompositions, as the current method of applying decompositions in batch has similar issues. I still think that decomposition functions should immediately return the decomposition itself, although I'm unsure of an appropriate API for this.

tidyverts / fable

Revised modelling API #77
