tidyverts / fabletools

General fable features useful for extension packages
http://fabletools.tidyverts.org/

Using stretch_tsibble() for CV causes warnings... #331

Closed Steviey closed 2 years ago

Steviey commented 2 years ago

Ubuntu 16.x LTS, R latest, fabletools latest

Trying this example from Prof. Hyndman:

https://robjhyndman.com/hyndsight/tscv-fable/

...causes warnings when using stretch_tsibble() for cross-validation.

Code:

```r
if (requireNamespace("fable", quietly = TRUE)) {
  library(fable)
  library(tsibble)
  library(tsibbledata)
  library(dplyr)

  beer <- aus_production %>%
    dplyr::select(Beer) %>%
    tsibble::stretch_tsibble(.init = 12, .step = 1)

  fc <- beer %>%
    fabletools::model(ETS(Beer)) %>%
    fabletools::forecast(h = "1 year") %>%
    dplyr::group_by(.id) %>%
    dplyr::mutate(h = row_number()) %>%
    dplyr::ungroup()

  acc <- fc %>%
    fabletools::accuracy(aus_production, by = "h") %>%
    dplyr::select(h, RMSE)

  print(acc)
}
```

Message:

```
Warning: Accuracy measures should be computed separately for each model, have you forgotten to add ".model" to your by argument?
Warning: The future dataset is incomplete, incomplete out-of-sample data will be treated as missing. 4 observations are missing between 2010 Q3 and 2011 Q2
```

robjhyndman commented 2 years ago

I think this is probably due to a recent change in the package. I've updated the blog post to use:

```r
fc %>%
  accuracy(aus_production, by = c("h", ".model")) %>%
  select(h, RMSE)
```

Steviey commented 2 years ago

Thank you Prof. Hyndman,

I have to stretch a little less (tsibble::stretch_tsibble(.init = 200, .step = 1)) because my PC is too slow.

The first message disappears with your modification. But could it be that the second warning remains?

The future dataset is incomplete, incomplete out-of-sample data will be treated as missing. 4 observations are missing between 2010 Q3 and 2011 Q2

robjhyndman commented 2 years ago

The data finishes in 2010 Q2 but you are making forecasts to 2011 Q2.
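
For reference, a minimal sketch (not from this thread) of one way to avoid that warning: drop the folds whose forecast horizon extends past the end of the observed data, so every forecast has an actual value to be scored against.

```r
library(dplyr)
library(tsibble)
library(tsibbledata)

# Keep only the folds whose 4-step (one-year) horizon stays within
# aus_production, which ends in 2010 Q2; the last four folds are dropped.
beer_cv <- aus_production %>%
  dplyr::select(Beer) %>%
  tsibble::stretch_tsibble(.init = 12, .step = 1) %>%
  dplyr::filter(.id <= max(.id) - 4)
```

The warning itself is harmless; it only notes that the last few forecasts have no observed values to compare against.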

Steviey commented 2 years ago

I do not understand how the out-of-sample data is defined in a data set, especially when I have to forecast or cross-validate days. Should we use the resulting fit for "real world forecasts" (1 step ahead), for example via refit, and how would this be done? In machine learning, CV would be part of training a model. I'm not sure if this is the intention here too, although I read this article: https://robjhyndman.com/hyndsight/tscv/
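
For what it's worth, a hedged sketch of the refit idea (the names `fit` and `full_data` are placeholders, not from this thread): fabletools::refit() applies an already fitted model to newer data, and a subsequent one-step forecast is then the "real world" forecast from the end of that data.

```r
library(fable)
library(fabletools)

# `fit` is assumed to be a mable estimated on a training slice of the series.
# refit() applies the same model(s) to the full series, and forecast(h = 1)
# then gives the one-step-ahead forecast beyond the observed data.
fit_full <- fit %>% fabletools::refit(full_data)
fit_full %>% fabletools::forecast(h = 1)
```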

Let's say I have a model combination:

```r
myFit <- preData %>%
  fabletools::model(
    ets   = ETS(!!varname),
    arima = ARIMA(!!varname)
  ) %>%
  mutate(average = (ets + arima) / 2)
```

With a normal forecast I would see that the average provides the best accuracy. But what should I incorporate cross-validation for?
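
A hedged sketch of how cross-validation could be used here (assuming, as in the snippet above, that `preData` is a tsibble and `varname` is a symbol for the response): fit the combination and its components on stretched folds, then compare their out-of-sample accuracy per model.

```r
library(fable)
library(fabletools)
library(tsibble)
library(dplyr)

cv_fit <- preData %>%
  tsibble::stretch_tsibble(.init = 200, .step = 1) %>%
  dplyr::filter(.id != max(.id)) %>%   # last fold has no future value to score
  fabletools::model(
    ets   = ETS(!!varname),
    arima = ARIMA(!!varname)
  ) %>%
  dplyr::mutate(average = (ets + arima) / 2)

cv_fit %>%
  fabletools::forecast(h = 1) %>%
  fabletools::accuracy(preData, by = ".model")   # one row of measures per model, averaged over folds
```

If the `average` row has the lowest RMSE across folds, that is stronger evidence for the combination than a single train/test split would give.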

robjhyndman commented 2 years ago

Please ask such questions at crossvalidated.com. This site is just for reporting issues with the package.

Steviey commented 2 years ago

Thank you Prof. Hyndman. After some research, I understand it better (again).