robjhyndman / forecast

Forecasting Functions for Time Series and Linear Models
http://pkg.robjhyndman.com/forecast

forecast with object, xreg, and arima xreg model fails #682

Closed blakeflei closed 6 years ago

blakeflei commented 6 years ago

Calling forecast with a time series object, xreg, and an ARIMA model fitted with xreg seems to fail. The goal is:
1 - Split all data into separate train and test data.
2 - Fit a model on the train set.
3 - Use the test set to determine the error (skipped here for brevity).
4 - Apply the train model to all data to create a forecast.

Example:

> library(forecast)
> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets 
[6] methods   base     

other attached packages:
[1] forecast_8.4

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.16      magrittr_1.5      uroot_2.0-9      
 [4] munsell_0.4.3     colorspace_1.3-2  lattice_0.20-35  
 [7] rlang_0.2.0       quadprog_1.5-5    TTR_0.23-3       
[10] plyr_1.8.4        tools_3.5.0       xts_0.10-2       
[13] nnet_7.3-12       parallel_3.5.0    quantmod_0.4-13  
[16] grid_3.5.0        nlme_3.1-137      timeDate_3043.102
[19] gtable_0.2.0      urca_1.3-0        tseries_0.10-44  
[22] lazyeval_0.2.1    lmtest_0.9-36     tibble_1.4.2     
[25] ggplot2_2.2.1     curl_3.2          fracdiff_1.4-2   
[28] compiler_3.5.0    pillar_1.2.1      scales_0.5.0     
[31] zoo_1.8-1  

Generate synthetic data. Forecasts using arima with xreg will be attempted on the first column:

> num_rows <- 100
> num_cols <- 3
> 
> dat <- matrix(abs(rnorm(num_rows)))
> dat[2:length(dat),1] <- dat[2:length(dat)] + dat[1:(length(dat)-1)]*2 
> 
> df_in <- matrix(data=NA, nrow=num_rows, ncol=num_cols)
> df_in[,1] <- dat
> df_in[,2] <- c(tail(dat, length(dat) - 2), NA, NA) + rnorm(num_rows, 0, 0.2) 
> df_in[,3] <- c(tail(dat, length(dat) - 3), NA, NA, NA)  + rnorm(num_rows, 0, 0.2)
> df_in <- df_in[apply(df_in, 1, function(x) !any(is.na(x))), ] # Drop NA

> forecast_window <- 5  

Split into train and test data:

> val_test_perc <- 0.2
> 
> split_ind <- floor((1-val_test_perc) * dim(df_in)[1])
> train <- df_in[seq_len(split_ind),]
> test <- df_in[-seq_len(split_ind),]

Determine arima with xreg model using train data:

> fit_arima_xreg <- auto.arima(train[,1], xreg=train[,-1])

Forecast all exogenous variables using train models on all data:

> # First create exog train models using train data:
> exog_models <- list()
> for (i in 2:num_cols){
+     exog_models[[i]] <- auto.arima(train[,i])
+ }
> 
> # Then forecast exogs using all data and exog train models:
> exog_forcs <- data.frame(matrix(data=NA, nrow=forecast_window, ncol=num_cols))
> for (i in 2:num_cols){
+     exog_curr_forecast <- forecast(df_in[,i], 
+                                    model=exog_models[[i]], 
+                                    h=forecast_window) 
+     exog_forcs[,i] <- exog_curr_forecast[['mean']]
+ }

Attempt to forecast using all data:

> forecast(df_in[,1], model=fit_arima_xreg, xreg=exog_forcs[,-1], h=forecast_window)
Error in stats::arima(x = x, order = order, seasonal = seasonal, xreg = xreg,  : 
  lengths of 'x' and 'xreg' do not match

Passing only the model seems to work:

> forecast(fit_arima_xreg, xreg=exog_forcs[,-1], h=forecast_window)
   Point Forecast      Lo 80    Hi 80      Lo 95    Hi 95
78       2.136355  0.8962271 3.376483  0.2397431 4.032967
79       3.249901  1.7886203 4.711181  1.0150650 5.484737
80       2.897088  1.4199463 4.374230  0.6379947 5.156182
81       1.710781  0.1407135 3.280849 -0.6904302 4.111992
82       1.397953 -0.2342081 3.030114 -1.0982220 3.894128

Attempt to forecast train data (should be same as above):

> forecast(train[,1], model=fit_arima_xreg, xreg=exog_forcs[,-1], h=forecast_window)
Error in stats::arima(x = x, order = order, seasonal = seasonal, xreg = xreg,  : 
  lengths of 'x' and 'xreg' do not match
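
Step 3 of the goal list (measuring error on the test set) is skipped in the post above. A minimal sketch of that step follows, assuming synthetic data; the names `x`, `y`, `train_idx`, and the seed are hypothetical and not part of the original report. The model is passed as the first argument, and `xreg` holds the regressor values for the held-out period.

```r
library(forecast)

set.seed(1)                                  # hypothetical seed, for reproducibility only
n <- 100
x <- cbind(x1 = rnorm(n), x2 = rnorm(n))     # two synthetic regressors
y <- 2 + 0.5 * x[, "x1"] - 0.3 * x[, "x2"] + rnorm(n, sd = 0.2)

# 80/20 split, mirroring val_test_perc <- 0.2 in the post
split_ind <- floor(0.8 * n)
train_idx <- seq_len(split_ind)

# Fit on the training rows only
fit <- auto.arima(y[train_idx], xreg = x[train_idx, ])

# Forecast the held-out period: one row of xreg per forecast step,
# so the horizon is taken from the number of rows supplied
fc <- forecast(fit, xreg = x[-train_idx, ])

# Compare the forecasts with the held-out observations
acc <- accuracy(fc, y[-train_idx])
print(acc)
```

Here the test-period regressors are known, so they can be passed directly; when they are not, they would first have to be forecast, as the post does with `exog_forcs`.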

blakeflei commented 6 years ago

That was probably way too complicated.

A simpler example:

> library(forecast)
> dat <- matrix( c(1.56, 0.12, 0.44, 3.24, 0.64, 0.79, 0.46, 
+                  0.41, 0.91, 0.71, 0.66, 2.97, 0.56, 3.25, 3.62),
+                nrow=5, ncol=3)
> 
> ex <- matrix( c(2.84, 1.41, 2.78, 2.08, 2.41, 2.73, 2.57, 
+                 2.73), 
+               nrow=4, ncol=2)
> 
> fit <- auto.arima(dat[,1], xreg=dat[,-1])
> 
> forecast(dat[,1], model=fit, xreg=ex, h=4)
Error in stats::arima(x = x, order = order, seasonal = seasonal, xreg = xreg,  : 
  lengths of 'x' and 'xreg' do not match

robjhyndman commented 6 years ago

Presumably you want to do this:

forecast(fit, xreg=ex, h=4)

The first argument of forecast should be a model. If you apply forecast directly to data, it will try to figure out what you meant. Here it is trying to fit the model to the data that was passed. But there are only 4 xreg rows and 5 observations, so it fails.
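
The difference between the two call styles can be sketched as follows, assuming synthetic data; `xreg_hist`, `xreg_future`, and the seed are hypothetical names chosen for illustration. In the data-first call, `xreg` is matched against the observations being refitted, whereas in the model-first call it is read as the future regressor values.

```r
library(forecast)

set.seed(42)                                  # hypothetical seed
n <- 60
xreg_hist <- cbind(x1 = rnorm(n), x2 = rnorm(n))    # regressors for the observed period
y <- 1 + 0.8 * xreg_hist[, "x1"] + rnorm(n, sd = 0.3)
xreg_future <- cbind(x1 = rnorm(4), x2 = rnorm(4))  # regressors for the 4 future steps

fit <- auto.arima(y, xreg = xreg_hist)

# Data-first call: the model is refitted to y, and xreg (4 rows) is compared
# with the 60 observations. In the thread this kind of call raised
# "lengths of 'x' and 'xreg' do not match"; try() keeps the script running.
res <- try(forecast(y, model = fit, xreg = xreg_future, h = 4), silent = TRUE)
print(inherits(res, "try-error"))

# Model-first call: xreg is interpreted as future values, one row per step
fc <- forecast(fit, xreg = xreg_future)
print(fc)
```

The column names of `xreg_future` match those of `xreg_hist`, which keeps the regressors aligned with the fitted coefficients.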

blakeflei commented 6 years ago

If four observations and four xreg rows are used, forecast reports that there are no regressors:

> dat_update <- matrix(c(0.078, 1.69, 4.76, 3.41),
+                      nrow=4, ncol=1)
> 
> forecast(dat_update, model=fit, xreg=ex, h=4)
Error in forecast.Arima(fit, h = h, level = level, fan = fan) : 
  No regressors provided

blakeflei commented 6 years ago

How should one forecast using updated data by applying an xreg arima model trained on previous data?

For example:

> dat_update <- matrix(c(dat[,1], 0.078, 1.69, 4.76, 3.41),
+                      nrow=9, ncol=1) #Updated data
> 
> forecast(dat_update, model=fit, xreg=ex, h=4) #Apply previous fit to updated data
Error in stats::arima(x = x, order = order, seasonal = seasonal, xreg = xreg,  : 
  lengths of 'x' and 'xreg' do not match

The docs currently state that the first argument can be a time series. This seems to work for arima models, but fails for xreg arima models.

robjhyndman commented 6 years ago

To apply a model to a new data set, first construct the model object, then forecast. Like this:

fit2 <- Arima(dat[,2], model=fit, xreg=dat[,-1])
forecast(fit2, xreg=ex)
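
A fuller sketch of this two-step pattern for the "updated data" case asked about earlier, assuming synthetic data: `y_all`, `x_all`, `x_future`, and the seed are hypothetical. The key assumption is that regressor values are available for every updated observation as well as for the forecast horizon.

```r
library(forecast)

set.seed(123)                                 # hypothetical seed
n_old <- 60                                   # observations available at fitting time
n_new <- 20                                   # observations that arrived later
h <- 4                                        # forecast horizon
n_all <- n_old + n_new

x_all <- cbind(x1 = rnorm(n_all), x2 = rnorm(n_all))
y_all <- 1 + 0.8 * x_all[, "x1"] - 0.5 * x_all[, "x2"] + rnorm(n_all, sd = 0.3)
x_future <- cbind(x1 = rnorm(h), x2 = rnorm(h))

# Step 1: fit on the old data only
old <- seq_len(n_old)
fit <- auto.arima(y_all[old], xreg = x_all[old, ])

# Step 2: apply the fitted model to the updated series. xreg must have one
# row per observation in y_all; the coefficients are reused, not re-estimated.
fit2 <- Arima(y_all, model = fit, xreg = x_all)

# Step 3: forecast from the updated model. xreg now holds future values,
# and the horizon is taken from its number of rows.
fc <- forecast(fit2, xreg = x_future)
print(fc)
```

Keeping the refit and the forecast as separate calls makes it explicit which `xreg` covers the observed period and which covers the future.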

blakeflei commented 6 years ago

Thank you for the clarification! I missed the model creation step.