mpiktas / midasr

R package for mixed frequency time series data analysis.
http://mpiktas.github.io/midasr/

'Missing variables in newdata' error (including reproducible example) #66

Closed pvdmeulen closed 2 years ago

pvdmeulen commented 6 years ago

Error (see reproducible example below)

For the past few days I have been having issues with the average_forecast function in midasr. The error in question is:

Error in point_forecast.midas_r(mod, newdata = outdata, method = "static") : Missing variables in newdata. Please supply the data for all the variables (excluding the response variable) in regression

May have found the issue, probably a bug with midas_u and/or defining MIDAS lag structures beforehand. Code works for now with slight tweaks. See below.

Context

For context, I select the same time window for both my quarterly dependent variable y and monthly regressors x_1, x_2, and x_3, and there are no missing values (56 quarterly observations and 168 monthly observations). My complete dataset y_ts etc. starts at 2004 Q1 (M1 for x) and runs to 2018 Q1 (M6 for x).

I split the whole sample into a training sample and a testing sample:

y <- window(y_ts, start = c(2004, 02), end = c(2012, 01))
x_1 <- window(x1_ts, start = c(2004, 04), end = c(2012, 03))
# similarly for x_2, x_3

wholesample <- list(y = window(y_ts, start = c(2004, 02), end = c(2018, 01)), 
                                 x_1 = window(x1_ts, start = c(2004, 04), end = c(2018, 03)),
                                 x_2 = window(x2_ts, start = c(2004, 04), end = c(2018, 03)),
                                 x_3 = window(x3_ts, start = c(2004, 04), end = c(2018, 03)))

sample1 <- 1:(length(y))
sample2 <- (1:length(wholesample$y))[-sample1]

So, my training sample runs from 2004 to 2012 and contains 32 observations, and the full sample contains 56 (in the quarterly index). Then, I create MIDAS lag structures:

library(midasr)
MY <- mls(y, k = 1, m = 1)
MX1 <- fmls(x_1, k = 1, m = 3)

and estimate midas_u models:

u1 <- midas_u(y ~ MX1)
u11 <- midas_u(y ~ MX1 + MY)

Then, just running this:

avgfc <- average_forecast(list(u1, u11), data = wholesample, insample = sample1, outsample = sample2, type = c("fixed"), measures = "MSE", fweights = "MSFE", show_progress = TRUE)

results in the error:

Error in eval(as.name("ee"), object$Zenv) : object 'ee' not found

and so I define ee = 1 before running average_forecast again, resulting in the error in the title. Does anyone know where I went wrong? There must be a mismatch of data somewhere that I'm not seeing. I have tried other arbitrary values for ee as well.

(hopefully this is relatively easy to reproduce with generic variable names)

Thanks, any help is appreciated :)

Edit:

I get the same issue when defining lists containing the training sample and the testing sample, like so:

trainingsample <- list(y, x_1, x_2, x_3)
testingsample <- list(y = window(y_ts, start = c(2012, 02), end = c(2018, 01)), 
                                x_1 = window(x1_ts, start = c(2012, 04), end = c(2018, 03)),
                                x_2 = window(x2_ts, start = c(2012, 04), end = c(2018, 03)),
                                x_3 = window(x3_ts, start = c(2012, 04), end = c(2018, 03)))

and using the forecast function in midasr:

forecast(u1, newdata = testingsample, method = c("static"), insample = trainingsample)

Edit 2:

Estimating the unrestricted MIDAS models using the restricted MIDAS estimation function midas_r with start = NULL (as in the user guide) results in an error as well:

r11 <- midas_r(y ~ MX1 + MY, start = NULL)

Error in x$model[, -1] %*% x$midas_coefficients : non-conformable arguments

Edit 3:

Forecasting with the predict.lm function works fine (using the same variables as the example below) for some reason:

simplefc <- predict(u11test, newdata = newsampletest, interval = c("confidence", "prediction"),
        level = 0.95, type = c("response", "terms"),
        terms = NULL, na.action = na.pass,
        pred.var = res.var/weights, weights = "MSFE")

simplefc

EDIT 4: FOUND THE ISSUE!

The code works fine when using midas_r (with start = NULL) and defining lag structures inside the regression instead of beforehand.

u1test <- midas_r(midas_y ~ fmls(midas_x, k = 1, m = 3), start = NULL)
u11test <- midas_u(midas_y ~ fmls(midas_x, k = 1, m = 3) + mls(midas_y, 1, 1), start = NULL)

The 'ee' issue still shows up when running midas_u and omitting start = NULL. Bug?

Reproducible example:

library(zoo)

y <- arima.sim(model = list(NULL), n = 128)
x <- arima.sim(model = list(NULL), n = 174)

yts <- ts(y, start = as.yearqtr(c(2004, 2)), end = as.yearqtr(c(2018, 1)), frequency = 4) #quarterly
xts <- ts(x, start = as.yearmon(c(2004, 2)), end = as.yearmon(c(2018, 7)), frequency = 12)  #monthly

# Selecting the same time window for both variables:
midas_y <- window(yts, start = c(2004, 02), end = c(2010, 01))
midas_x <- window(xts, start = c(2004, 04), end = c(2010, 03))

# Create two samples:

trainingsampletest <- list(midas_y, midas_x)

wholesampletest <- list(midas_y = window(yts, start = c(2004, 02), end = c(2018, 01)), 
                    midas_x = window(xts, start = c(2004, 04), end = c(2018, 03)))

newsampletest <- list(midas_y = window(yts, start = c(2010, 02), end = c(2018, 01)), 
                  midas_x = window(xts, start = c(2010, 04), end = c(2018, 03)))

# Length of each sample:
sample1test <- 1:(length(midas_y))
sample2test <- (1:length(wholesampletest$midas_y))[-sample1test]

# Create MIDAS lag structures:
library(midasr)
MY1 <- mls(midas_y, 1, 1)
MX1 <- fmls(midas_x, k = 1, m = 3)

u1test <- midas_u(midas_y ~ MX1)
u11test <- midas_u(midas_y ~ MX1 + MY1)
# Models output coefficients fine, the error shows up in the next part.

# Average forecasts for two models:

avgfc <- average_forecast(list(u1test, u11test), data = wholesampletest, insample = sample1test, outsample = sample2test, type =  c("fixed"), show_progress = TRUE)

vzemlys commented 5 years ago

Sorry for late reply. Use the formula interface:

midas_u(y~mls(y,1,1)+fmls(x, 1, 3))

instead of

midas_u(y~MX1+MY)
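
A rough illustration of why the formula interface may matter here. It assumes, without confirmation from this thread, that the forecasting code compares the variable names found in the model formula with the names supplied in newdata. The sketch uses base R only; fmls() and mls() are parsed but never called, and the newdata list is made up for illustration.

```r
# Two ways of writing the same model: predefined lag objects vs. inline lags.
f_predefined <- y ~ MX1 + MY
f_inline     <- y ~ fmls(x, 1, 3) + mls(y, 1, 1)

# Hypothetical out-of-sample data, named after the raw series.
newdata <- list(y = numeric(4), x = numeric(12))

# all.vars() returns the variable names used in a formula (function names excluded).
needed_predefined <- all.vars(f_predefined)
needed_inline     <- all.vars(f_inline)

# Names the formula refers to that newdata does not provide.
missing_predefined <- setdiff(needed_predefined, names(newdata))
missing_inline     <- setdiff(needed_inline, names(newdata))

print(missing_predefined)  # the predefined objects MX1 and MY are not in newdata
print(missing_inline)      # nothing is missing when lags are built inside the formula
```

With predefined lag objects the formula only knows about MX1 and MY, so a list holding y and x cannot satisfy it. With inline lags the formula refers to the raw series directly.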

andresxmv commented 3 years ago

I have the same problem when using the forecast function with new data:

Error in point_forecast.midas_r(object, newdata = newdata, method = method, : Missing variables in newdata. Please supply the data for all the variables (excluding the response variable) in regression

aroaballesteros commented 2 years ago

Did you solve the problem? I have the same problem and I do not know how to fix it. Thank you very much.

pvdmeulen commented 2 years ago

If it's the 'missing variables in newdata' issue you're referring to, see my fourth edit. Not sure if the package has changed since I last encountered this, so your mileage may vary!

vzemlys commented 2 years ago

When using forecast, make sure that all the variables present in the model formula exist in the data passed to the newdata argument.
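
A minimal sketch of that advice, assuming simulated data and an unrestricted fit via midas_r with start = NULL as in the fourth edit above. The series, the seed, and the horizon h are made up for illustration; the point is only that newdata is a named list whose names match the variables in the formula.

```r
library(midasr)

set.seed(42)
n_q <- 32              # number of quarterly (low-frequency) observations
y <- rnorm(n_q)        # simulated quarterly response
x <- rnorm(3 * n_q)    # simulated monthly regressor, 3 observations per quarter

# Unrestricted MIDAS fit with the lag structure written inside the formula.
mod <- midas_r(y ~ fmls(x, 1, 3), start = NULL)

h <- 4                 # forecast horizon in quarters (illustrative)
x_new <- rnorm(3 * h)  # new monthly observations covering the horizon

# newdata is a named list; the name "x" matches the variable used in the formula.
fc <- forecast(mod, newdata = list(x = x_new), method = "static")
fc$mean
```

An unnamed list such as list(x_new) carries no variable names, so it cannot be matched against the formula.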

lwandaka commented 2 years ago

Good morning, I am having the same problem as stated above.

vzemlys commented 2 years ago

See my last comment