Model selection and forecasting

slakeon commented 7 years ago

Hi,

I am new to MIDAS and R. I am trying to understand how select_and_forecast works. From my understanding the command selects the best lag length for the high frequency variable by IC minimization within a class of restrictions and then goes on to test pseudo out of sample for the best selected restrictions. Am I right?

In my case I would like to estimate the following model:

midas

Y (low frequency) would be a monthly variable (with 63 total observations available) and X (high frequency) would be a weekly variable that I am assuming is observed 4 times each time Y is observed.

I read the manual, but I couldn't figure out how incorporate the dependent variable in the model selection process without including the contemporaneous term and take into account the fact that I would need to include the dynamic pseudo out of sample forecasts when computing the accuracy measures.

On a side note, is it possible to calculate truly out of sample forecast averages like in EViews (see here). In my particular case, I need to calculate one step and two step forecasts (observations 64 and 65). Is that possible with average_forecast?

vzemlys commented 7 years ago

What have you tried? Can you provide some code? I fixed one bug so select_and_forecast works with ADL models. The following code now works:

### Sets a seed for RNG ###
set.seed(1001)  
## Number of low-frequency observations
n<-250
## Linear trend and higher-frequency explanatory variables (e.g. quarterly and monthly)
trend<-c(1:n)
x<-rnorm(4*n)
z<-rnorm(12*n)
## Exponential Almon polynomial constraint-consistent coefficients
fn.x <- nealmon(p=c(1,-0.5),d=8)
fn.z <- nealmon(p=c(2,0.5,-0.1),d=17)
## Simulated low-frequency series (e.g. yearly)
y<-2+0.1*trend+mls(x,0:7,4)%*%fn.x+mls(z,0:16,12)%*%fn.z+rnorm(n)
##Do not run

cbfc<-select_and_forecast(y~trend+mls(y,0,1)+mls(x,0,4)+mls(z,0,12),
                          from=list(y=1, x=c(4),z=c(12)),
                          to=list(y=cbind(1,5),x=cbind(14,19),z=cbind(22,27)),
                          insample=1:200,outsample=201:250,
                          weights=list(x=c("nealmon"),z=c("nealmon"),y=""),
                          wstart=list(nealmon=c(1,0,0)),
                          IC="AIC",
                          seltype="restricted",
                          ftype="fixed",
                          measures=c("MSE","MAPE","MASE"),
                          fweights=c("EW","BICW","MSFE","DMSFE")

You will need to install the development version of midasr for that: devtools::install_github('mpiktas/midasr').

I do not understand your question about the truly out of sample forecasts. The forecast are either out-of-sample or they are not, there is no in-between state. Could you provide some examples, with code preferably?

slakeon commented 7 years ago

I do not understand your question about the truly out of sample forecasts. The forecast are either out-of-sample or they are not, there is no in-between state. Could you provide some examples, with code preferably?

Sure!

By pseudo out of sample, I mean the part of the sample that is not included in the estimation process and that you have true values to compare your forecasts. By "truly out of sample", I'm referring to the case when you produce a forecast, but you still don't have any true value to compare your forecast.

Let me explain myself with a slight modification of your code above.

In the code below I have ignored "z" and introduced another type of weighting function: lcauchyp.

### Sets a seed for RNG ###
set.seed(1001)  
## Number of low-frequency observations
n<-250
## Linear trend and higher-frequency explanatory variables (e.g. quarterly and monthly)
trend<-c(1:n)
x<-rnorm(4*n)
z<-rnorm(12*n)
## Exponential Almon polynomial constraint-consistent coefficients
fn.x <- nealmon(p=c(1,-0.5),d=8)
fn.z <- nealmon(p=c(2,0.5,-0.1),d=17)
## Simulated low-frequency series (e.g. yearly)
y<-2+0.1*trend+mls(x,0:7,4)%*%fn.x+mls(z,0:16,12)%*%fn.z+rnorm(n)
##Do not run

cbfc<-select_and_forecast(y~trend+mls(y,0,1)+mls(x,0,4),
                          from=list(y=1, x=c(4)),
                          to=list(y=cbind(1,5),x=cbind(14,19)),
                          insample=1:200,outsample=201:250,
                          weights=list(x=c("nealmon","lcauchyp"),y=""),
                          wstart=list(nealmon=c(1,0,0),lcauchyp=rep(1,3)),
                          IC="AIC",
                          seltype="restricted",
                          ftype="fixed",
                          measures=c("MSE","MAPE","MASE"),
                          fweights=c("EW","BICW","MSFE","DMSFE"))
cbfc

The first part of the output produced by select_and_forecast would be this:

$accuracy
$accuracy$individual
                                                  Model MSE.out.of.sample MAPE.out.of.sample MASE.out.of.sample MSE.in.sample MAPE.in.sample
1  y ~ trend + mls(y, 1L, 1) + mls(x, 4:14, 4, nealmon)          1.835865           4.489877          0.8174640      1.797563       13.91430
2 y ~ trend + mls(y, 1L, 1) + mls(x, 4:16, 4, lcauchyp)          1.799635           4.458662          0.8099575      1.760769       13.61808
  MASE.in.sample
1      0.6998895
2      0.6914150

$accuracy$average
  Horizon Scheme      MSE     MAPE      MASE
1       1     EW 1.817212 4.474270 0.8137108
2       1   BICW 1.817179 4.474241 0.8137039
3       1   MSFE 1.817031 4.474114 0.8136734
4       1  DMSFE 1.816853 4.473960 0.8136363

select_and_forecast found the two "best" models for each type of restriction, both able to forecast 1 step ahead.

Now, suppose I need to forecast what the 251st observation of "y" would look like. I could re estimate those models with the full sample like this:

bm1 <- midas_r(y ~ trend + mls(y, 1, 1) + mls(x, 4:14, 4, nealmon), start = list(x = c(1,0,0)))
bm2 <- midas_r(y ~ trend + mls(y, 1, 1) + mls(x, 4:16, 4, lcauchyp), start = list(x = rep(1,3)))

forecast(bm1, list(x = rep(NA, 4), trend = 251))
forecast(bm2, list(x = rep(NA, 4), trend = 251))

From the analysis above, the wiser choice would be to use the forecast from the log-Cauchy model. But if I want to produce a forecast for the 251st observation of "y" that is a weighted average of the nealmon and lcauchyp models, is that possible with average_forecast or any other command, either using EW, BICW, MSFE or DMSFE weights?

vzemlys commented 7 years ago

I've fixed a small bug, which prevented using average_forecast when outsample is only one element. Now the following

average_forecast(list(bm1,bm2), data=list(y=c(y,NA),x=c(x,rep(NA,4)),trend=c(trend,251)),insample=1:250,outsample=251)

should work. Note that since weights MSFE and DMSFE depend on the out-of-sample MSE, the result is NA, since the the out-of-sample data is NA.

slakeon commented 7 years ago

Thank you very much! Will try that!

nriquec commented 6 years ago

I have the same trouble. This is my code, but when I run "average_forecast" the results are NA. The code are the next. model1=midas_r(TC ~ mls(TC, 1, 1)+ mls(EMCI_MXN, 2:5, 4, nealmon) , data=list(dlog_men, EMCI_MXN=dlog_dS$EMCI_MXN), start = list(EMCI_MXN=rep(0,3)))

model2=midas_r(TC ~ mls(TC, 1, 1)+ mls(EMCI_MXN, 2:5, 4, lcauchyp) , data=list(dlog_men, EMCI_MXN=dlog_dS$EMCI_MXN), start = list(EMCI_MXN=c(1,1,1)))

average_forecast(list(model1, model2),data=list( TC=c(dlog_men$TC, NA), EMCI_MXN =c(dlog_dS$EMCI_MXN,rep(NA,4)) ), insample= 1:146, outsample = 147 )

vzemlys commented 6 years ago

@nriquec please open a new issue and post a reproducible example with information which R version and midasr version you are using.

nriquec commented 6 years ago

@vzemlys I did it, thank you !

syrop87 commented 6 years ago

I just tried to run the code uploaded above by @slakeon with a line added by @vzemlys and a minor change so that select_forecast runs without error (deleting ,'y=""' from option weights). Unfortunately I get a following error: Error in apply(fc, 1, function(r) sum(r * ww)) : dim(X) must have a positive length

Can anyone help me with that? The reproducible code is added below. I am using R 3.3.1, with midas v.0.6

 set.seed(1001)  
## Number of low-frequency observations
n<-250
## Linear trend and higher-frequency explanatory variables (e.g. quarterly and monthly)
trend<-c(1:n)
x<-rnorm(4*n)
z<-rnorm(12*n)
## Exponential Almon polynomial constraint-consistent coefficients
fn.x <- nealmon(p=c(1,-0.5),d=8)
fn.z <- nealmon(p=c(2,0.5,-0.1),d=17)
## Simulated low-frequency series (e.g. yearly)
y<-2+0.1*trend+mls(x,0:7,4)%*%fn.x+mls(z,0:16,12)%*%fn.z+rnorm(n)
##Do not run

cbfc<-select_and_forecast(y~trend+mls(y,0,1)+mls(x,0,4),
                          from=list(y=1, x=c(4)),
                          to=list(y=cbind(1,5),x=cbind(14,19)),
                          insample=1:200,outsample=201:250,
                          weights=list(x=c("nealmon","lcauchyp")),
                          wstart=list(nealmon=c(1,0,0),lcauchyp=rep(1,3)),
                          IC="AIC",
                          seltype="restricted",
                          ftype="fixed",
                          measures=c("MSE","MAPE","MASE"),
                          fweights=c("EW","BICW","MSFE","DMSFE"))

bm1 <- midas_r(y ~ trend + mls(y, 1, 1) + mls(x, 4:14, 4, nealmon), start = list(x = c(1,0,0)))
bm2 <- midas_r(y ~ trend + mls(y, 1, 1) + mls(x, 4:16, 4, lcauchyp), start = list(x = rep(1,3)))

average_forecast(list(bm1,bm2), data=list(y=c(y,NA),x=c(x,rep(NA,4)),trend=c(trend,251)),insample=1:250,outsample=251)

mpiktas / midasr

Model selection and forecasting #50