mpiktas / midasr

R package for mixed frequency time series data analysis.
http://mpiktas.github.io/midasr/

Consistent warning messages in the midasr package and unrealistic in-sample and out-of-sample MSE values for a regression. #92

Open JuBausch opened 2 months ago

JuBausch commented 2 months ago

I was trying to forecast the RV with an Almon lag function in midasr. The problem I am encountering is quite strange: after fitting two models (one without an external variable and one with an external variable), the in-sample and out-of-sample MSE explode for the second model for some reason. This does not make sense to me, as adding new variables should improve the model fit and therefore decrease the MSE. The HAR estimation, on the other hand, works well. Here is the code and the respective MSEs:

ts_id <- "O66D"
data_xts <- data_xts[data_xts$rv5 != 0, ]
data_xts$CS <- impute_na_with_neighbors(data_xts$CS)
data_xts$NTFS <- impute_na_with_neighbors(data_xts$NTFS)
data_xts$VIX <- impute_na_with_neighbors(data_xts$VIX)
# Convert 'rv5' to a time series object
tsx_var <- ts(coredata(log(data_xts$rv5^2)), frequency = 252)
tsx_var_vix <- ts(coredata(log(data_xts$VIX)), frequency = 252)
tsx_var_vixxx <- tsx_var_vix
tsx_var_cs <- ts(coredata(log(data_xts$CS)), frequency = 252)
tsx_var_ntfs <- ts(coredata(data_xts$NTFS), frequency = 252)
# Convert the time series identifier to a time series object
tsy_var <- ts(coredata(log(data_xts[, ts_id]^2)), frequency = 252)

nealmon_model_DJINET_O66d <- midas_r(tsy_var ~ mls(tsx_var, 1:22, 1, nealmon),
                                  start = list(tsx_var = c(1, -0.5)), weight_gradients = list())

nealmon_model_vix_DJINET_O66d <- midas_r(tsy_var ~ mls(tsx_var, 1:22, 1, nealmon) + mls(tsx_var_vix, 1:22, 1, nealmon),
                                      start = list(tsx_var = c(1, -0.5), tsx_var_vix = c(1, -0.5)), weight_gradients = list())

forecast_DJINET_66d <- average_forecast(list(nealmon_model_DJINET, nealmon_model_vix_DJINET_O66d),
                                     data = list(tsx_var = tsx_var, tsy_var = tsy_var, tsx_var_vix = tsx_var_vix, tsx_var_cs = tsx_var_cs, tsx_var_ntfs = tsx_var_ntfs),
                                     insample = 1:end_sample, outsample = out_sample_start:length(tsx_var),
                                     type = "fixed", show_progress = FALSE)
# After that I got the following warning messages:
Warning Messages:
1: In (function (x, nm)  :
  Duplicate names in data. Using the one from the list
2: In (function (x, nm)  :
  Duplicate names in data. Using the one from the list
3: In (function (x, nm)  :
  Duplicate names in data. Using the one from the list
4: In (function (x, nm)  :
  Duplicate names in data. Using the one from the list
5: In (function (x, nm)  :
  Duplicate names in data. Using the one from the list

forecast_DJINET_66d[["accuracy"]][["individual"]][["MSE.out.of.sample"]]
[1] 0.6400151 5.8904797
forecast_DJINET_66d[["accuracy"]][["individual"]][["MSE.in.sample"]]
[1] 0.805655 6.367892

# Recalculating the MSE for the full-sample models with and without the external variable:
> mean(nealmon_model_DJINET_O66d[["residuals"]]^2)
[1] 0.3805153
> mean(nealmon_model_vix_DJINET_O66d[["residuals"]]^2)
[1] 0.3653933

I am not sure how this error can occur, or whether it is due to the warning or to a misspecification, but the higher MSE for the model with the additional variable seems impossible to me. If someone could help, it would be much appreciated!

vzemlys commented 2 months ago

Please post the data, as I cannot reproduce the problem without the data.

The warning comes from here: https://github.com/mpiktas/midasr/blob/649522268e99129de65d23134e12a719aa87e978/R/midas_r_methods.R#L599.

If you pass as data an object that has columns, its column names are ignored and the name is taken from the list. To avoid that when you pass data via a named list and the data has only one column, please pass it as a vector.
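
As a rough illustration of the difference between a one-column object and a vector, here is a minimal base R sketch on simulated data. The matrix `m` is a hypothetical stand-in for what `coredata()` returns for a one-column xts object, and the names `x_matrix`, `x_vector` and `data_list` are made up for this example. It only shows how the shapes differ; it has not been run against the data from this issue.

```r
# Simulated stand-in for coredata(log(data_xts$VIX)): a one-column matrix
# that carries a column name (assumption: this mirrors the reporter's data).
set.seed(1)
m <- matrix(rnorm(10), ncol = 1, dimnames = list(NULL, "VIX"))

# ts() keeps the matrix shape, so the column name travels with the object.
x_matrix <- ts(m, frequency = 252)
is.matrix(x_matrix)   # TRUE
colnames(x_matrix)    # "VIX"

# Dropping the dimensions first gives a plain vector-valued series
# with no column name attached.
x_vector <- ts(as.numeric(m), frequency = 252)
is.matrix(x_vector)   # FALSE

# A named list built from vectors: the only name is the list name.
data_list <- list(tsx_var_vix = x_vector)
```

Applied to the code above, this would presumably mean wrapping each series in `as.numeric()` before calling `ts()`, for example `ts(as.numeric(coredata(log(data_xts$VIX))), frequency = 252)`.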