mpiktas / midasr

R package for mixed frequency time series data analysis.
http://mpiktas.github.io/midasr/

incorrect average_forecast values for midas_regressions #93

Open · JuBausch opened this issue 2 months ago

JuBausch commented 2 months ago

Hi dear community, I am facing a problem: although I am including more variables in an Almon lag MIDAS model, the in-sample MSE reported by average_forecast() is larger than it should be, i.e. it is larger than that of similar models with fewer external variables.
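Why a larger in-sample MSE is suspicious: when both models are fitted by least squares to the same observations, the larger model can always reproduce the smaller one (for an extra nealmon term, by setting its scale parameter to zero), so at the optimum its in-sample MSE cannot be higher. The sketch below checks this least-squares property with base-R `lm` on simulated data; it is an illustration, not a midasr computation, and the variable names are made up.

```r
# Simulated data (not the RUT series): x2 and x3 are pure noise.
set.seed(1)
n <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
y <- 1 + 0.8 * x1 + rnorm(n)

# In-sample MSE = mean of squared residuals, as in the comment line of the issue.
mse <- function(fit) mean(residuals(fit)^2)

small <- lm(y ~ x1)            # benchmark model
big <- lm(y ~ x1 + x2 + x3)    # nests the benchmark (x2, x3 coefficients = 0)

c(small = mse(small), big = mse(big))
```

If a larger model does come out worse in sample, a different estimation sample or an optimiser that stopped early are the usual suspects, rather than the extra variables themselves.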

I am using the following code:

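# Fill isolated NAs with the mean of their two neighbours.
# The first/last elements and runs of two or more NAs are left as NA.
impute_na_with_neighbors <- function(x) {
  n <- length(x)
  for (i in seq_len(n)) {  # seq_len() avoids iterating over c(1, 0) when n == 0
    if (is.na(x[i])) {
      if (i == 1 || i == n) {
        next  # Skip the first and last elements
      } else {
        if (!is.na(x[i-1]) && !is.na(x[i+1])) {
          x[i] <- mean(c(x[i-1], x[i+1]))
        }
      }
    }
  }
  return(x)
}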
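The loop above only fills an NA when both neighbours are observed, so leading/trailing NAs and gaps of two or more days survive the imputation. A vectorised sketch of the same rule (the name `impute_isolated_na` is hypothetical) makes that easy to check; because a value filled at position i never changes the decision at i + 1, it should match the loop on a plain numeric vector. It is written for plain vectors, e.g. `as.numeric(coredata(data_xts$CS))`, because arithmetic on xts objects aligns on the time index instead of on positions.

```r
# Hypothetical vectorised equivalent of impute_na_with_neighbors() for a
# plain numeric vector: only NAs whose two neighbours are both observed are
# filled; leading/trailing NAs and runs of two or more NAs stay NA.
impute_isolated_na <- function(x) {
  n <- length(x)
  if (n < 3) return(x)                  # no interior positions to fill
  inner <- 2:(n - 1)
  fill <- inner[is.na(x[inner]) & !is.na(x[inner - 1]) & !is.na(x[inner + 1])]
  x[fill] <- (x[fill - 1] + x[fill + 1]) / 2
  x
}

v <- c(NA, 1, NA, 3, NA, NA, 6, NA)
impute_isolated_na(v)
sum(is.na(impute_isolated_na(v)))       # NAs still left after imputation
```

Running the same NA count on CS, NTFS and VIX after imputation shows how many observations are still missing when the models are fitted.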
# Specify the dataset and time series identifier
dataset <- "RUT"
ts_id <- "O66D"  # Adjust this to the desired time series identifier

# Base path for the XTS files
base_path <- "~----"

# Construct the file path for the XTS file
file_path <- paste0(base_path, dataset, "/", dataset, "_xts.rds")

# Read the XTS file
data_xts <- readRDS(file_path)

# Drop rows where 'rv5' is zero, since log(0) = -Inf
data_xts <- data_xts[data_xts$rv5 != 0, ]
data_xts$CS <- impute_na_with_neighbors(data_xts$CS)
data_xts$NTFS <- impute_na_with_neighbors(data_xts$NTFS)
data_xts$VIX <- impute_na_with_neighbors(data_xts$VIX)
# Convert the regressors to daily ts objects (252 trading days per year); all except NTFS are log-transformed
tsx_var <- ts(coredata(log(data_xts$rv5^2)), frequency = 252)
tsx_var_vix <- ts(coredata(log(data_xts$VIX)), frequency = 252)
tsx_var_cs <- ts(coredata(log(data_xts$CS)), frequency = 252)
tsx_var_ntfs <- ts(coredata(data_xts$NTFS), frequency = 252)
# Log-transform the target series (column ts_id) and convert it to a ts object
tsy_var <- ts(coredata(log(data_xts[, ts_id]^2)), frequency = 252)
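The zero filter on rv5 only protects `log(rv5^2)`; VIX, CS and the target series are logged as well, and `log()` returns `-Inf` for 0 and `NaN` for negative inputs, while any NA left by the imputation also passes through. Counting non-finite values per series before fitting is a cheap check; a base-R sketch with a hypothetical helper:

```r
# Hypothetical helper: count NA, NaN and +/-Inf values in each series.
count_nonfinite <- function(...) {
  series <- list(...)
  vapply(series, function(s) sum(!is.finite(as.numeric(s))), integer(1))
}

# Toy example: log(0) is -Inf, log(-1) is NaN (with a warning), log(NA) is NA.
toy <- suppressWarnings(log(c(1, 0, -1, NA, 2)))
count_nonfinite(toy = toy, clean = c(1, 2, 3))
```

Applied to the issue's objects this would be `count_nonfinite(y = tsy_var, rv = tsx_var, vix = tsx_var_vix, cs = tsx_var_cs, ntfs = tsx_var_ntfs)`; non-zero counts mean the two models may not be estimated on the same rows.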

nealmon_model_RUT_O66d <- midas_r(tsy_var ~ mls(tsx_var, 1:22, 1, nealmon),
                                  start = list(tsx_var = c(1, -0.5)), weight_gradients = list())

nealmon_model_all_RUT_O66d <- midas_r(tsy_var ~ mls(tsx_var, 1:22, 1, nealmon) +
                                        mls(tsx_var_vix, 1:22, 1, nealmon) +
                                        mls(tsx_var_cs, 1:22, 1, nealmon) +
                                        mls(tsx_var_ntfs, 1:22, 1, nealmon),
                                      start = list(tsx_var = c(1, -0.5), tsx_var_vix = c(1, -0.5),
                                                   tsx_var_cs = c(1, -0.5), tsx_var_ntfs = c(1, -0.5)),
                                      weight_gradients = list())
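The start value `c(1, -0.5)` used for every term is easier to judge by looking at the lag weights it implies. The helper below is my own sketch of the normalized exponential Almon form (weights proportional to `exp(p2 * i)`, scaled to sum to `p1`); I believe it matches the two-parameter case of midasr's `nealmon`, but that is an assumption and this is not the package's code.

```r
# Hypothetical re-implementation of normalized exponential Almon weights
# for two parameters p = c(scale, slope) over lags i = 1..d.
nealmon_sketch <- function(p, d) {
  z <- p[2] * seq_len(d)
  e <- exp(z - max(z))        # subtracting max(z) avoids overflow; ratios unchanged
  p[1] * e / sum(e)
}

w <- nealmon_sketch(c(1, -0.5), 22)
round(w[1:5], 3)              # weights of the first five lags
sum(w[1:10])                  # share of total weight on lags 1-10
```

With these values roughly 99% of the weight sits on the first ten lags, and the same shape is imposed on all four regressors at the start, which may be far from where the optimiser needs to end up for VIX, CS or NTFS.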

# Full-sample residual MSEs:
# mean(nealmon_model_RUT_O66d$residuals^2)     # [1] 0.3498169
# mean(nealmon_model_all_RUT_O66d$residuals^2) # [1] 0.3417388

# But after running average_forecast() we get the following in-sample MSEs.
# Note: end_sample and out_sample_start are not defined in this snippet.
forecast_RUT_66d <- average_forecast(list(nealmon_model_RUT_O66d, nealmon_model_all_RUT_O66d),
                                     data = list(tsx_var = tsx_var, tsy_var = tsy_var, tsx_var_vix = tsx_var_vix,
                                                 tsx_var_cs = tsx_var_cs, tsx_var_ntfs = tsx_var_ntfs),
                                     insample = 1:end_sample, outsample = out_sample_start:length(tsx_var),
                                     type = "fixed", show_progress = FALSE)

forecast_RUT_66d[["accuracy"]][["individual"]][["MSE.in.sample"]]
# 0.3401665 1.3465425   (benchmark model, extended model)
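As far as I understand the midasr documentation (worth double-checking), with `type = "fixed"` average_forecast() re-estimates each model once on the `insample` rows and then forecasts the `outsample` rows, so `MSE.in.sample` comes from that re-estimation rather than from the full-sample fits quoted above. The sketch below imitates such a fixed split with `lm` on simulated data; the helper `fixed_scheme_mse` and the split point are hypothetical. For nested least-squares fits on the same block, the larger model's in-sample MSE still comes out no higher, so 1.3465425 against 0.3401665 looks more like a re-estimation that did not converge on the subsample.

```r
# Hypothetical helper: estimate once on the in-sample block ("fixed" scheme),
# then report MSE on both blocks. Base R only; not midasr's implementation.
fixed_scheme_mse <- function(formula, data, insample, outsample) {
  fit <- lm(formula, data = data[insample, , drop = FALSE])
  resp <- all.vars(formula)[1]
  pred_out <- predict(fit, newdata = data[outsample, , drop = FALSE])
  c(in_sample = mean(residuals(fit)^2),
    out_of_sample = mean((data[[resp]][outsample] - pred_out)^2))
}

set.seed(7)
n <- 300
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
df$y <- 1 + 0.8 * df$x1 + rnorm(n)
end_sample <- 200   # hypothetical split point

small <- fixed_scheme_mse(y ~ x1, df, 1:end_sample, (end_sample + 1):n)
big <- fixed_scheme_mse(y ~ x1 + x2, df, 1:end_sample, (end_sample + 1):n)
rbind(small, big)
```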

I have been trying to resolve this issue for a few days already, but I don't know where the problem could come from.

Also, I am not able to apply Ofunction = "nls" to my Almon lag models with external variables, whereas for models with only one variable I can, and there it shrank the MSE in the forecast function substantially. Help with this would be very much appreciated.
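One possible reason the larger model behaves worse (an assumption, not verified on the RUT data) is the starting point: every term starts at `c(1, -0.5)`, so the optimiser begins far from the benchmark solution. The base-R sketch below uses simulated data and hypothetical helpers to compare a cold start with a warm start that reuses the benchmark estimates and sets the new term's scale to 0. The warm start begins at exactly the benchmark's SSE, so a descent method such as BFGS cannot end above it. Whether the same idea helps midas_r (e.g. by passing such values through its `start` list, with or without Ofunction = "nls") is untested here.

```r
# Base-R analogue of a two-regressor exponential Almon model estimated with
# optim(); hypothetical helpers, not midas_r.
nealmon_w <- function(p, d) {
  z <- p[2] * seq_len(d)
  e <- exp(z - max(z))        # numerically stable normalisation
  p[1] * e / sum(e)
}
lag_matrix <- function(x, d) embed(x, d + 1)[, -1, drop = FALSE]  # columns = lags 1..d

set.seed(42)
n <- 400
d <- 10
x1 <- rnorm(n)
x2 <- rnorm(n)
X1 <- lag_matrix(x1, d)
X2 <- lag_matrix(x2, d)
y <- as.vector(0.5 + X1 %*% nealmon_w(c(1, -0.5), d) +
                 X2 %*% nealmon_w(c(0.5, -0.3), d)) + rnorm(n - d, sd = 0.3)

sse_small <- function(th) sum((y - th[1] - X1 %*% nealmon_w(th[2:3], d))^2)
sse_big <- function(th) {
  sum((y - th[1] - X1 %*% nealmon_w(th[2:3], d) - X2 %*% nealmon_w(th[4:5], d))^2)
}

fit_small <- optim(c(0, 1, -0.5), sse_small, method = "BFGS")
# Cold start: the same generic values for every term, as in the issue.
cold <- optim(c(0, 1, -0.5, 1, -0.5), sse_big, method = "BFGS")
# Warm start: benchmark estimates plus a zero scale for the new term, so the
# starting SSE equals the benchmark's fitted SSE.
warm <- optim(c(fit_small$par, 0, -0.5), sse_big, method = "BFGS")

c(small = fit_small$value, cold = cold$value, warm = warm$value) / (n - d)  # in-sample MSEs
```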