gsucarrat / gets

R Package for General-to-Specific (GETS) modelling and Indicator Saturation (ISAT) methods

Dealing with issues when too many indicators are retained #70

Open moritzpschwarz opened 1 year ago

moritzpschwarz commented 1 year ago

Examples of the problem

Consider our discussion here #35 and all of these examples:

set.seed(123)
y <- rnorm(20)
k <- 15
xvar <- matrix(rnorm(20*k), ncol = k)

# it's not a good model - but we still need to be able to deal with it
arx(y = y, mxreg = xvar, plot = TRUE) 
summary(lm(y ~ xvar)) # also works with lm

isat(y = y, mxreg = xvar, max.block.size = 30) 

isat(y = y, mxreg = xvar, max.block.size = 10) 

isat(y = y, mxreg = xvar, max.block.size = 1, plot = TRUE) 

isat(y = y, mxreg = xvar, max.block.size = 1, ar.LjungB = NULL, arch.LjungB = NULL, wald.pval = NULL, t.pval = 0.0001, plot = TRUE)

isat(y = y, mxreg = xvar, max.block.size = 1, ar.LjungB = NULL, arch.LjungB = NULL, wald.pval = NULL, t.pval = 0.00000001)

# or also those
data(Nile)
isat(Nile, sis = TRUE, iis = TRUE, plot = TRUE, t.pval = 0.9, print.searchinfo = FALSE)
isat(Nile, sis = TRUE, iis = TRUE, plot = TRUE, t.pval = 0.9, ar = 1:3, print.searchinfo = FALSE)

For all of these, we get error messages like this:

Error in if (ar.LjungBox$p.value <= ar.LjungB[2]) { : 
  missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In pt(abs(gum.tstat), est$df, lower.tail = FALSE) : NaNs produced
2: In pt(abs(gum.tstat), est$df, lower.tail = FALSE) : NaNs produced
3: In pt(abs(t.stat), out$df, lower.tail = FALSE) : NaNs produced
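
As a rough illustration of how such a message can arise, the base R sketch below reproduces the same pattern without any gets code. It assumes (this is not confirmed by the traceback above) that the degrees of freedom reach zero once there are as many regressors as observations; `p_nan` and `res` are hypothetical names used only here.

```r
# Illustrative only: base R, no gets functions involved.
# With zero degrees of freedom, pt() returns NaN (and warns "NaNs produced").
p_nan <- suppressWarnings(pt(2.0, df = 0, lower.tail = FALSE))
is.nan(p_nan)

# Comparing NaN with a threshold yields NA, and if (NA) raises
# "missing value where TRUE/FALSE needed".
res <- tryCatch(
  if (p_nan <= 0.05) "reject" else "keep",
  error = function(e) conditionMessage(e)
)
res
```

Guarding such comparisons with `is.na()` would avoid the hard error, but it would not address the underlying lack of degrees of freedom.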

Changes to address the problem

Now I have implemented an additional block search at the end of each indicator search, see here:

Once all blocks of an indicator type (e.g. IIS) have been searched, a check requires the number of x-variables to be smaller than the number of observations. If the check fails, the non-kept variables (i.e. the indicators) are divided into blocks again and an additional selection round is carried out. For this, I use a leave-one-out method.
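
The check and the re-blocking step described above could be sketched roughly as follows. This is a minimal base R illustration, not the code in this PR: `split_into_blocks()`, `n_kept`, `retained` and `needs_extra_search` are hypothetical names, and the leave-one-out selection itself is not shown.

```r
# Hypothetical helper, not part of gets: split the retained indicators into
# blocks small enough that each block plus the kept regressors stays below n.
split_into_blocks <- function(retained, n_obs, n_kept) {
  max_size <- n_obs - n_kept - 1L   # leaves at least one degree of freedom
  stopifnot(max_size >= 1L)
  n_blocks <- ceiling(length(retained) / max_size)
  # Assign indicators to blocks in their original order
  block_id <- rep(seq_len(n_blocks), each = max_size,
                  length.out = length(retained))
  split(retained, block_id)
}

n_obs    <- 20L                                   # observations
n_kept   <- 15L                                   # kept regressors (assumed to include any intercept)
retained <- c(2L, 5L, 7L, 11L, 13L, 17L, 19L)     # made-up indicator positions

# The extra search is only needed when regressors are not fewer than observations
needs_extra_search <- (n_kept + length(retained)) >= n_obs
blocks <- if (needs_extra_search) {
  split_into_blocks(retained, n_obs, n_kept)
} else {
  list(retained)
}
blocks
```

Each resulting block can then be estimated together with the kept regressors without exhausting the degrees of freedom.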

See the review at the bottom for a detailed description of the changes

New functions to help restructure isat()

I have finally changed the isat() structure a bit. There is now a create.ISMatrices() function and an ISMatricesLoop() function. As a result, isat() is now much shorter and no longer contains any function definitions.

Currently still a draft

All tests pass

The documentation for the three new functions is still missing: ISblocksFun(), create.ISmatrices(), and ISadditionalblocksearch().

NOTE

I have found that in the getsFun() function, the GUM is included by default. In getsm() it is not.

moritzpschwarz commented 1 year ago

Still to do