gsucarrat / gets

R Package for General-to-Specific (GETS) modelling and Indicator Saturation (ISAT) methods

How to resolve error in dropvar: Determination of full column rank design matrix failed #35

Open moritzpschwarz opened 3 years ago

moritzpschwarz commented 3 years ago

Hoping someone can help me with this. It may also be a motivation for additional checks within the isat function, to catch the problem before the point where it currently occurs.

Problem Description

I'm attempting to use isat using a user-specified indicator uis, which is simply a MIS object of the x-regressors interacted with the sim() function (so a multiplicative matrix to check for breaks in the coefficient). The screenshot below shows a model also using iis and sis, but the same happens when just using uis.

Now admittedly, this is a large sample (t above 2000) with k = 10 and an ARDL(2) model, so you can imagine that running the uis takes a long time. I run it with tol = 1e-15, but using a larger value (e.g. the default 1e-7) also fails, with a singular matrix in "a" in "solve" error. The same setup actually works with weekly or monthly data rather than daily data.

Error

Everything works great until the final GETS union of retained UIS variables... where I get the error Error: determination of full column rank design matrix failed.

The error occurs within dropvar in the isat function. The error occurs because:

Hoping to resolve

Obviously this is fairly frustrating as the function fails with the products of its earlier functions (the individual getsFuns on the uis) - and in this case it is after a really intensive estimation procedure, which is difficult to debug.

Ideally I would:

Replicate using my example

I have included a .zip file with

1) the mXis object that is being used in the dropvar function and a tiny replication script (also with the option of using dropvar with a browser command, to walk through the function step-by-step using F10).

library(gets)
load("mXis.RData")
dropvar(mXis, tol = 1e-15, LAPACK = FALSE, silent = FALSE)

replication of error.zip

2) a full save.image() saved just before the GETS union of retained UIS variables... command in isat, containing all data inputs etc.

Thanks all for your help!! :)

moritzpschwarz commented 3 years ago

ZIP file for replication of error.zip

gsucarrat commented 3 years ago

Have you tried using blocksFun() instead of isat()? Remember, blocksFun() returns the specific specification from each GETS-search, but it does not take the union of the final specific models. Here is some example code:

set.seed(123)
y <- rnorm(30)
x1 <- matrix(rnorm(30*10), 30, 10)
x2 <- matrix(rnorm(30*10), 30, 10)
blocksFun(y, list(x1,x2))

moritzpschwarz commented 3 years ago

Thanks a lot for your quick reply G-man!! I'll definitely try this, thanks!!

But in a way I'd still want to resolve the original error or at least trace the reason for it. After all, when using isat, it should either give me a specific model as the union of specific models or show me an error that will allow me to rectify it...

Any ideas?

gsucarrat commented 3 years ago

My best guess is that the problem does not stem from the dropvar() function. Rather, I think the problem might be an issue in isat(), or in your uis argument. When combining the remaining variables from sis, iis and/or tis, the dropvar() function is applied to ensure invertibility of the regressor matrix before the final GETS is done (this final step is skipped by blocksFun()). Now, in your case, there are probably variables left over from uis as well, and we might not have "ironed out" all issues related to this situation, since putting variables in uis is less common. blocksFun() does not do the last GETS in which your problem occurs, so you might be able to see what the issue is by looking at the retained variables of blocksFun(). If not, then you will have to delve into the details of the isat() function...

gsucarrat commented 3 years ago

A new idea just occurred to me. Have you tried applying dropvar() to the matrix you are putting in uis?

moritzpschwarz commented 3 years ago

Excellent idea G-man - thanks, I will also try this!!

At the moment, I suspect it could be related to simply retaining too many indicators for the number of observations... With my ARDL(2,2) resulting in 32 covariates and then the uis MIS set-up (32 times a 2472 x 2472 matrix), I retain a huge number of indicators before the union GETS of the uis variables, where I suspect a sizeable number of indicators would be dropped again.

I'm currently simply sending the initially retained indicators through another block search (essentially taking all the retained indicators and adding them into another isat call as a new uis, which then of course divides them into blocks and searches again).

If this works, then perhaps we need to think about whether this is a problem that could generally be present in different MIS applications. But at least the solution for such a problem would be fairly straightforward (simply check whether there are too many retained indicators for the number of observations, sending them back into block-search for the final GETS union - and if another block-search still retains too many indicators for the model to work then printing an explicit error message that the model is not well-defined).

Will keep you updated, but really appreciate your help!! :)

gsucarrat commented 3 years ago

I had a peek at the ZIP-file you provided. Out of curiosity, I applied dropvar() on mXis in the file named mXis.RData:

> dim(mXis)
[1] 2706 2738
> dropvar(mXis)->tmp
regressor-matrix is column rank deficient, so dropping 1901 regressors

So, 1901 columns out of a total of 2738 were dropped due to exact collinearity! Note also that it took a minute or two on my (average) laptop to drop the columns. Now, there might be a "bug" or "issue" somewhere in isat() or dropvar(), but it seems - to me - that the best place to start is in fact to re-design your uis matrix ;)

moritzpschwarz commented 3 years ago

Thanks G-Man!

I don't think they were necessarily dropped due to exact collinearity! Indeed, each indicator is a unique coefficient-step dummy that should not be collinear to any other dummy. I think dropvar just cannot deal with more regressors than observations!

Consider this:

set.seed(1)
x <- matrix(rnorm(20), ncol = 5) # more columns than rows
dropvar(x) # one column dropped as if it were collinear, but it isn't!

It could well be that this is the intended behaviour - G-Man you're better equipped to judge this. I would have expected to receive an error here rather than getting an output that just drops one non-collinear column.

In my particular case, I'm trying to use MIS as my uis - so I'd really like to continue with the GUM approach - add in all coefficient-step interactions and just see what isat returns. This will inevitably return a large(r) number of indicators - in this case, as you point out, even more than the number of observations.

But perhaps this is just an issue that will continue to pop up with MIS applications or very large uis applications. Perhaps this warrants a quick if statement and an explicit error message though.
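The kind of if statement suggested here could look roughly like the following base-R sketch. `check_design()` is a hypothetical helper, not part of gets; it only illustrates failing early, with an explicit message, when a regressor matrix has more columns than rows.

```r
# Hypothetical helper (not part of the gets package): fail early with an
# explicit message when a design matrix has more columns than rows and
# therefore cannot have full column rank.
check_design <- function(mX) {
  mX <- as.matrix(mX)
  if (ncol(mX) > nrow(mX)) {
    stop("design matrix has ", ncol(mX), " columns but only ", nrow(mX),
         " rows, so it cannot have full column rank", call. = FALSE)
  }
  invisible(TRUE)
}

set.seed(1)
x <- matrix(rnorm(20), ncol = 5)  # 4 rows, 5 columns

# Capture the error message instead of aborting the script
res <- tryCatch(check_design(x), error = function(e) conditionMessage(e))
print(res)
```

A check like this says nothing about collinearity among the columns; it only flags the rows-versus-columns problem before any expensive computation starts.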

gsucarrat commented 3 years ago

Interesting! We should probably look into the dropvar() code. Note that we did not write the original function ourselves, see help(dropvar). To understand the details of the function, we need to understand the QR decomposition and the qr() function, see help(qr).
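As a rough illustration of the QR-based idea, the sketch below uses only base R's qr(). It does not reproduce the dropvar() code; it just shows that the rank reported by qr() can never exceed the number of rows, so a matrix with 4 rows and 5 columns will always have at least one column flagged as redundant.

```r
set.seed(1)
x <- matrix(rnorm(20), ncol = 5)  # 4 rows, 5 columns

qrx <- qr(x)   # QR decomposition with (limited) column pivoting, base R
dim(x)
qrx$rank       # cannot exceed min(nrow(x), ncol(x))

# Columns that the decomposition treats as linearly independent:
# the first 'rank' entries of the pivot vector
keep <- sort(qrx$pivot[seq_len(qrx$rank)])
x_reduced <- x[, keep, drop = FALSE]
dim(x_reduced)
```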

Related to this, note that in your example there is an easy fix:

set.seed(1)
x <- matrix(rnorm(20), ncol = 5) # more columns than rows
x1 <- x[,1:3]
x2 <- x[,4:5]
dropvar(x1) #no variables are dropped
dropvar(x2) #no variables are dropped

If you try to do something similar with your mXis, by contrast, then hundreds of variables are dropped in each group even though both of them have (substantially) fewer columns than rows. So I still believe there is a problem with your design of mXis.

Note that, in the uis argument of isat(), you can provide a list of matrices rather than a single matrix. Maybe the simplest solution is to change isat() so that it splits matrices into smaller matrices to ensure the number of rows is larger than the number of columns.
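A minimal sketch of such a split, in base R. `split_columns()` is a hypothetical helper, not a gets function; it cuts a wide matrix into consecutive column blocks that each have fewer columns than rows, returning a list of matrices.

```r
# Hypothetical helper (not part of gets): split a matrix into consecutive
# column blocks of at most 'max_cols' columns each. By default every block
# has fewer columns than the matrix has rows.
split_columns <- function(mX, max_cols = nrow(mX) - 1L) {
  mX <- as.matrix(mX)
  stopifnot(max_cols >= 1L)
  # Block label for each column: 1,1,...,2,2,... in runs of 'max_cols'
  block_id <- ceiling(seq_len(ncol(mX)) / max_cols)
  lapply(split(seq_len(ncol(mX)), block_id),
         function(idx) mX[, idx, drop = FALSE])
}

set.seed(1)
x <- matrix(rnorm(20), ncol = 5)  # 4 rows, 5 columns
blocks <- split_columns(x)        # blocks of at most 3 columns
sapply(blocks, ncol)
```

This only addresses the rows-versus-columns constraint. It would not by itself remove columns that are collinear within a block, which is the separate problem observed with mXis.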