jkcshea / ivmte

An R package for implementing the method in Mogstad, Santos, and Torgovitsky (2018, Econometrica).
GNU General Public License v3.0

Collinearity in error matrix #29

Closed. jkcshea closed this issue 5 years ago.

jkcshea commented 5 years ago

Below is an MWE of collinearity in the error matrix when carrying out the GMM point estimate. Switch over to the branch enhance/gmm to run the example. Note that the code is still being debugged, so an excessive amount of output is generated. In addition, if you look into the code files themselves, you will find many comments and blocks of commented-out test code.

Regarding the data used to generate this example, the MTRs depend only on u, and there is a single binary instrument Z.

rm(list = ls())
set.seed(10L)
devtools::load_all(pkg = "IVMTE", reset = TRUE)

dts1 <- gendata1()

ivlike <- c(ey ~ 1 + d,
            ey ~ 1 + d | factor(z),
            ey ~ 0 + d )

components <- lists.mst(d, d, d)

result <- ivmte(ivlike = ivlike,
                data = dts1,
                components = components,
                propensity = p,
                m0 = ~ 0 + u ,
                m1 = ~ 1 + u,
                uname = u,
                target = "ate",
                point = TRUE,
                treat = d,
                point.itermax = 2)

In the code above, the model is correctly specified. GMM is able to exactly recover the parameters of the MTRs when the GMM weight matrix is the identity matrix. However, the residuals used to construct the optimal weighting matrix are collinear, and so we run into non-invertibility problems in the second step of the two-step GMM procedure.
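
For reference, a quick way to see the problem numerically (a sketch with a made-up residual matrix res standing in for the stacked residuals; it is not the package's internal object):

## Minimal sketch: `res` is a made-up stand-in for the N x S matrix of residuals
## used to build the optimal weighting matrix (not an object from the package).
set.seed(1)
res <- matrix(rnorm(100 * 3), nrow = 100, ncol = 3)
res[, 3] <- res[, 1] + res[, 2]        # induce exact collinearity for illustration

omega <- crossprod(res) / nrow(res)    # candidate weighting matrix, (1/N) * res'res
qr(omega)$rank                         # rank < ncol(omega) signals collinearity
kappa(omega)                           # huge condition number => (near-)singular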

I can't find any IV-like specifications that get this to work. Writing out each residual term suggests that this simple model is inherently prone to this issue. I will write up a document on this to clarify.

Switching Z from being binary to being ternary resolves the issue. So run the above code, but using the data set

dts2 <- gendata2()

to see this work. Again, the coefficients of the MTR are successfully recovered. I will also try to clarify how this resolves the collinearity issue in the write-up.

a-torgovitsky commented 5 years ago

I'm a bit confused as to the scope of the problems.

Last week, I thought we discussed that you were finding problems with the first step (equally weighted GMM using the identity matrix). Is this not the case anymore? The equally-weighted setting should just be a matter of counting how many (non-redundant) IV-like estimands we have and making sure that is larger than the number of parameters. Certainly having an irrelevant instrument is going to make some or all IV-like estimands redundant, but this is to be expected.

Perhaps the Z's are causing some confusion here. The first step GMM estimate should really just be a system regression, so it should look like

\hat{\theta} = ( \sum_{i=1}^{N} G_i' G_i )^{-1} \sum_{i=1}^{N} G_i' Y_i,

where G_i is S x K and Y_i is S x 1. Can you confirm that the above equation is giving your \hat{\theta}?
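
For concreteness, in R this might look something like the following (just a sketch, with G and Y as hypothetical lists of the G_i and Y_i, not objects from the package):

## Sketch of the identity-weighted first step:
## theta_hat = (sum_i G_i' G_i)^{-1} sum_i G_i' Y_i,
## with G a list of S x K matrices and Y a list of S x 1 vectors.
system_ols <- function(G, Y) {
    GtG <- Reduce(`+`, lapply(G, crossprod))   # sum_i G_i' G_i
    GtY <- Reduce(`+`, Map(crossprod, G, Y))   # sum_i G_i' Y_i
    solve(GtG, GtY)
}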

I want to make sure that we've completely and totally solved all issues with the unweighted estimator before going on to consider optimal weighting.

jkcshea commented 5 years ago

Ah, I'm sorry I made things confusing.

We talked about how there were two issues: (i) collinearity in the errors \varepsilon_i in the equally-weighted setting; (ii) asymptotic standard errors in the equally-weighted setting.

This issue looks at (i), but it may have been smarter to start with (ii). I'll open an issue for (ii).

Nevertheless, I am not carrying out the system regression defined above. Rather, the way in which the Z matrices enter into the GMM estimator implies \theta is estimated by

\hat{\theta} = ( \bar{G}' \bar{G} )^{-1} \bar{G}' \bar{Y}, where \bar{G} = \frac{1}{N} \sum_{i=1}^{N} G_i and \bar{Y} = \frac{1}{N} \sum_{i=1}^{N} Y_i.

This estimator is somewhat strange, as it is akin to performing a regression with only S observations (regardless of how large N is), but that is what we get when we set the weighting matrix to be the identity.
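
To make the contrast concrete, this is roughly what I am computing (a sketch, using the same hypothetical G and Y lists as in the sketch above):

## Sketch of the averaged-moment estimator: average G_i and Y_i over the N
## observations first, then regress the S averaged "observations".
averaged_moment_ols <- function(G, Y) {
    Gbar <- Reduce(`+`, G) / length(G)    # S x K average of the G_i
    Ybar <- Reduce(`+`, Y) / length(Y)    # S x 1 average of the Y_i
    solve(crossprod(Gbar), crossprod(Gbar, Ybar))
}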

Perhaps I should never have used the Z matrices, which are constructed by stacking N identity matrices of dimension S x S.

The two estimation methods behave similarly, but I can change it to the one you described if that is what you had in mind this whole time.

a-torgovitsky commented 5 years ago

The one I described is what I had in mind.

Conceptually what we want to do here is make E[Y - G\theta] as small as possible. It's an S-dimensional vector, so small can mean a lot of things (i.e. there are many norms, and each norm will yield a slightly different solution to the minimization problem). GMM (or maybe this is more appropriately categorized as minimum distance? Basically the same thing) is saying:

  1. First minimize using the standard 2-norm (this is the identity weighting).
  2. Based on the result of 1), find the best norm to use in terms of variance (this is the optimal weighting).

In step 1), we are just looking at minimizing the sample analog of E[Y - G\theta]'E[Y - G\theta] which should be exactly least squares in this problem, but with a system instead of a single equation.

jkcshea commented 5 years ago

Ah, you are right. Implementing the system OLS resolves the error in the MWE above. The estimate of the asymptotic variance also matches up with the Monte Carlo simulations.

Should I have a crack at the second step? And if I do, is what you have in mind FGLS? If so, I'm not quite sure how to deal with the term

\hat{\Omega} = \bar{\varepsilon} \bar{\varepsilon}', where \bar{\varepsilon} = \frac{1}{N} \sum_{i=1}^{N} \hat{\varepsilon}_i,

which must be inverted. The reason is that \bar{\varepsilon} is a single S x 1 vector, so \hat{\Omega} is necessarily rank 1 and not invertible... maybe I am doing something wrong again?

However, issue #30 remains despite this correction; I'm still investigating what is going on there.

jkcshea commented 5 years ago

Ah, implementing the system regression does indeed resolve this MWE. The estimates of the asymptotic variance also match up with the MC simulations, though only once heteroskedasticity is allowed for.

Issue #30 persists, though, so I'm looking into that.

Looking ahead, though, the second step would be FGLS?

a-torgovitsky commented 5 years ago

You want to be taking the sample average of the residuals' outer products, e.g.

\hat{\Omega} = \frac{1}{N} \sum_{i=1}^{N} \hat{\varepsilon}_i \hat{\varepsilon}_i'.

That should be invertible. I'll send you a PDF with a formal discussion via email.
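
To illustrate the difference, here is a quick sketch with a made-up N x S residual matrix:

## eps: made-up N x S residual matrix, standing in for the estimated errors.
set.seed(1)
eps <- matrix(rnorm(200 * 3), nrow = 200, ncol = 3)

## Outer product of the averaged residual: S x S but rank 1, so not invertible.
eps_bar   <- colMeans(eps)
omega_bad <- outer(eps_bar, eps_bar)
qr(omega_bad)$rank                        # 1

## Sample average of the per-observation outer products: generically full rank.
omega_good <- crossprod(eps) / nrow(eps)  # (1/N) sum_i eps_i eps_i'
qr(omega_good)$rank                       # 3 here, so solve(omega_good) works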

jkcshea commented 5 years ago

I was able to track down the mistake that was resulting in negative variance-covariance matrices. The mistake was that I was updating the weight matrices one more time than necessary, i.e. the steps I took were:

  1. Carry out OLS
  2. Obtain error terms
  3. Obtain weighting matrix using error terms from step 2
  4. Perform FGLS
  5. Obtain new error terms
  6. Obtain new weighting matrix using error terms from step 5

Step 6 was the mistake. Once step 6 is removed, the MC simulations match the asymptotic variance estimates.
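
For reference, a sketch of the corrected procedure (reusing the hypothetical system_ols and the G, Y lists from the sketches above; the key point is that Omega stays fixed at its step-3 value):

## Sketch of steps 1-4, with Omega never updated after step 3.
two_step_fgls <- function(G, Y) {
    theta0 <- system_ols(G, Y)                                       # step 1: system OLS
    eps0   <- Map(function(g, y) y - g %*% theta0, G, Y)             # step 2: residuals
    omega  <- Reduce(`+`, lapply(eps0, tcrossprod)) / length(eps0)   # step 3: weighting matrix
    W      <- solve(omega)
    GtWG   <- Reduce(`+`, lapply(G, function(g) t(g) %*% W %*% g))
    GtWY   <- Reduce(`+`, Map(function(g, y) t(g) %*% W %*% y, G, Y))
    solve(GtWG, GtWY)                                                # step 4: FGLS estimate
}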

This also ended the collinearity/inversion issues that would pop up in the MC simulations. I apologize for this whole mess.

EDIT: But on second thought, given what we discussed, I'm unclear on how the mistake in step 6 made it possible to get negative variances. I can look into this more, in case that problem is due to an error elsewhere.

If we pass the population distribution into the function, and the model is specified correctly, the non-invertibility of the error matrix seems to occur when we are overidentified. I suspect it has to do with the fact that we would perfectly recover the coefficients of the MTRs, similar to the last write-up. I suppose this is unlikely in practice, as it requires the researcher to specify the model perfectly and observe the population. But perhaps it could happen if the specified MTRs and their coefficient estimates are sufficiently close to the true MTRs.

Not sure why I wasn't able to come up with the simpler working example below, but here it is:

EDIT: the previous example actually used IV specifications with collinear weights, so of course the error matrix was not invertible. This second example fixes that mistake.

rm(list = ls())
set.seed(10L)
devtools::load_all(pkg = "IVMTE", reset = TRUE)

dts3 <- gendist3()
## True MTR: m0 = 3, m1 = 9
## Instrument Z has support {1, 2}

ivlike <- c(ey ~ 1 + d | z,
            ey ~ 1 + d | factor(z))

components <- lists.mst(d, intercept)
## If we also include the component 'd' from the second specification, we run 
## into collinearity issues, even though the weights are not collinear. Component 
## 'd' can be included if the support of Z is expanded to {1, 2, 3}.

result <- ivmte(ivlike = ivlike,
                data = dts3,
                components = components,
                propensity = p,
                m0 = ~ 1,
                m1 = ~ 1,
                uname = u,
                target = "ate",
                point = TRUE,
                treat = d,
                point.itermax = 2)

## Set point.itermax to 1 if you don't want to do FGLS, and only want system OLS
a-torgovitsky commented 5 years ago

Ok great, this is good progress.

However, you are right in your first edit that this doesn't explain what the root of the problem is. We should be able to iterate the optimal weight estimation (steps 2-5) as many times as we want and never get an estimated variance-covariance matrix that is not positive definite. So it's important to get to the bottom of this because it probably indicates a bug elsewhere in the code that could crop up later.

Regarding the second point: are you using exactly the population distribution to estimate the model? If you do that and the model is correctly specified, then yes, it's true that the variances of the errors will be zero, and so their variance-covariance matrix will not be invertible.

But that's not really the right exercise. The right exercise is to draw N observations from the population distribution, then estimate with those N observations. If you do that, then you will always have some error due to the fact that N < \infty. The errors will have positive variance, and so their var-cov matrix will be invertible.

jkcshea commented 5 years ago

Okay, I will focus on the first part regarding the positive definiteness of the variance-covariance matrix.

And regarding the second exercise, it is as you suspected (i.e. I was doing the wrong exercise, sorry). So I'll put that aside.

a-torgovitsky commented 5 years ago

Perfect, thanks!

jkcshea commented 5 years ago

The cause of the non-positive definiteness problem is bizarre.

It has to do with the meat of the sandwich formula for FGLS,

\frac{1}{N} \sum_{i=1}^{N} G_i' \hat{\Omega}^{-1} \hat{\varepsilon}_i \hat{\varepsilon}_i' \hat{\Omega}^{-1} G_i.

(As a reminder, I originally saw this problem when I had updated the Omega matrix more times than necessary. Since removing step 6 from the previous post, I have yet to see it. The code below uses the over-updated Omega to recreate the problem.)

For a single observation i, the outer product below:

t(gmmi) %*% solve(emat) %*% evec %*% t(evec) %*% solve(emat) %*% gmmi

where solve(emat) is (in theory) a symmetric matrix, comes out as non-positive definite. To be safe, I also tried transposing the second solve(emat), but the matrix still comes out as non-positive definite.

But if I do the following:

bmat <- t(gmmi) %*% solve(emat) %*% evec        
bmat %*% t(bmat)

I get out a positive definite matrix. Still haven't figured out why this happens, but this seems to be the cause. Nevertheless, it seems to suggest I should use the second approach to estimating the standard errors.

jkcshea commented 5 years ago

Non-positive definiteness

Comparing the matrices generated by the two lines of code above, it seems to be a precision issue. The matrices are close, but corresponding entries differ by tiny amounts. Nevertheless, this difference seems to be enough to break positive definiteness.

Collinearity

I have an example where excluding step 6 results in everything behaving as it should, and including step 6 results in invertibility errors. So it seems that over-updating the Omega matrix is also responsible for the collinearity issues when estimating the asymptotic standard errors.

The standard error estimates for the ATE under point identification now seem to be as they should, and match up with the MC simulations.

a-torgovitsky commented 5 years ago

Ok, this is good -- but let's take this to the end and completely pin it down.

This contrast is the smoking gun:

t(gmmi) %*% solve(emat) %*% evec %*% t(evec) %*% solve(emat) %*% gmmi

vs

bmat <- t(gmmi) %*% solve(emat) %*% evec     
bmat %*% t(bmat)

Obviously there should not be different results here! The computation is just being broken up into two parts.

My guess is that it has something to do with using solve twice.

What happens if you compare

solve(emat) %*% solve(emat)

to just

bmat <- solve(emat)
bmat %*% t(bmat)

? If you get something different here, then I think you should do this: Output emat to its own dataset. Create a small script file that loads in emat and tries to do the above contrast. If you still have numerically important differences between the two approaches, then I think we need to get some input from Stack Exchange. At that point it should be a simple enough problem that someone there could answer it quickly or confirm that it is a low-level bug in R.
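
The standalone script could be as simple as something like this (a sketch; the file name and the way emat gets written out are just placeholders):

## Sketch of the standalone check. Assumes emat was saved to "emat.txt"
## (placeholder name) with full precision, e.g. via
## write.table(emat, "emat.txt", row.names = FALSE).
emat <- as.matrix(read.table("emat.txt", header = TRUE))

einv    <- solve(emat)
direct  <- einv %*% einv        # solve(emat) %*% solve(emat)
twostep <- einv %*% t(einv)     # bmat %*% t(bmat) with bmat <- solve(emat)

max(abs(direct - twostep))      # size of the discrepancy
isSymmetric(einv)               # FALSE would explain any discrepancy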

jkcshea commented 5 years ago

The comparison you suggested results in quite noticeable differences, actually.

> emat
        V1        V2        V3
1 170.2939  15.77391 110.75499
2  15.7739 444.57862   8.87082
3 110.7550   8.87082  72.03669
> solve(emat) %*% solve(emat) - bmat %*% t(bmat)
       1         2 3
V1 -1024  48.00000 0
V2     8  -0.21875 0
V3  2048 -72.00000 0

So I've posted a question on Stack Overflow.

https://stackoverflow.com/questions/52960615/r-inconsistent-product-of-matrix-inversions

This is very strange...

a-torgovitsky commented 5 years ago

Right now it looks like people are suggesting it's a numerical precision issue. Are you normalizing the emat matrix by (1/N)? This might help keep things a bit more stable?

jkcshea commented 5 years ago

Yep, emat is being normalized. And it indeed looks like a precision issue.

Previously, I had only saved and output the emat file to carry out that little test. I saved the file with the maximum precision possible, and it indeed is symmetric.

Now, I also tried writing out the inverse of emat, and it is not symmetric. Attached below should be the matrix.

ematinv.txt

None of the off-diagonal elements match with their corresponding pairs. So these tiny differences may be enough to cause the problems observed. And now knowing that, it turns out we do have solve(emat) %*% t(solve(emat)) == bmat %*% t(bmat).

I was wondering whether, with the bmat approach, R saves a less precise version of the matrix, essentially rounding off some decimal places and producing a more symmetric matrix, whereas the double solve(emat) method uses all the precision the computer has. But that wouldn't explain why transposing the second solve(emat), i.e.

t(gmmi) %*% solve(emat) %*% evec %*% t(evec) %*% t(solve(emat)) %*% gmmi

still results in the non-positive definite variance matrix.

a-torgovitsky commented 5 years ago

Ok, I think what you have is sufficiently weird and concrete that we can wait a bit more to see what the Stack Exchange folk think.

Stepping back though, the issue for our purposes is whether R treats these matrices we know should be invertible as actually invertible. Given that this seems to be a problem of precision at very many decimal places, have you tried playing with the tol option to solve? From the manual I am not sure exactly what it does, but I think it controls when a matrix is declared singular vs nonsingular?
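
Something like this might be a quick way to check (a sketch using the emat from your example; I believe the default tol is on the order of .Machine$double.eps, but worth confirming against the manual):

rcond(emat)               # reciprocal condition number that solve() compares against tol
kappa(emat)               # condition number, roughly 1 / rcond(emat)
solve(emat)               # errors with "computationally singular" if rcond(emat) < tol
solve(emat, tol = 1e-20)  # loosening tol forces the inversion through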

a-torgovitsky commented 5 years ago

Also another thought --- maybe it would help to look at the gmm package code and see if they do something special to handle numerical issues?

jkcshea commented 5 years ago

From the manual I am not sure exactly what it does, but I think it controls when a matrix is declared singular vs nonsingular?

From a quick test of the tolerance, this is what it looks like. It looks like the threshold for determining whether or not a matrix is invertible is its (reciprocal) condition number, which "measures how much the output value of the function can change for a small change in the input argument."

https://en.wikipedia.org/wiki/Condition_number

Suppose we want to solve for x in the system Ax = b, where b is the input and x is the output. If the condition number is large (or its reciprocal is small), then a small change in b can result in a huge change in x, yielding very imprecise estimates. If the condition number is sufficiently large (or its reciprocal sufficiently small), then A is deemed non-invertible. As you said, the tolerance sets the threshold on the reciprocal condition number below which a matrix is deemed non-invertible.

A simple test on a matrix I constructed (so not from the previous simulations, which only included perfectly collinear bread terms from the sandwich estimator) suggests adjusting the tolerance can make a previously non-invertible matrix invertible.

However, a possible solution is to rescale the matrices before inverting, e.g. solve(emat * 1000), and then undo the scaling afterward. This eliminates the problem, perhaps by reducing the degree of precision required. So if we want to handle these cases this way, we may have to figure out a rule for how much to scale a given matrix by.
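
For what it's worth, the rescaling can be undone exactly, since solve(c * A) = solve(A) / c; choosing the scale factor is the open question. A sketch (the power-of-two rule at the end is just one possibility, not something settled):

## Undoing the scaling after inverting: solve(c * A) = solve(A) / c.
c0   <- 1000
einv <- solve(emat * c0) * c0             # numerically stands in for solve(emat)

## One possible rule for the scale factor: a power of two near 1 / ||emat||,
## so the scaling itself introduces no rounding error.
c0   <- 2^round(log2(1 / norm(emat, "F")))
einv <- solve(emat * c0) * c0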

maybe it would help to look at the gmm package code and see if they do something special to handle numerical issues?

Sure, I can look into the gmm package to see how/if it handles numerical issues.

a-torgovitsky commented 5 years ago

Ok, that's what I thought and why I had asked earlier whether you were scaling by (1/N).

But from what you wrote, it seems that scaling by (1/N) would exacerbate numerical issues? Or am I reading it wrong? If so, then maybe we want to not multiply by (1/N) before inverting, and instead multiply by N after inverting.

Presumably the gmm package (or even something like lm) is going to have a well-thought out solution for this.

jkcshea commented 5 years ago

Ah sorry, didn't make the connection. But your reading is correct: scaling by (1/N) is not good.

But from a computational standpoint, what I said earlier should be wrong, i.e. it should not be the case that scaling things up reduces precision.

Moreover, the effect of scaling up the matrices is also inconsistent.

So if emat is generated without being normalized, we do not have symmetry (as before).

[1] "symmetry test 1"
[1] "solve(emat) == t(solve(emat))"
      [,1]  [,2]  [,3]
[1,]  TRUE FALSE FALSE
[2,] FALSE  TRUE FALSE
[3,] FALSE FALSE  TRUE

Now if we rescale inside of the solve command, and then undo the scaling after, we do get symmetry:

[1] "symmetry test 2"
[1] "solve(emat * 1000) * 1000 == t(solve(emat* 1000) * 1000)"
     [,1] [,2] [,3]
[1,] TRUE TRUE TRUE
[2,] TRUE TRUE TRUE
[3,] TRUE TRUE TRUE

However, if emat was generated without ever being scaled by (1/N), or if we rescale emat outside of the solve command, then we again do not have symmetry:

[1] "symmetry test 3"
[1] "emat2 <- emat * 1000"
[1] "solve(emat2) == t(solve(emat2))"
      [,1]  [,2] [,3]
[1,]  TRUE FALSE TRUE
[2,] FALSE  TRUE TRUE
[3,]  TRUE  TRUE TRUE

This whole thing is very confusing.

I'll see if the gmm package can provide any insight on this.

a-torgovitsky commented 5 years ago

This seems insane. Why would it matter whether you scale up inside or outside the solve command?

x <- y*1000
solve(x)

is different than

solve(y*1000)

?????

That seems like it must be some sort of bug.

jkcshea commented 5 years ago

A quick check in Python further suggests this precision issue is not unique to R.

>>> emat == emat.transpose()
array([[ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True]], dtype=bool)

>>> numpy.linalg.inv(emat) == numpy.linalg.inv(emat).transpose()
array([[ True, False,  True],
       [False,  True, False],
       [ True, False,  True]], dtype=bool)

But Python doesn't have that ridiculous scaling issue where multiplying by 1000 at different stages results in different matrices.

a-torgovitsky commented 5 years ago

This Stack Exchange post (for MATLAB) suggests an acceptable solution would just be to do (emat + emat.transpose())/2. Seems like in our context this should be a fine solution.
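
In R, that would be something like the following (just a sketch, applied here to the inverse since that is where the asymmetry showed up; the same trick works on emat itself):

## Force exact symmetry before the matrix re-enters the sandwich formula.
einv <- solve(emat)
einv <- (einv + t(einv)) / 2
isSymmetric(einv)            # TRUE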

jkcshea commented 5 years ago

This does resolve the symmetry issue. But since the emat matrix enters back into the sandwich formula, the issue of its precision persists. Sometimes, the bread component is non-positive definite, sometimes it's the meat component.

But I think I figured out what the problem is.

From this R manual page, I had the impression that the precision of R could go as deep as 22 decimal places.

But from this Stack Overflow discussion on precision, and this Wikipedia article on double precision, it seems like double precision arithmetic can only handle about 16 to 17 significant digits.

So as a reminder, these bizarre numerical issues only happen when I update emat one more time than necessary. It is the unnecessary update that was causing problems. Once I rounded the vector of errors to 8 dp (so their outer products will have at most 16 dp), then these strange problems disappeared.

It is still unclear to me why this problem did not occur when I first constructed the emat matrix. The errors used for that also have many decimal places. Nevertheless, I put in a line of code to round the residuals to 8 dp when constructing emat, and things seem to be working normally.
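
For reference, the fix is roughly the following (eps here is a hypothetical N x S residual matrix, not the package's internal object):

.Machine$double.eps                                 # approx 2.22e-16: doubles carry ~15-17 significant digits
eps_rounded <- round(eps, 8)                        # round residuals to 8 decimal places
emat <- crossprod(eps_rounded) / nrow(eps_rounded)  # (1/N) sum_i eps_i eps_i'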

a-torgovitsky commented 5 years ago

Ok that seems like a good resolution then and mystery mostly solved. Issue closed then?