jkcshea / ivmte

An R package for implementing the method in Mogstad, Santos, and Torgovitsky (2018, Econometrica).
GNU General Public License v3.0

Instability and spline intercept term #214

Closed: a-torgovitsky closed this issue 2 years ago

a-torgovitsky commented 2 years ago

Hi @jkcshea, do you understand what's going on in this example? I'm not so sure I understand what adding 0 + does in the context of splines. (I know it removes the intercept, but I'm not clear on what the significance of that is.) @johnnybonney and @cblandhol --- have you run into anything like this?

library("ivmte")

set.seed(1234)
args <- list(
    data = AE,
    outcome = "worked",
    target = "ate",
    propensity = morekids ~ samesex + yob + black + hisp + other,
    solver = "gurobi",
    audit.nu = 100,
    audit.nx = 10000
)

message("The following is is very unstable")
args[["m0"]] <-
    ~ uSplines(degree = 3, knots = seq(from = .25, to = .75, by = .25)) +
    yob + black + hisp + other
args[["m1"]] <- args[["m0"]]

for (i in 1:4) {
    print(i)
    print(do.call(ivmte, args))
}

message("But without the intercept (0+) it becomes quite stable?")
args[["m0"]] <-
    ~ 0 + uSplines(degree = 3, knots = seq(from = .25, to = .75, by = .25)) +
    yob + black + hisp + other
args[["m1"]] <- args[["m0"]]

for (i in 1:4) {
    print(i)
    print(do.call(ivmte, args))
}

Produces:

The following is very unstable
[1] 1

Bounds on the target parameter: [-0.2939899, 0.1626301]
Audit terminated successfully after 3 rounds 

[1] 2

Bounds on the target parameter: [-0.310311, 0.1820539]
Audit terminated successfully after 9 rounds 

[1] 3

Bounds on the target parameter: [-0.2970819, 0.1598281]
Audit terminated successfully after 4 rounds 

[1] 4

Bounds on the target parameter: [-0.9755007, 0.8280291]
Audit terminated successfully after 5 rounds 

But without the intercept (0+) it becomes quite stable?
[1] 1

Bounds on the target parameter: [-0.29517, 0.1641619]
Audit terminated successfully after 2 rounds 

[1] 2

Bounds on the target parameter: [-0.2951708, 0.1641653]
Audit terminated successfully after 3 rounds 

[1] 3

Bounds on the target parameter: [-0.2951325, 0.1641138]
Audit terminated successfully after 2 rounds 

[1] 4

Bounds on the target parameter: [-0.2951023, 0.1640747]
Audit terminated successfully after 3 rounds 
johnnybonney commented 2 years ago

I have not noticed this before, but I also haven't played around with excluding the intercept term...

jkcshea commented 2 years ago

Hm, the intercept and splines are collinear. So removing the intercept should not have any effect on these bounds. And on my machine, it doesn't---my estimates are all very stable with and without the intercept.

The following is very unstable

[1] 1

Bounds on the target parameter: [-0.2952303, 0.1642794]
Audit terminated successfully after 2 rounds 

[1] 2

Bounds on the target parameter: [-0.2952691, 0.1642836]
Audit terminated successfully after 3 rounds 

[1] 3

Bounds on the target parameter: [-0.2951101, 0.1640856]
Audit terminated successfully after 3 rounds 

[1] 4

Bounds on the target parameter: [-0.2952837, 0.1643171]
Audit terminated successfully after 3 rounds 

But without the intercept (0+) it becomes quite stable?

[1] 1

Bounds on the target parameter: [-0.29517, 0.1641627]
Audit terminated successfully after 2 rounds 

[1] 2

Bounds on the target parameter: [-0.29517, 0.1641618]
Audit terminated successfully after 3 rounds 

[1] 3

Bounds on the target parameter: [-0.2951318, 0.1641147]
Audit terminated successfully after 2 rounds 

[1] 4

Bounds on the target parameter: [-0.2951022, 0.1640747]
Audit terminated successfully after 3 rounds 

I'm not sure what is happening. @johnnybonney @cblandhol Can you please see if you get unstable or stable estimates with the intercept?

johnnybonney commented 2 years ago

I get unstable estimates. Running the example posted by @a-torgovitsky, I get the following output:

The following is very unstable

[1] 1

Bounds on the target parameter: [-0.3001836, 0.1694621]
Audit terminated successfully after 6 rounds 

[1] 2

Bounds on the target parameter: [-1, 0.9083253]
Audit terminated successfully after 5 rounds 

[1] 3

Bounds on the target parameter: [-0.9483571, 0.8017114]
Audit terminated successfully after 10 rounds 

[1] 4

Bounds on the target parameter: [-0.9610267, 0.8139815]
Audit terminated successfully after 6 rounds 

But without the intercept (0+) it becomes quite stable?

[1] 1

Bounds on the target parameter: [-0.2951834, 0.1641617]
Audit terminated successfully after 2 rounds 

[1] 2

Bounds on the target parameter: [-0.2951699, 0.1641618]
Audit terminated successfully after 3 rounds 

[1] 3

Bounds on the target parameter: [-0.2951323, 0.1641142]
Audit terminated successfully after 2 rounds 

[1] 4

Bounds on the target parameter: [-0.2951022, 0.1640755]
Audit terminated successfully after 3 rounds 

In case it's relevant, here is some info on what I'm running. (I haven't updated R in a while, though I would be surprised if that explains anything...)

R version 4.1.0 (2021-05-18) -- "Camp Pontanezen"
Platform: x86_64-apple-darwin17.0 (64-bit)

I updated to the most recent version of ivmte (master branch) immediately before running the code.

a-torgovitsky commented 2 years ago
R version 4.0.5 (2021-03-31) -- "Shake and Throw"
Platform: x86_64-pc-linux-gnu (64-bit)
jkcshea commented 2 years ago

Hm, interesting. Could it be the R version, or maybe the Gurobi version?

So here's what I have on my machine.

R version 4.2.0 (2022-04-22) -- "Vigorous Calisthenics"
Platform: x86_64-pc-linux-gnu (64-bit)
Gurobi Optimizer version 9.1.2 build v9.1.2rc0 (linux64)

Here is the university server, which also gets unstable estimates.

R version 4.1.0 (2021-05-18) -- "Camp Pontanezen"
Platform: x86_64-pc-linux-gnu (64-bit)
Gurobi Optimizer version 9.1.1 build v9.1.1rc0 (linux64)

An easy way to check the Gurobi version is to add the following options to ivmte().

noisy = TRUE
debug = TRUE

That will return the output from Gurobi, including the details on the version.
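For example, adding them to the args list from the original post (a minimal sketch):

    args[["noisy"]] <- TRUE   # print solver output
    args[["debug"]] <- TRUE   # print the raw Gurobi log, which includes its version banner
    do.call(ivmte, args)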

johnnybonney commented 2 years ago

I have the same Gurobi version as your machine:

Gurobi Optimizer version 9.1.2 build v9.1.2rc0 (mac64)

Edit: I am happy to update my R version to 4.2.0 to see if that fixes things. But it may also be useful for me to keep testing on 4.1.0 if it's in fact true that minor changes to R versions are leading to big stability changes...

jkcshea commented 2 years ago

Thanks @johnnybonney I think you can stay on 4.1.0 for now. Fortunately, I also have access to 4.1.0 through the server. So what I'll do is check if the models are being constructed identically on my version of R and the server's version of R. That may give us a lead on what is happening...

jkcshea commented 2 years ago

@johnnybonney's hunch looks to be correct: I think it's the R versions.

I've been running the example above on the server and my machine. I've restricted the number of audits to just 1 by setting the initial grid and audit grid to be identical. The only differences between my machine's QCQP problem (R 4.2) and the server's QCQP problem (R 4.1) are the linear equality constraints defined using the Cholesky decomposition.

Here's the part of the code where the main differences between my computer (R 4.2) and the server (R 4.1) arise.

    AA <- t(drX) %*% drX
    cholAA <- suppressWarnings(chol(AA, pivot = TRUE))
    ...
    [code on pivoting]

The object drX is the $B$ matrix used in the direct regression. We apply the Cholesky decomposition to $B'B$.
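As a self-contained aside on that step: pivoted chol() factors a positive semidefinite Gram matrix even when it is rank deficient, which is the situation here once the intercept is collinear with the splines. A small sketch with made-up data, not ivmte's actual $B$:

    set.seed(1)
    X  <- cbind(1, matrix(rnorm(20), ncol = 2))
    X  <- cbind(X, X[, 2] + X[, 3])          # add a redundant column, like the collinear intercept
    AA <- crossprod(X)                       # X'X is positive semidefinite but rank deficient
    R  <- suppressWarnings(chol(AA, pivot = TRUE))
    p  <- attr(R, "pivot")
    max(abs(crossprod(R[, order(p)]) - AA))  # ~0: the pivoted factorization still recovers X'X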

The chol() function in R 4.1 seems sensitive to slight changes in $B'B$. I tested this by using the fact that $B$ is estimated slightly differently on my machine compared to the server. Let $B_{\mathrm{m}}$ denote the $B$ estimated on my machine and $B_{\mathrm{s}}$ denote the $B$ estimated on the server. The max entry-wise difference between $B_{\mathrm{m}}$ and $B_{\mathrm{s}}$ is small: 3.597678e-13. This arises from slight differences in the propensity score estimates, which are no larger than 2.516876e-13. Let $\texttt{chol}(B'B)$ denote the triangular matrix obtained by applying the Cholesky decomposition to $B'B$.

So I'd be curious to see what happens if you update your version of R, @johnnybonney. If chol() in R 4.1 really is the problem, then you should get stable estimates after the update.

a-torgovitsky commented 2 years ago

Interesting! There should be a way to check the commit history for changes to chol right?

I guess it's also not a fully satisfying answer to me because it's just saying that the new version of chol is robust to having the extra collinear intercept term, right? It's not explaining why that extra term doesn't get properly removed as a redundant regressor.

jkcshea commented 2 years ago

I guess it's also not a fully satisfying answer to me because it's just saying that the new version of chol is robust to having the extra collinear intercept term, right? It's not explaining why that extra term doesn't get properly removed as a redundant regressor.

Ah you're right. I was too caught up trying to understand why the example with the intercept was stable on my machine but not stable on the server. I'll look at what's happening when including and excluding the intercept.

There should be a way to check the commit history for changes to chol right?

Good suggestion. Here is the R source code for chol(). It looks like chol() was not updated... Could it be that we have different versions of LAPACK? I will have to look into this more, it seems.
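In the meantime, here's how to check which libraries R is using (a small sketch; sessionInfo() reports the BLAS/LAPACK paths on recent versions of R):

    La_version()           # LAPACK version R is linked against
    sessionInfo()$BLAS     # path of the BLAS shared library in use
    sessionInfo()$LAPACK   # path of the LAPACK shared library in use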

johnnybonney commented 2 years ago

I updated R to 4.2.0 but my results are still unstable with the intercept. (I get the exact same output as before.)

jkcshea commented 2 years ago

Ah how unfortunate. Thanks, @johnnybonney, I'll keep digging.

jkcshea commented 2 years ago

I guess it's also not a fully satisfying answer... It's not explaining why that extra term doesn't get properly removed as a redundant regressor.

(emphasis above added by me) Ah, so do we want to manually remove collinear regressors?

At the moment, if the design matrix $B$ is collinear, ivmte will continue to work with $B$. So the Cholesky decomposition gets applied to the positive semidefinite matrix $B'B$. My understanding is that, in theory, this should not affect the bounds. But in practice, it seems to have an effect.

So one approach is to manually remove the collinear regressors. But wouldn't that result in point identification every time?

a-torgovitsky commented 2 years ago

Hmm yes you're right, we certainly don't want to require $B'B$ to be invertible, since that would be tantamount to imposing point identification every time. Having collinear terms shouldn't matter for the entire function $m$ (as opposed to the individual parameters $\theta$ that make it up), so yes it shouldn't be necessary to remove anything.

Ok, so it is indeed just a matter of chol seeming to behave oddly. It's particularly odd given that chol in R is just calling out to LAPACK which I would imagine hasn't been changed in decades.

What would be the right way to debug this... I guess the first thing would be to make sure it's completely reproducible, which we haven't done---all three of our estimates are different in all of the cases. I thought setting the seed would be enough to ensure this...?

jkcshea commented 2 years ago

Could I first check how differently our machines are executing chol()? Below is a script performing chol() on $B_{\mathrm{m}}' B_{\mathrm{m}}$ and $B_{\mathrm{s}}' B_{\mathrm{s}}$. The script saves the decompositions as cholAA1.rds and cholAA2.rds. Could you please run the script and upload those decompositions?

This script tells me that I was wrong when I said this:

The chol() function in R 4.1 [on the server] seems sensitive to slight changes in $B'B$.

It turns out chol() in R 4.1 on the server is not that sensitive to changes in $B'B$. Nevertheless, the decompositions I'm getting from the server are indeed different from those of my machine, even though I'm decomposing the exact same matrices.

EDIT: Below is the script. chol-test.zip

jkcshea commented 2 years ago

Ah, I forgot to respond to this:

I guess the first thing would be to make sure it's completely reproducible, which we haven't done... I thought setting the seed would be enough to ensure this...?

The randomness across estimates on each of our machines is because of the covariates in the initial grid. Setting the seed should result in us using the same initial grid. But since we're getting different estimates---seemingly because of chol()---the initial grid may be expanded differently across our machines. So that may contribute to differences in our estimates.

One way to eliminate this randomness is to set initgrid.nx = 2500. The initial grid will now use the full support of covariates and will no longer be random. This eliminates any randomness from the estimates, e.g.

> for (i in 1:4) {
+     print(i)
+     result <- do.call(ivmte, args)
+     saveRDS(result, "result.rds")
+     print(result)
+ }

[1] 1

Bounds on the target parameter: [-0.2958118, 0.1643239]
Audit terminated successfully after 2 rounds 

[1] 2

Bounds on the target parameter: [-0.2958118, 0.1643239]
Audit terminated successfully after 2 rounds 

[1] 3

Bounds on the target parameter: [-0.2958118, 0.1643239]
Audit terminated successfully after 2 rounds 

[1] 4

Bounds on the target parameter: [-0.2958118, 0.1643239]
Audit terminated successfully after 2 rounds 

Any differences in estimates across our machines after setting initgrid.nx = 2500 should only be because of computation.
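(In the loop above, this just amounts to adding the option to the args list from the original example before calling ivmte:)

    args[["initgrid.nx"]] <- 2500  # use the full support of covariates in the initial grid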

a-torgovitsky commented 2 years ago

Here are cholAA1.rds and cholAA2.rds from running it on my system. chol-output.zip

Got it on the randomness---that makes sense.

johnnybonney commented 2 years ago

Here is my output: chol-output-john.zip

jkcshea commented 2 years ago

Thanks everyone. That script performs Cholesky decompositions of $B_{\mathrm{m}}'B_{\mathrm{m}}$ and $B_{\mathrm{s}}'B_{\mathrm{s}}$ and saves the decompositions. It looks like I'm the odd one out: the outputs from @a-torgovitsky, @johnnybonney, and Acropolis are identical.

Here is a summary of $\vert (B_{\mathrm{m}}'B_{\mathrm{m}} - B_{\mathrm{s}}'B_{\mathrm{s}})_{i, j} \vert$, where $i$, $j$ index the rows and columns of the matrix.

That is, I'm summarizing the absolute values of the elements of $B_{\mathrm{m}}'B_{\mathrm{m}} - B_{\mathrm{s}}'B_{\mathrm{s}}$.

     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
0.000e+00 0.000e+00 0.000e+00 6.383e-09 7.660e-11 3.597e-07 

Now here is a summary of the difference between the two Cholesky decompositions, i.e., a summary of $\vert (\texttt{chol}(B_{\mathrm{m}}'B_{\mathrm{m}}) - \texttt{chol}(B_{\mathrm{s}}'B_{\mathrm{s}}))_{i,j}\vert$, computed on each of our machines.

The decomposition from my machine seems least sensitive to the differences between $B_{\mathrm{m}}$ and $B_{\mathrm{s}}$. However, my decompositions are the odd one out, so maybe mine are problematic.

   Min. 1st Qu. Median         Mean 3rd Qu.         Max.    Source
1:    0       0      0 6.704660e-11       0 3.235382e-08        JS
2:    0       0      0 7.073533e-09       0 2.571042e-06 Acropolis
3:    0       0      0 7.073533e-09       0 2.571042e-06        AT
4:    0       0      0 7.073533e-09       0 2.571042e-06        JB

Here are the differences in $\texttt{chol}(B_{\mathrm{m}}'B_{\mathrm{m}})$ across our machines:

   Min. 1st Qu. Median         Mean 3rd Qu.        Max.           Source
1:    0       0      0 2.932389e-05       0 0.008443555 JS vs. Acropolis
2:    0       0      0 2.932389e-05       0 0.008443555        JS vs. AT
3:    0       0      0 2.932389e-05       0 0.008443555        JS vs. JB

And here are the differences in $\texttt{chol}(B_{\mathrm{s}}'B_{\mathrm{s}})$ across our machines:

   Min. 1st Qu. Median         Mean 3rd Qu.        Max.           Source
1:    0       0      0 2.932935e-05       0 0.008444025 JS vs. Acropolis
2:    0       0      0 2.932935e-05       0 0.008444025        JS vs. AT
3:    0       0      0 2.932935e-05       0 0.008444025        JS vs. JB

Again, the outputs from @a-torgovitsky, @johnnybonney, and Acropolis are identical. So I will need to figure out what is going on with my version of R/Ubuntu/LAPACK/whatever else it may be...

a-torgovitsky commented 2 years ago

Is the behavior affected by any of the options of chol? Maybe the default tolerance getting passed is different? (Seems unlikely, but perhaps...)

jkcshea commented 2 years ago

Hm, the options do not seem to be the cause (at the moment).

So here are the options (reference here):

By default, tol = -1, which tells LAPACK to use its default tolerance. This is equal to nrow(x) * .Machine$double.neg.eps * max(diag(x)). This tolerance is the same on my machine and on the server. Seeing that your Cholesky decompositions are identical to those of the server, I think your default tolerance is also the same. (In case you want to check, my default tolerance is 8.694239e-07.)

EDIT: Or rather, the value of .Machine$double.neg.eps is 1.110223e-16 for me. The x matrix is the same for all of us.
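(For reference, that default can be computed directly; a small sketch, with AA the matrix being decomposed as in the code above:)

    ## default tolerance LAPACK uses for chol(AA, pivot = TRUE) when tol = -1
    nrow(AA) * .Machine$double.neg.eps * max(diag(AA))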

a-torgovitsky commented 2 years ago

Yeah I have the same machine epsilon too...long-shot unless one of us was using a computer from the 90s ;)

Have you found a way to determine what version of LAPACK you have? I would guess start by searching the Ubuntu repositories for the phrase.

You could also try the decomposition in another language that uses LAPACK (my guess is SciPy would be doing that too? and Julia) to pin down whether it's the interfacing language or LAPACK itself.

jkcshea commented 2 years ago

Have you found a way to determine what version of LAPACK you have?

That seems to be it. It is easy to check using the R command sessionInfo(). Here are the BLAS/LAPACK libraries I was originally using:

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3
LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3

Maybe the latest versions of Ubuntu use ATLAS. So I switched to (what I think are) the default BLAS and LAPACK libraries.

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

After doing this, our decompositions from a few posts up perfectly match. I can also perfectly replicate the sequence of estimates from running @a-torgovitsky's original example on Acropolis. But these estimates still differ from those of @a-torgovitsky and @johnnybonney reported above. So there is still something I'm not picking up...

a-torgovitsky commented 2 years ago

For me:

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

Ok, so that partially resolves some mystery I suppose? From what I understand of ATLAS, it seems like an "improvement." And @jkcshea's results were better (stable with the intercept) compared to the ones that @johnnybonney and I had. So I guess the conclusion would be that the Cholesky decomposition with ATLAS is more stable in a way that more gracefully handles the redundancy caused by adding an intercept to the spline specification. Does that reasoning make sense?

If so, I'm not sure what we can do to improve this situation. But it's worrying that a user without ATLAS might get something dramatically different without excluding the intercept...


I didn't follow this:

I can also perfectly replicate the sequence of estimates from running @a-torgovitsky's original example on Acropolis. But these estimates still differ from those of @a-torgovitsky and @johnnybonney reported above.

Sounds contradictory?

jkcshea commented 2 years ago

But it's worrying that a user without ATLAS might get something dramatically different without excluding the intercept...

Something I can try is to detect which BLAS/LAPACK library is being used. If it isn't ATLAS, maybe throw a warning? But I'm concerned we'd be throwing warnings all the time.
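A hypothetical sketch of what that check might look like (nothing like this is implemented; it assumes sessionInfo() reports the LAPACK path, as it does on recent versions of R):

    lapack.path <- sessionInfo()$LAPACK
    if (!isTRUE(grepl("atlas", lapack.path, ignore.case = TRUE))) {
        warning("The LAPACK library in use does not appear to be ATLAS; ",
                "estimates with collinear MTR specifications may be unstable.")
    }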


I didn't follow this: I can also perfectly replicate the sequence of estimates from running @a-torgovitsky's original example on Acropolis. But these estimates still differ from those of @a-torgovitsky and @johnnybonney reported above.

So even though I can replicate your (@a-torgovitsky) Cholesky decompositions, I cannot replicate your estimated bounds. For instance, my machine and Acropolis now return the following unstable estimates when including the intercept:

[1] 1

Bounds on the target parameter: [-0.2943683, 0.1625186]
Audit terminated successfully after 2 rounds 

[1] 2

Bounds on the target parameter: [-0.334092, 0.2076029]
Audit terminated successfully after 3 rounds 

[1] 3

Bounds on the target parameter: [-0.288325, 0.1618743]
Audit terminated successfully after 2 rounds 

[1] 4

Bounds on the target parameter: [-0.3665427, 0.2499749]
Audit terminated successfully after 3 rounds 

But those are different from your estimates here. @johnnybonney's estimates (here) are also different.

a-torgovitsky commented 2 years ago

Something I can try is detect which BLAS/LAPACK library is being used. If it isn't ATLAS, maybe throw a warning? But I'm concerned we'd be throwing warnings all the time.

Yeah I definitely don't think we want to do this.

But those are different from your estimates https://github.com/jkcshea/ivmte/issues/214#issue-1262477102. @johnnybonney's estimates (https://github.com/jkcshea/ivmte/issues/214#issuecomment-1149202072) are also different.

Is this maybe just due to differences in random number generation across versions/platforms?

> set.seed(12345)
> runif(5)
[1] 0.7209039 0.8757732 0.7609823 0.8861246 0.4564810

Getting back to the main issue. Two questions:

  1. Is there ever a good reason to put uSplines instead of 0 + uSplines... ? What would be the use case?
  2. With just uSplines are there two rows of the Gram matrix that are identical? As we discussed above, we expect the Gram matrix to have linear dependencies due to partial identification. But having two rows that are identical is a more extreme version of that.
jkcshea commented 2 years ago

Is this maybe just due to differences in random number generation across versions/platforms?

It does not seem to be, since I generate the same sequence of random numbers.


  1. Is there ever a good reason to put uSplines instead of 0 + uSplines... ? What would be the use case?

I can't think of a time when we want uSplines instead of 0 + uSplines. But I realize there is a simple way to resolve this issue.

Our uSplines function calls on the bSplines function from the splines2 package. The bSplines function has the argument intercept, which I set to TRUE by default. When intercept = TRUE, bSplines generates all the spline basis polynomials, which always sum to 1. This is why we have collinearity. But when intercept = FALSE, bSplines drops the first basis polynomial, and we avoid collinearity. So I can just change the default argument of intercept to be FALSE. We can already do this manually, actually, e.g.

    ~ uSplines(intercept = FALSE, degree = 3, knots = seq(from = .25, to = .75, by = .25))

This leads to stable estimates.
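Here is a quick way to see the collinearity directly (a small sketch; I believe the underlying basis comes from splines2's bSpline(), evaluated here at a few points in $[0, 1]$):

    library(splines2)
    u  <- seq(0, 1, length.out = 7)
    B1 <- bSpline(u, degree = 3, knots = c(.25, .5, .75), intercept = TRUE)
    rowSums(B1)          # every row sums to 1, so B1 is collinear with a constant column
    B0 <- bSpline(u, degree = 3, knots = c(.25, .5, .75), intercept = FALSE)
    ncol(B1) - ncol(B0)  # intercept = FALSE drops the first basis polynomial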


  2. With just uSplines are there two rows of the Gram matrix that are identical? As we discussed above, we expect the Gram matrix to have linear dependencies due to partial identification. But having two rows that are identical is a more extreme version of that.

No duplicate rows.

a-torgovitsky commented 2 years ago

If the random number generator is the same, and the Cholesky decomposition is the same, how can the bounds be different?


When intercept = TRUE, bSplines generates all the spline basis polynomials, which always sum to 1. This is why we have collinearity. But when intercept = FALSE, bSplines drops the first basis polynomial, and we avoid collinearity.

Bingo, this explains it perfectly. I think setting intercept = FALSE by default sounds like a good solution. We want to make a note of this behavior in the vignette when discussing splines, so that people know what will happen with a no-constant specification. I think we also want to continue to allow this to be overridden, like so:

    ~ uSplines(intercept = TRUE, degree = 3, knots = seq(from = .25, to = .75, by = .25))

Here's what I get now with intercept = FALSE. Are we all getting the same thing?

library("ivmte")

set.seed(1234)
args <- list(
    data = AE,
    outcome = "worked",
    target = "ate",
    propensity = morekids ~ samesex + yob + black + hisp + other,
    solver = "gurobi",
    audit.nu = 100,
    audit.nx = 10000
)

args[["m0"]] <-
    ~ uSplines(degree = 3,
               intercept = FALSE,
               knots = seq(from = .25, to = .75, by = .25)) +
    yob + black + hisp + other
args[["m1"]] <- args[["m0"]]

for (i in 1:4) {
    print(i)
    print(do.call(ivmte, args))
}

produces:

[1] 1

Bounds on the target parameter: [-0.2952359, 0.1642467]
Audit terminated successfully after 2 rounds 

[1] 2

Bounds on the target parameter: [-0.2952378, 0.1642492]
Audit terminated successfully after 3 rounds 

[1] 3

Bounds on the target parameter: [-0.2951021, 0.1640746]
Audit terminated successfully after 3 rounds 

[1] 4

Bounds on the target parameter: [-0.2952322, 0.1642438]
Audit terminated successfully after 3 rounds 
jkcshea commented 2 years ago

If the random number generator is the same, and the Cholesky decomposition is the same, how can the bounds be different?

Hm, I don't know... I can't replicate your original estimates. Note that your original estimates are also different from those of @johnnybonney. What is odd is that I can perfectly match the bounds you just posted (when intercept = FALSE).


I think setting intercept = FALSE by default sounds like a good solution.

Okay! I will do that.

a-torgovitsky commented 2 years ago

Perfectly match up to how many digits? I think we want to know the sources of discrepancy that can occur here. Let's focus on the first iteration and look at the debug log for the first point of difference.

library("ivmte")

set.seed(1234)
args <- list(
    data = AE,
    outcome = "worked",
    target = "ate",
    propensity = morekids ~ samesex + yob + black + hisp + other,
    solver = "gurobi",
    audit.nu = 100,
    audit.nx = 10000,
    noisy = TRUE,
    debug = TRUE
)

args[["m0"]] <-
    ~ uSplines(degree = 3,
               knots = seq(from = .25, to = .75, by = .25)) +
    yob + black + hisp + other
args[["m1"]] <- args[["m0"]]
do.call(ivmte, args)

Output is attached: log.txt

jkcshea commented 2 years ago

Perfectly match up to how many digits?

Ah, sorry, we match up to 6 decimal places.

Let's focus on the first iteration and look at the debug log for the first point of difference.

It looks like we already differ when minimizing the criterion on the first audit.

So here is your solution to minimizing the criterion on the first audit:

Barrier solved model in 20 iterations and 0.02 seconds
Optimal objective -2.94835366e-01

And then here are the violations you get:

    Violations:  102
    Expanding constraint grid to include 100 additional points...

And here is what I get from minimizing the criterion on the first audit:

Barrier solved model in 13 iterations and 0.01 seconds
Optimal objective -2.94824256e-01

And here are my violations:

    Violations:  136 
    Expanding constraint grid to include 100 additional points...

And for documentation, here is the full output from my machine. log.txt

a-torgovitsky commented 2 years ago

Ok this is interesting. For the very first problem of minimizing the criterion, we have the same model fingerprint. So I think that means it's exactly the same model and thus there's nothing going on in ivmte that is differing. However Gurobi produces slightly different output. Could that also be related to the ATLAS/LAPACK issue? Do you have another machine you can check on that doesn't use ATLAS?

Related aside: I noticed Gurobi warns us that the min criterion problem is badly scaled. Are we using the Cholesky/slack variable trick for this problem too? Or is it just being entered in its natural form? Do you have a sense of why we have QObjective entries that are of order 1e-10? Are there "essentially 0" entries in the design matrix for some reason?

jkcshea commented 2 years ago

Do you have another machine you can check on that doesn't use ATLAS?

Acropolis does not use ATLAS. And I'm seeing that I get slightly different estimates from Gurobi when setting intercept = TRUE in uSplines.

Are we using the Cholesky/slack variable trick for this problem too? Or is it just being entered in its natural form? Do you have a sense of why we have QObjective entries that are of order 1e-10? Are there "essentially 0" entries in the design matrix for some reason?

Hm, I don't know why but I only implemented the Cholesky approach for the bounds. Maybe because I thought the instability was due to the quadratic constraint. And since there are indeed entries in the design matrix that are almost 0, we get a badly scaled Q matrix when minimizing the criterion. Sorry about that. I'll implement the Cholesky approach when minimizing the criterion and see if that helps.
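For reference, here is my understanding of the reformulation I plan to implement (a sketch that ignores the pivoting; the exact form may end up differing). With $R$ an upper-triangular matrix satisfying $R'R = B'B$ and $y$ the outcome vector, the least-squares criterion can be rewritten using slack variables $w$:

$$\min_{\theta}\ (y - B\theta)'(y - B\theta) \;=\; \min_{\theta,\, w}\ w'w - 2\,(B'y)'\theta + y'y \quad \text{s.t. } w = R\theta,$$

so the quadratic part of the objective becomes the well-scaled identity form $w'w$, and the potentially ill-scaled entries of $B'B$ enter only through the linear constraint $w = R\theta$.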

a-torgovitsky commented 2 years ago

But the same model fingerprint? How can that be? What we have here is an example where Gurobi gets the same model and spits out two different answers. Gurobi version is the same, so it must be something about the underlying libraries?

jkcshea commented 2 years ago

Sorry, I forgot that Acropolis has Gurobi 9.1.1. I have Gurobi 9.1.2. I'll roll back to 9.1.1 to see if that makes a difference.

Also, I compared the models generated on my machine and Acropolis using testthat. The models are indeed identical. I had to set audit.max = 1 to rule out differences stemming from how the audits evolve differently on my machine and Acropolis.
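For completeness, the comparison was along these lines (a sketch with hypothetical file names for the saved model objects):

    library(testthat)
    model.local     <- readRDS("model-local.rds")      # hypothetical: model built on my machine
    model.acropolis <- readRDS("model-acropolis.rds")  # hypothetical: model built on Acropolis
    expect_identical(model.local, model.acropolis)     # passes: the models match exactly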


After incorporating the Cholesky decomposition into the criterion minimization, the stability is greatly improved. Here's what I get when I re-run your original example (with intercept = TRUE in uSplines).

The following is very unstable

[1] 1

Bounds on the target parameter: [-0.2952358, 0.1640799]
Audit terminated successfully after 2 rounds 

[1] 2

Bounds on the target parameter: [-0.2952103, 0.1642512]
Audit terminated successfully after 3 rounds 

[1] 3

Bounds on the target parameter: [-0.295101, 0.1640724]
Audit terminated successfully after 3 rounds 

[1] 4

Bounds on the target parameter: [-0.2946923, 0.1654988]
Audit terminated successfully after 3 rounds 

But without the intercept (0+) it becomes quite stable?

[1] 1

Bounds on the target parameter: [-0.2951699, 0.1641617]
Audit terminated successfully after 2 rounds 

[1] 2

Bounds on the target parameter: [-0.2951697, 0.1641617]
Audit terminated successfully after 3 rounds 

[1] 3

Bounds on the target parameter: [-0.2951323, 0.1641149]
Audit terminated successfully after 2 rounds 

[1] 4

Bounds on the target parameter: [-0.2951023, 0.1640746]
Audit terminated successfully after 3 rounds 
a-torgovitsky commented 2 years ago

Ok, fantastic! Let's keep that then.

I'd still like to know what's going on with the different solutions before we close this. The fingerprints were the same...so why were the solutions different? Simplest answer would be different versions of Gurobi, so hopefully that's it.

jkcshea commented 2 years ago

We can finally close this thread :)

It turns out it wasn't the version of Gurobi, but something simple that my eyes kept skipping over: the number of threads being used by Gurobi. I have 8 threads on my machine, but I had access to many more on the server. So I set the number of threads to just 1 on both my machine and the server. This is done by passing the argument solver.option = list(threads = 1). And now I can match the results from the server for all the digits that are being reported, whether I use Gurobi 9.1.1 or 9.1.2.
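(In the example above, that amounts to one more entry in the args list, using the option name as written here:)

    args[["solver.option"]] <- list(threads = 1)  # pin Gurobi to a single thread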

a-torgovitsky commented 2 years ago

Ahhhhhhh...not so simple! Good thinking!