jkcshea / ivmte

An R package for implementing the method in Mogstad, Santos, and Torgovitsky (2018, Econometrica).
GNU General Public License v3.0

Stochastic audits #207

Closed · jkcshea closed this issue 3 years ago

jkcshea commented 3 years ago

Here is the original example illustrating the problem. audit-stochastic.pdf

I am still working on this, but in case you have any immediate hunches:

> ## Use a subset of the data (faster)
> set.seed(10)
> AE.small <- AE[sample(seq(nrow(AE)), size = 1000, replace = FALSE), ]
> ## Set up the arguments
> args <- list(data = AE.small,
+              outcome = "worked",
+              target = "ate",
+              m0 = ~ uSplines(degree = 3, knots = seq(from = .1, to = .9, by = .1)) + yob +
+                      black + hisp + other,
+              m1 = ~ uSplines(degree = 3, knots = seq(from = .1, to = .9, by = .1)) + yob +
+                      black + hisp + other,
+              propensity = morekids ~ samesex + yob + black + hisp + other,
+              audit.nu = 50,
+              initgrid.nx = 20,
+              initgrid.nu = 20,
+              solver = 'gurobi')
> ## Estimate using 20 x 20 initial grid.
> set.seed(10)
> do.call(ivmte, args)

Bounds on the target parameter: [-0.2734294, 0.07024355]
Audit terminated successfully after 3 rounds 

> ## Estimate using 21 x 20 initial grid.
> set.seed(10)
> args$initgrid.nx <- 21
> do.call(ivmte, args)

Bounds on the target parameter: [-0.2738942, 0.07031611]
Audit terminated successfully after 3 rounds 
a-torgovitsky commented 3 years ago

This is the key clue I think:

This does not happen for the non-regression approaches.

That suggests to me it's a numerical issue concerning QPs and QCQPs.

My guess would be that both initial grids are leading to different solutions that pass the audit grid and lead to criteria that are ever-so-slightly different (and below Gurobi's optimality tolerance). Then these slight differences in criteria lead to noticeable differences in the bounds in step 2.

jkcshea commented 3 years ago

My guess would be that both initial grids are leading to different solutions that pass the audit grid and lead to criteria that are ever-so-slightly different (and below Gurobi's optimality tolerance). Then these slight differences in criteria lead to noticeable differences in the bounds in step 2.

This indeed seems to be the case. I ran a simulation comparing the bounds generated from two different initial grids. There are no controls, so the differences are only because initgrid.nu is different for the two grids. git-example.zip

The different initial grids result in different criterion values and different solutions. If I adjust the criterion of one problem to match the other, then the differences in the bounds usually shrink (68% of the time for the lower bound, 64% of the time for the upper bound). But the bounds are not identical because the initial grids still differ. Making the initial grids the same results in the same QP/QCQP problems, and thus the same bounds.

Some additional notes:

a-torgovitsky commented 3 years ago

Sounds like this is just another instance of the QP/QCQP stability issue we discussed a while back. I don't think we need to revisit that problem, so I'm going to close this issue.

I lowered criterion.tol to 1e-4 a while back because 1e-2 was giving implausibly wide bounds in the AE example (wider than the Manski bounds, for example). The "right" value clearly depends on features of the problem, so without theory to suggest a good value, I would rather err on the side of numerical problems than useless bounds.