Implementing the direct MTR regression

jkcshea commented 3 years ago

To briefly summarize what is being done:

The user no longer needs to pass the ivlike argument (or can pass NULL). Instead, this procedure first tries to estimate the MTRs by regressing the outcome variable on the basis functions. If the coefficients are point identified, then they are used to construct the target parameter.
If the coefficients are not point identified (e.g. because of collinearity in the regression), then we perform a bounding exercise similar to before. The difference is that we no longer have the equality constraints defined by the IV-like estimates (since there are none). Instead, we have a quadratic constraint restricting the sum of squared residuals implied by the MTR coefficients.
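
For readers who want to see the branching logic in isolation, here is a minimal base-R sketch of the "point identified or not" decision. It is not the ivmte implementation: `direct_regression_sketch` is a hypothetical helper, and `B` stands in for whatever regressor matrix the procedure builds from the basis functions. The only idea it illustrates is that the regression has a unique solution exactly when that matrix has full column rank.

```r
## Hypothetical helper (not part of ivmte): regress y on the columns of B
## and report whether the coefficients are uniquely determined.
direct_regression_sketch <- function(y, B) {
    qrB <- qr(B)
    if (qrB$rank == ncol(B)) {
        ## Full column rank: least squares has a unique solution
        list(point = TRUE, coef = qr.coef(qrB, y))
    } else {
        ## Rank deficient: fall back to a bounding exercise (not shown)
        list(point = FALSE, coef = NULL)
    }
}

## Toy data, assumed for illustration only
set.seed(1)
u <- runif(100)
B <- cbind(intercept = 1, u = u, u2 = u^2)
y <- drop(B %*% c(0.90, -1.10, 0.30))

full <- direct_regression_sketch(y, B)
full$point        # unique coefficients are returned in full$coef

deficient <- direct_regression_sketch(y, cbind(B, x = 1))
deficient$point   # the constant column x duplicates the intercept
```

In the rank-deficient branch the real procedure would set up the bounds with the quadratic constraint described above; the sketch stops at detecting the case.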

Attached is the note by @a-torgovitsky on the procedure, for additional details and for the sake of documentation. direct-mtr-procedure.pdf

This has mostly been implemented, but two things remain:

Allow for bootstrapping
Extend to CPLEX (currently only compatible with Gurobi).

Hopefully this is enough for @johnnybonney @cblandhol @a-torgovitsky to get started.

Below are some examples based on the Mogstad and Torgovitsky (2018) paper.

Example 1: Point identified

In the example below, the coefficients of the MTR are point identified from a linear regression. The target parameter may then be constructed from these coefficients.

> dtm <- ivmte:::gendistMosquito()
> results <- ivmte(data = dtm,
+                  propensity = d ~ 0 + factor(z),
+                  m0 = ~ 1 + u + I(u^2),
+                  m1 = ~ 1 + u + I(u^2),
+                  criterion.tol = 0,
+                  target = 'ate',
+                  outcome = 'ey',
+                  noisy = TRUE)

LP solver: Gurobi ('gurobi')

Obtaining propensity scores...

Generating target moments...
    Integrating terms for control group...
    Integrating terms for treated group...

Performing direct MTR regression...

Point estimate of the target parameter: -0.2666667

Warning message:
MTR is point identified via linear regression. Shape constraints are ignored. 
> results$mtr.coef
[m0](Intercept)           [m0]u      [m0]I(u^2) [m1](Intercept)           [m1]u 
           0.90           -1.10            0.30            0.35           -0.30 
     [m1]I(u^2) 
          -0.05

The estimates of the MTR and ATE align with the paper, which is good. A testthat test has also been written for this new procedure.

Some things to note:

Since ivlike is no longer passed, the function must be informed of what the outcome variable is. That may be done using the outcome argument.
As stated in the warning, shape constraints are ignored. Shape constraints were also ignored when using GMM in the point identified case.

Example 2: Partially identified

Now I add a collinear variable to the MTR, so the MTR coefficients are no longer point identified. A partial identification approach is then used, even though the user declared point = TRUE.
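
To see why a constant regressor breaks point identification, consider the toy base-R illustration below. It is a sketch under assumptions: the data and coefficient values are made up, and the regression is a plain regression on `u` rather than the one ivmte constructs. Two different coefficient vectors give identical fitted values, so the sum of squared residuals cannot distinguish them.

```r
## Toy illustration only: a constant x = 1 is collinear with the intercept
set.seed(1)
u <- runif(50)
y <- 0.35 - 0.30 * u - 0.05 * u^2
B <- cbind(intercept = 1, u = u, u2 = u^2, x = 1)

## Sum of squared residuals implied by a coefficient vector
ssr <- function(theta) sum((y - B %*% theta)^2)

theta_a <- c(0.35, -0.30, -0.05, 0.00)
theta_b <- c(0.10, -0.30, -0.05, 0.25)  # moves 0.25 from the intercept to x

ssr(theta_a)
ssr(theta_b)   # same fit as theta_a, up to floating point error
qr(B)$rank     # one less than ncol(B)
```

Any split of the constant between the intercept and `x` fits equally well, which is why the procedure switches to bounds here.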

Shape constraints are not ignored in this case. I've added some shape constraints to demonstrate that the audit procedure takes place as before.

> ## Add a collinear component to the MTR
> dtm$x <- 1
> resultsAlt <- ivmte(data = dtm,
+                     propensity = d ~ 0 + factor(z),
+                     m0 = ~ 1 + u + I(u^2),
+                     m1 = ~ 1 + u + I(u^2) + x, ## includes collinear x
+                     point = TRUE,
+                     criterion.tol = 0.5,
+                     initgrid.nu = 0,
+                     audit.nu = 3,
+                     target = 'ate',
+                     outcome = 'ey',
+                     m0.inc = TRUE,
+                     m1.inc = TRUE,
+                     noisy = TRUE)

LP solver: Gurobi ('gurobi')

Obtaining propensity scores...

Generating target moments...
    Integrating terms for control group...
    Integrating terms for treated group...

Performing direct MTR regression...
    MTR is not point identified.

Performing audit procedure...
    Generating initial constraint grid...

    Audit count: 1
    Obtaining bounds...
    Violations:  6 
    Expanding constraint grid to include 6 additional points...

    Audit count: 2
    Obtaining bounds...
    Violations:  6 
    Expanding constraint grid to include 6 additional points...

    Audit count: 3
    Obtaining bounds...
    Violations: 0
    Audit finished.

Bounds on the target parameter: [-0.151719, 0.04178115]

As before, the argument criterion.tol allows the user to decide how much to relax the constraint.
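
One way to write down the relaxed problem is sketched below. This is a reading of the description in this thread, not a transcription of the attached note: the symbols ($b_i$ for the regressors built from the basis functions, $\gamma$ for the weights mapping MTR coefficients $\theta$ into the target parameter, $\tau$ for criterion.tol) are introduced here for illustration, and both the $1/n$ scaling and the multiplicative form of the relaxation are assumptions.

```latex
\begin{aligned}
\hat{Q} &= \min_{\theta} \; \frac{1}{n} \sum_{i=1}^{n} \left( y_i - b_i'\theta \right)^2, \\
\text{bounds} &= \left[ \min_{\theta} \gamma'\theta, \; \max_{\theta} \gamma'\theta \right]
\quad \text{subject to} \quad
\frac{1}{n} \sum_{i=1}^{n} \left( y_i - b_i'\theta \right)^2 \le \hat{Q} \, (1 + \tau)
\ \text{ and the shape constraints.}
\end{aligned}
```

Under this reading, $\tau = 0$ restricts attention to coefficient vectors that attain the minimum sum of squared residuals, and larger values of $\tau$ admit worse-fitting vectors and so widen the bounds.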

Let me know if anything is unclear, or if things should be done differently.

a-torgovitsky commented 3 years ago

Thanks @jkcshea. I think it looks good. One minor question: we pass outcome as a string even though it is a variable name. Is that consistent with good R practice?

I will post possible bugs/problems that I find in separate issues.

jkcshea commented 3 years ago

One minor question: For outcome we pass this as a string even though it is a variable name. Is that consistent with good R practice?

Ah, good catch. The function is supposed to allow outcome to be passed as a variable name rather than as a string, since I had recycled the code that parses the treat and propensity arguments. But upon checking, I found that there was an error later on in the code. This has now been resolved.
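
For reference, a common base-R pattern for accepting an argument either as a bare variable name or as a string is sketched below. `get_outcome_name` is a hypothetical helper written for this illustration, and it is not claimed to be how ivmte parses outcome, treat, or propensity.

```r
## Hypothetical helper (not the ivmte implementation): return the outcome
## name as a string, whether it was passed bare or quoted.
get_outcome_name <- function(outcome) {
    expr <- substitute(outcome)  # capture the argument without evaluating it
    if (is.character(expr)) {
        expr                     # already a string, e.g. "ey"
    } else {
        deparse(expr)            # bare name, e.g. ey
    }
}

get_outcome_name(ey)
get_outcome_name("ey")
```

One limitation of this pattern: if the caller passes a variable that holds the name (say `v <- "ey"`, then `get_outcome_name(v)`), the helper returns "v" rather than "ey", because the argument is never evaluated.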

a-torgovitsky commented 3 years ago

By the way, I'm not sure CPLEX will allow for quadratically constrained quadratic programs in R. (Can't remember why though...I think it's a limitation of the API(s) if I recall.) If that's true, then we just need to require Gurobi for this procedure.
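
If Gurobi does end up being required, the check could be as simple as the guard sketched below. This is a hypothetical helper, not existing ivmte code; the function name, its arguments, and the solver strings used in the example are placeholders chosen for illustration.

```r
## Hypothetical guard (not part of ivmte): stop early when the direct MTR
## regression is requested with a solver other than Gurobi.
require_gurobi_for_direct <- function(solver, direct) {
    if (direct && tolower(solver) != "gurobi") {
        stop("The direct MTR regression requires Gurobi; got solver '",
             solver, "'.", call. = FALSE)
    }
    invisible(TRUE)
}

require_gurobi_for_direct("gurobi", direct = TRUE)
## A non-Gurobi solver is still fine when the direct regression is not used
require_gurobi_for_direct("cplex", direct = FALSE)
```

Failing before any moments are computed gives the user a clear message instead of a solver error deep inside the bounding step.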

jkcshea / ivmte

Implementing the direct MTR regression #194