jkcshea / ivmte

An R package for implementing the method in Mogstad, Santos, and Torgovitsky (2018, Econometrica).
GNU General Public License v3.0

Uninformative error message when using splines in direct MTR regression #205

Closed. cblandhol closed this issue 3 years ago

cblandhol commented 3 years ago

The direct MTR regression throws an uninformative error when splines are used as basis functions. If I understand correctly, this is because the direct MTR regression does not work with splines as basis functions, since we cannot express the constraints in terms of equation (6) in the documentation posted in issue #194.

It would be helpful if the error message were more informative. Here is an example that illustrates the behavior and the error message:

library(ivmte) 
library(data.table)

# simulate data
set.seed(1)
n <- 5000
u <- runif(n)
z <- as.numeric(rbinom(n,1, .5))
d <- as.numeric(u < z*.25)

v0 <- rnorm(n) + .2*u
m0 <- 0
y0 <- as.numeric(m0 + v0 > 0)

v1 <- rnorm(n) - .2*u
m1 <- .5
y1 <- as.numeric(m1 + v1 > 0)

y <- d*y1 + (1-d)*y0
ivmteSimData <- data.frame(y,d,z)

args <- list(data = ivmteSimData,
             target = "att",
             m0 = ~ uSplines(degree = 0, knots = c(0.25, 0.5)),
             m1 = ~ uSplines(degree = 0, knots = c(0.25, 0.5)),
             propensity = d ~ factor(z))

# using ivlike
args$ivlike <- c(y ~ factor(d)*factor(z))

do.call(ivmte, args)
Bounds on the target parameter: [0.1176324, 0.1324768]
Audit terminated successfully after 1 round

# bounds using direct MTR regression
args$ivlike <- NULL
args$outcome <- "y"

do.call(ivmte, args)
LP solver: Gurobi ('gurobi')

Obtaining propensity scores...

Generating target moments...
    Integrating terms for control group...
    Integrating terms for treated group...

Performing direct MTR regression...
Error in lm.fit(x = resX, y = drY) : NA/NaN/Inf in 'x'

a-torgovitsky commented 3 years ago

I'm not sure if this is true:

If I understand correctly, this is because the direct MTR regression does not work with splines as basis functions, since we cannot express the constraints in terms of equation (6) in the documentation posted in issue #194.

If it is, I can't remember the rationale. There isn't anything inherently more difficult about constraining splines than constraining other functions (splines are, at the end of the day, just piecewise polynomials). And @jkcshea's example in #194 had shape constraints with the audit procedure. So I think this is a bug.
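
To make the "piecewise polynomials" point concrete for the degree-0 case in the example: a degree-0 spline with knots at 0.25 and 0.5 is just a set of interval indicators. The sketch below uses a hypothetical helper, `step_basis`, written only for illustration; it is not part of ivmte and does not reproduce how `uSplines` is evaluated or integrated internally.

```r
# Hypothetical helper (not an ivmte function): degree-0 spline basis on [0, 1],
# i.e. one indicator column per interval between consecutive breakpoints.
step_basis <- function(u, knots) {
    breaks <- c(0, knots, 1)
    # Index of the interval each u falls in; u = 1 is assigned to the last one
    idx <- findInterval(u, breaks, rightmost.closed = TRUE)
    B <- outer(idx, seq_len(length(breaks) - 1), "==") * 1
    colnames(B) <- paste0("piece", seq_len(ncol(B)))
    B
}

knots <- c(0.25, 0.5)
B <- step_basis(c(0.1, 0.25, 0.3, 0.5, 0.9, 1), knots)
rowSums(B)   # every point belongs to exactly one piece

# If all evaluation points lie below the first knot, the remaining
# columns are 0 for every observation.
B_low <- step_basis(c(0.05, 0.1, 0.2), knots)
colSums(B_low)
```

A column of this basis is identically zero whenever no evaluation point falls in its interval, which is still a perfectly valid (if uninformative) regressor.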

jkcshea commented 3 years ago

Hm, this is indeed an error. The direct MTR regression is supposed to support splines, I remember having an exchange with @slacouture about this... I will take care of this!

jkcshea commented 3 years ago

This is resolved now.

When testing the rescaling methods, I did not consider cases where a variable is 0 for all observations. In the example above, some of the spline components are 0 for all observations, so these variables have a norm of 0. When rescaling the data, we divide each variable by its norm, so we were dividing by zero. That is why R was reporting "Error in lm.fit(x = resX, y = drY) : NA/NaN/Inf in 'x'". To resolve this, variables with norm 0 are now assigned a norm of 1.
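
The mechanism described above can be sketched on a toy matrix. This is only an illustration under assumptions: `rescale_columns` is a hypothetical helper, the Euclidean norm is assumed, and none of this is the actual ivmte rescaling code.

```r
# Hypothetical helper (not ivmte's internal code): scale each column to unit
# Euclidean norm, treating zero-norm columns as if their norm were 1.
rescale_columns <- function(X) {
    norms <- sqrt(colSums(X^2))
    norms[norms == 0] <- 1   # guard against 0 / 0 = NaN
    list(X = sweep(X, 2, norms, "/"), norms = norms)
}

# Toy design matrix whose third column is 0 for all observations
X <- cbind(b1 = c(1, 2, 2), b2 = c(0, 3, 4), b3 = c(0, 0, 0))

# Naive rescaling divides the zero column by its zero norm
naive <- sweep(X, 2, sqrt(colSums(X^2)), "/")
anyNA(naive)     # TRUE: column b3 is NaN

# Guarded rescaling leaves the zero column as zeros
res <- rescale_columns(X)
anyNA(res$X)     # FALSE
```

Setting the norm to 1 leaves an all-zero column unchanged, so the rescaled matrix contains no NaN entries and can be passed on to the regression step.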

@cblandhol let me know if this works for you, too.

a-torgovitsky commented 3 years ago

Makes perfect sense, and I guess this must have been happening before lm goes through and drops collinear variables. Thanks @jkcshea !
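
A small check of that ordering, using only base R's `lm.fit` on made-up data (the variable names here are illustrative and unrelated to ivmte's `resX` and `drY`):

```r
set.seed(1)
x1 <- rnorm(10)
y <- 1 + 2 * x1 + rnorm(10)
X <- cbind(intercept = 1, x1 = x1, zero = 0)

# An all-zero column makes X rank deficient; lm.fit pivots it out
# and reports an NA coefficient for it.
fit <- lm.fit(x = X, y = y)
fit$rank
is.na(fit$coefficients[["zero"]])

# A NaN column is rejected by lm.fit's input check, so the
# rank-deficiency handling is never reached.
X_nan <- X
X_nan[, "zero"] <- NaN
msg <- tryCatch(lm.fit(x = X_nan, y = y),
                error = function(e) conditionMessage(e))
msg
```

So an all-zero regressor is harmless on its own; it only triggers the error once the division by a zero norm has turned it into NaN.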