jkcshea / ivmte

An R package for implementing the method in Mogstad, Santos, and Torgovitsky (2018, Econometrica).
GNU General Public License v3.0
18 stars 2 forks source link

Trouble when placing different knots for each interacted spline in the MTRs #162

Closed cblandhol closed 4 years ago

cblandhol commented 4 years ago

When a factor variable which takes on 10 values or more is interacted with a constant spline, and the knots vary by the value of x, I get a similar error message as issue #156.

Some notes:

See the example below.

library(data.table)
library(ivmte)

set.seed(1001)
N <- 10000

dt <- data.table(
  id = 1:N,
  u = runif(N),
  z = sample(1:4, N, replace = T),
  x = sample(1:10, N, replace = T), # 9 works, 10 doesn't
  epsilon = rnorm(N, sd = .1)
)

dt[, p := (z == 1)*0.12 + (z == 2)*0.29 + (z == 3)*0.48 + (z == 4)*0.78]
dt[, d := as.integer(u <= p)]
dt[, y0 := 0.9 - 1.1*u + 0.3*u^2]
dt[, y1 := 0.35 - 0.3*u - 0.05*u^2]
dt[, y := y1*d + y0*(1 - d) + epsilon]

# get propensity scores
prop_formula <<- as.formula(paste0("d ~ factor(z)*factor(x)"))

prop_res     <<- glm(formula = prop_formula, 
                     data = dt, 
                     family = "binomial")

dt$pscore.sat  <- predict(prop_res, dt, type = "response")

# constant spline with knots at propensity scores for each x value
x_knots  <- paste0(sort(unique(dt[dt$x == 1]$pscore.sat)), collapse = ",")
u_part   <- paste0("factor(x) + (x == ",1, "):uSplines(knots =c(", x_knots,"), degree = 0)")

for (i in 2:10) {
  x_knots  <- paste0(sort(unique(dt[dt$x == i]$pscore.sat)), collapse = ",")
  u_part   <- paste0(u_part," + ", "(x == ",i, "):uSplines(knots = c(", x_knots,"), degree = 0)")
}

args_test <- list(
  data = dt,
  propensity = d ~ factor(x)*factor(z),
  ivlike = c(y ~ d + factor(x),
             y ~ factor(z) + factor(x)),
  components = l(c(d), c(factor(z))),
  m0 = as.formula(paste0("~", u_part)), 
  target = "att",
  lpsolver = "gurobi"
)
args_test$m1 <- args_test$m0

test_res <- do.call(ivmte, args_test)
LP solver: Gurobi ('gurobi')

Obtaining propensity scores...

Generating target moments...
    Integrating terms for control group...
    Integrating terms for treated group...
Error in `[<-`(`*tmp*`, , mv, value = 0) : subscript out of bounds
Called from: genGammaSplines(splinesobj = splinesobj[[1]], data = data, lb = w0$lb, 
    ub = w0$ub, multiplier = w0$mp, d = 0, late.rows = lateRows)
jkcshea commented 4 years ago

Done!

The problem was related to how I checked for collinearity in the splines. Early on in the function, I construct a temporary variable in the data as a placeholder for each spline. I then construct a design matrix with all the interactions requested by the user, with the splines being replaced by the temporary variables. If R decides to omit a temporary variable due to collinearity, then I know to omit the corresponding spline as well.

I wasn't careful enough with how I named and searched for the temporary variables. This is why problems cropped up once the size of the support of x exceeded 10. e.g. tmpvar1 was getting confused with tmpvar10.

The code should run now, but you'll have to crank up initgrid.nu and audit.nu to be very large (it worked for me when I had initgrid.nu = 500). The reason is that your MTRs are really flexible, so the problem will be unbounded unless your grid is sufficiently dense.

jkcshea commented 4 years ago

I assume this is okay now, so I'll close it!

cblandhol commented 4 years ago

I assume this is okay now, so I'll close it!

Thank you, it works well now!