jkcshea / ivmte

An R package for implementing the method in Mogstad, Santos, and Torgovitsky (2018, Econometrica).
GNU General Public License v3.0

Why does it take so much memory to calculate the moments? #166

Closed johnnybonney closed 4 years ago

johnnybonney commented 4 years ago

@cblandhol and I are both running into some memory issues when calculating the moments. These memory issues seem to result from a combination of sample size, the number of moments, and how involved the functional form for the MTRs is.

I would expect the combination of moments and complex MTRs to pose computationally burdensome LP problems, but ivmte doesn't even make it that far (at least, on my machine). Maybe this reflects some ignorance on my part about what is going on under the hood, but given that the IV-like moments we are specifying are all from linear regressions, I would not think that calculating the moments (esp. in a 10,000 observation data set) would lead to memory issues.

In summary, I am wondering why these memory problems arise and if this usage of memory is intentional/necessary. If so, is there anything we can do to sidestep these issues (besides moving to a more powerful computer/server)?

In case it is relevant, I am on Windows 10, and my computer has 16 GB of RAM.

Here is an example that is similar to what I have been trying. The data I use contain roughly 10,000 observations, and I am trying to interact the MTRs with a state fixed effect (so 50 x-groups).

library(data.table)
library(ivmte)

set.seed(1001)
N <- 10000
N.x <- 50 # number of x-groups

dt <- data.table(
  id = 1:N,
  u = runif(N),
  z = sample(1:4, N, replace = T),
  x = sample(1:N.x, N, replace = T),
  epsilon = rnorm(N, sd = .1)
)

dt[, p := (z == 1)*0.12 + (z == 2)*0.29 + (z == 3)*0.48 + (z == 4)*0.78]
dt[, d := as.integer(u <= p)]
dt[, y0 := 0.9 - 1.1*u + 0.3*u^2]
dt[, y1 := 0.35 - 0.3*u - 0.05*u^2]
dt[, y := y1*d + y0*(1 - d) + epsilon]

spline_knots <- seq(0, 1, length.out = 4)

args <- list(
  data = dt,
  m0 = ~factor(x) + factor(x):uSplines(degree = 3, knots = spline_knots),
  m1 = ~factor(x) + factor(x):uSplines(degree = 3, knots = spline_knots),
  propensity = d ~ factor(z),
  ivlike = c(y ~ 1,
             y ~ factor(x) + d,
             y ~ factor(x) + factor(x):factor(z)),
  components = l(c(intercept), c(d), c(factor(x):factor(z))),
  target = "ate",
  lpsolver = "gurobi"
)

res <- do.call(ivmte, args)
LP solver: Gurobi ('gurobi')

Obtaining propensity scores...

Generating target moments...
    Integrating terms for control group...
    Integrating terms for treated group...

Generating IV-like moments...
    Moment 1...
    Moment 2...
      ...
    Moment 152...
Error: cannot allocate vector of size 2.2 Gb
jkcshea commented 4 years ago

Hm, yes, this is not ideal. The culprit is the matrices used when constructing the gamma moments. This problem must have begun occurring after I made the changes here on how to check for collinear moments.

To check for collinearity in the moments, I need a vector of residuals for each moment. However, I'm not generating these residuals in a very efficient way, so let me change that.

johnnybonney commented 4 years ago

I don't know if this has a related cause, but I am also having a number of memory errors like this:

Performing audit procedure...
    Generating audit grid...
    Generating initial constraint grid...Error : cannot allocate vector of size 363.9 Mb

If you think that the root cause is different, let me know and I can come up with an example and submit a different issue.

jkcshea commented 4 years ago

Regarding the new memory issue with audit grids, that comes later in the program, so it should be due to something else. If I could get an example, that would be great.


In regard to the first issue, @a-torgovitsky and I spoke about this. We had agreed to drop the moment counting entirely unless point = TRUE. But a while back, @johnnybonney and @cblandhol requested that the S-weights be returned as part of the output. These weights are thus always saved, regardless of whether point = TRUE or point = FALSE (but they are not always returned, e.g. if smallreturnlist = TRUE). These weights contain the information needed to count the moments. In case one wants to check, here is a brief explanation.

gmm-moment-counting.pdf

That means we can still count the moments, but not suffer from the huge memory cost. I'm still cleaning up some parts, but the first issue should be resolved soon.

jkcshea commented 4 years ago

Okay, the first issue should now be resolved. Previously, in the example above, the function was storing something over 2GB. Now, that object has been reduced to 428MB.

johnnybonney commented 4 years ago

This is great -- the moments are now calculated significantly more quickly, and errors of the initial type aren't showing up anymore.


Here is code that reproduces the second type of error. The example is a little complicated, but I couldn't otherwise find a way to replicate the behavior. I believe the error has to do with large grids combined with the number of parameters in the m0 and m1 specifications:

library(data.table)
library(ivmte)

set.seed(1001)
N <- 2000

dt <- data.table(
  id = 1:N,
  u = runif(N),
  z = sample(1:4, N, replace = T),
  x1 = sample(1:10, N, replace = T),
  x2 = runif(N),
  x3 = runif(N),
  x4 = runif(N),
  x5 = runif(N),
  x6 = runif(N),
  x7 = runif(N),
  x8 = runif(N),
  x9 = runif(N),
  x10 = runif(N),
  x11 = runif(N),
  epsilon = rnorm(N, sd = .1)
)

dt[, p := (z == 1)*0.12 + (z == 2)*0.29 + (z == 3)*0.48 + (z == 4)*0.78]
dt[, d := as.integer(u <= p)]
dt[, y0 := 0.9 - 1.1*u + 0.3*u^2]
dt[, y1 := 0.35 - 0.3*u - 0.05*u^2]
dt[, y := y1*d + y0*(1 - d) + epsilon]

args <- list(
  data = dt,
  propensity = d ~ factor(z),
  ivlike = c(y ~ 1,
             y ~ factor(x1) + d),
  components = l(c(intercept), c(d)),
  target = "ate",
  m0.inc = FALSE, 
  audit.nu = 40,
  audit.nx = 2000,
  initgrid.nu = 40,
  initgrid.nx = 2000
)

spline_knots <- seq(0, 1, length.out = 31)
spline_deg <- 0

args$m0 <- args$m1 <- ~factor(x1) + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10 + x11 +
  factor(x1):uSplines(degree = spline_deg, knots = spline_knots) +
  x2:uSplines(degree = spline_deg, knots = spline_knots) +
  x3:uSplines(degree = spline_deg, knots = spline_knots) +
  x4:uSplines(degree = spline_deg, knots = spline_knots) +
  x5:uSplines(degree = spline_deg, knots = spline_knots) + 
  x6:uSplines(degree = spline_deg, knots = spline_knots) +
  x7:uSplines(degree = spline_deg, knots = spline_knots) + 
  x8:uSplines(degree = spline_deg, knots = spline_knots) +
  x9:uSplines(degree = spline_deg, knots = spline_knots) + 
  x10:uSplines(degree = spline_deg, knots = spline_knots) +
  x11:uSplines(degree = spline_deg, knots = spline_knots)

res <- do.call(ivmte, args)
LP solver: Gurobi ('gurobi')

Obtaining propensity scores...

Generating target moments...
    Integrating terms for control group...
    Integrating terms for treated group...

Generating IV-like moments...
    Moment 1...
    Moment 2...
    Independent moments: 2 

Performing audit procedure...
    Generating audit grid...
    Generating initial constraint grid...Error: cannot allocate vector of size 3.1 Gb

Something I noticed is that if you redefine spline_knots <- seq(0, 1, length.out = 11) (or remove some of the spline interactions), everything runs just fine. (This is why I think the error results from the number of parameters.)

jkcshea commented 4 years ago

In this case, the memory issues are indeed due to the sheer size of the grid, but also to some inefficient code.

Setting audit.nx = 2000 and audit.nu = 40 gives you 84,000 points (audit.nu and initgrid.nu count only the number of u's drawn from (0, 1); we then always add the end points, so really there are 42 u's being considered---let me know if we should change this). For each of those points, you have 4 default LB and UB constraints, i.e. 84,000 * 4 = 336,000 rows in the audit matrix. Your MTE specification has 1,240 parameters. The 336,000 x 1,240 audit grid is about 3.2GB in size.
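As a quick sanity check on those numbers (doubles are 8 bytes each):

n_points <- 2000 * (40 + 2)   # audit.nx * (audit.nu + 2 endpoints) = 84,000
n_rows   <- n_points * 4      # LB and UB constraints for m0 and m1 = 336,000
n_cols   <- 1240              # MTR parameters
n_rows * n_cols * 8 / 2^30    # ~3.1 GiB, matching the size in the error message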

I then did something inefficient, not thinking about the memory cost of these giant matrices. That cost another 3.2GB, but I will fix this.

The memory error then pops up because ivmte tries to create the initial grid. And in your specification, the audit grid and initial grid are the same, so that requires another 3.2GB. (It is very easy to change the code so that the initial grid is not reconstructed whenever initgrid.nu = audit.nu and initgrid.nx = audit.nx.)

Still, that amounts to < 10GB, and the OS shouldn't need the remaining ~6GB just to run in the background. So I'm not sure why R is giving up so soon. I will take a closer look at how I construct these matrices; I am likely doing something inefficient.

jkcshea commented 4 years ago

I just pushed a version of the code that can now handle the second case. More work still needs to be done on this, though. The problem is that whenever you pass an object into a function and then modify the object, R will create a duplicate. (If no modifications are made, R just uses a pointer.)

Here is what is currently happening.

  1. The specification provided results in an enormous audit matrix.
  2. This audit matrix is then passed into another function that constructs the object the LP solvers look at---some adjustments are made, and there we have our first duplicate.
  3. This LP object is then passed to either the function that minimizes the criterion, or the function that obtains the bounds---in either case, more adjustments are made, and there we have our second duplicate.

So you can have up to 3 instances of the enormous matrix in memory. I'll see if I can find a way to restructure the code to avoid this.
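A generic illustration of this copy-on-modify behavior (not the ivmte internals), using base R's tracemem():

x <- matrix(rnorm(1e6), ncol = 100)
tracemem(x)                            # start tracking copies of x
f <- function(m) { m[1, 1] <- 0; m }   # modifies its argument...
y <- f(x)                              # ...so tracemem reports a duplicate of x
g <- function(m) sum(m)
g(x)                                   # no modification, so no copy is made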

a-torgovitsky commented 4 years ago

Makes sense. Maybe before restructuring the code it's worth looking into whether there's a way to always pass by reference? A quick Google search turned up a lot of posts -- seems like a natural problem many others would have encountered.
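(For reference, two common ways to get reference semantics in R are environments and data.table's by-reference operations; a minimal sketch of each, not tied to the ivmte code:)

# Environments are passed by reference, so the caller's object is updated in place
env <- new.env()
env$grid <- matrix(0, nrow = 5, ncol = 3)
addRows <- function(e) e$grid <- rbind(e$grid, matrix(1, nrow = 2, ncol = 3))
addRows(env)   # env$grid now has 7 rows; no copy of the environment was made

# data.table modifies columns in place via set()/:=
library(data.table)
dt <- data.table(a = 1:5)
set(dt, j = "b", value = dt$a * 2)   # adds column b without copying dt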

jkcshea commented 4 years ago

Ah, I had forgotten about passing by reference. Sure, I'll take a look at that.

jkcshea commented 4 years ago

Just an update for @johnnybonney and @cblandhol, as I had thought I was closer to being finished with this than I really was.

After a lot of restructuring, I was not very successful in reducing the memory requirement. Passing by reference seems to help, but only after one has constructed the audit grid. And constructing an enormous audit grid is the challenge.

Nevertheless, I did just learn a strange (or perhaps normal?) feature of R that can help resolve the issue. Creating an object of size x actually requires twice the memory. Here's a simple example.

> rm(list = ls())
> library(pryr)
> library(profvis)
> gc(reset = TRUE)
          used  (Mb) gc trigger   (Mb) max used  (Mb)
Ncells 2101111 112.3    3451934  184.4  2101111 112.3
Vcells 8412387  64.2  969750590 7398.7  8412387  64.2

> ## It seems like assigning an object of X MB requires 2X MB of
> ## memory. So an object of 8GB should require 16GB to generate. A
> ## system with 16GB of RAM should not be able to handle this.

> ## A function to generate matrices
> genMat <- function(y) {
+     message(y, " values generated")
+     matrix(rep(1.1, y), ncol = 100)
+ }
> a <- genMat(2000000)
2e+06 values generated
> object_size(a)
16 MB
> ## 2 million double cells equals 16 MB, so 1 billion double cells should
> ## equal 8 GB---assigning this should not be possible.
> ## Clear out memory.
> rm(a)
> gc()
          used  (Mb) gc trigger   (Mb) max used  (Mb)
Ncells 2101157 112.3    3451934  184.4  2103033 112.4
Vcells 8412406  64.2  775800472 5918.9 12413234  94.8

> ## Try assigning an 8GB matrix
> b <- genMat(1000000000)
1e+09 values generated
Error: cannot allocate vector of size 7.5 Gb

Another issue is replacing objects in memory. For example, suppose matrices a and b are 1GB each, and I have them stored in memory already. Suppose I update b by running b <- rbind(a, b). R will first make rbind(a, b), which will cost an additional 2GB. Afterwards, R replaces b, freeing up 1GB. So altogether, I need 4GB of memory to carry this out, although I end up with only 3GB of matrices. (Note: the memory requirement was not doubled anywhere here, since I was not constructing a new object, but simply sticking together objects already stored in memory.)

These two features of R are what cause the memory allocation errors.

I currently construct the audit grid by constraint type (e.g. first the LB constraints for m0, then the LB constraints for m1, then the UB constraints for m0, etc.). Each of these sections can be quite large: in the example above, if N = 3000, then each section is 1.5GB, so R requires about 3GB to construct each one. The sections are then appended together using the replacement strategy from above, e.g.

grid <- NULL
...
new_constraints1 <- genConst(type1)
grid <- rbind(grid, new_constraints1)
...
new_constraints2 <- genConst(type2)
grid <- rbind(grid, new_constraints2)
...
new_constraints3 <- genConst(type3)
grid <- rbind(grid, new_constraints3)

At some point, R will run out of memory.

One way to try to get around this is to construct the grid in smaller chunks. Rather than having R require 4GB of additional memory each time, it would only require, say, 200MB, so we can make better use of the available memory. I'll give this a shot, but I'm worried about performance, as it will involve some form of a loop.
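(A related option, sketched below with made-up dimensions and a stand-in for the constraint-generating function, is to pre-allocate the full grid once and fill it block by block, which avoids growing the object with rbind at all:)

n_block <- 1000; n_blocks <- 4; n_cols <- 50        # made-up sizes
grid <- matrix(NA_real_, nrow = n_block * n_blocks, ncol = n_cols)
for (b in seq_len(n_blocks)) {
  rows <- ((b - 1) * n_block + 1):(b * n_block)
  # stand-in for the block of constraints, e.g. genConst(type)
  grid[rows, ] <- matrix(rnorm(n_block * n_cols), nrow = n_block)
}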

jkcshea commented 4 years ago

Actually, constructing the grid in smaller chunks may not be that helpful... Again, if I have to update the grid using something like

grid <- rbind(grid, new_constraints)

then there will always be 2 copies of grid existing at some point:

  1. the grid in rbind(grid, new_constraints)
  2. the grid stored in memory, that is about to be replaced

Once grid gets sufficiently large, the memory gets exhausted very quickly. This is what is happening.

But now that I better understand the problem: it seems like data.table has a fast and memory-efficient alternative to rbind, so I'll try that first.
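(The call would look something like this, with hypothetical block names:)

library(data.table)
blocks <- list(lb_for_m0, ub_for_m0, lb_for_m1, ub_for_m1)   # hypothetical block names
grid <- rbindlist(lapply(blocks, as.data.table))             # one stacking call instead of repeated rbind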

jkcshea commented 4 years ago

Hm, unfortunately, data.table's rbindlist function is faster but saves no memory.

a-torgovitsky commented 4 years ago

Can you remind me what for example the (i,j) element of the audit matrix represents? I am forgetting conceptually why we need to keep track of such a high dimensional object.

jkcshea commented 4 years ago

Sure. Each row i of the audit matrix corresponds to a point at which to evaluate the MTRs, and each column j corresponds to a term in the MTR. (At least in the case of bounds; for monotonicity, each entry is interpreted slightly differently.)

When imposing the monotonicity constraints via the initial grid, a subset of rows is taken from the audit matrix. We keep track of this matrix because of the bootstrap procedure: the same audit grid is used in every bootstrap.

If not bootstrapping, though, one way to save memory is to make sure no points are repeated in the audit and initial grid. That is, if a point is included in the initial grid, it can be deleted from the audit grid since the shape constraints at that point should be satisfied (assuming no tolerance issues, as in #164, a relevant post being this).

a-torgovitsky commented 4 years ago

Just to make sure I am understanding. If $(x_{i}, u_{i})$ is a point in our grid and $m(x,u) = \sum_{j=1}^{J} b_{j}(x,u)\theta_{j}$ is the basis expansion, then the $(i,j)$ element of the audit matrix is $b_{j}(x_{i}, u_{i})$. Is that correct?


Assuming that is correct, I am still confused.

Suppose I just save the list of points (the rows). That's just a vector of integers for the x part, since $x_{i}$ is an observation (so a row number), and a vector of doubles for the u part, corresponding to the one-dimensional grid from which $u_{i}$ was taken. Then I can always reconstruct the matrix, or any subset of it, from these two vectors. I wouldn't need to save $X \times U$, just the two vectors: one for X and one for U.

So, presumably the reason we want to save the entire matrix and carry it around with us is that it is time consuming to repeatedly construct its rows. But is that true? Evaluating an MTR at any given point should be pretty cheap. And we only need to do the audit at the end of each solve. The solves are presumably the big bottleneck in time.

The bootstrap shouldn't affect this reasoning at all.
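A rough sketch of that idea (hypothetical names, and ignoring the uSplines terms, which need their own handling): store only the x row indices and the u values, and rebuild any block of b(x, u) rows on demand.

auditPoints <- expand.grid(x_row = c(3, 17, 42), u = c(0.25, 0.50, 0.75))
buildRows <- function(points, data, mtr_formula) {
  evalData <- data[points$x_row, , drop = FALSE]
  evalData$u <- points$u
  model.matrix(mtr_formula, data = evalData)   # one b(x, u) row per grid point
}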

jkcshea commented 4 years ago

Just to make sure I am understanding.

Yep, that's all correct.

Then I can always reconstruct the matrix, or any subset of it, from these two vectors.

Yeah... I remember you explaining this to me before, but for some reason I went with the matrix. It may have been because it can take some time to construct the audit grids (in the example above, it takes around a minute), but that only matters for fairly large grids. It looks like I made the wrong choice, so I'll start planning how to efficiently implement your approach.

a-torgovitsky commented 4 years ago

Well, why don't you start by benchmarking how long it takes to construct a row. (Or maybe 10 or 100 rows, depending on how costly it is.) All rows should be roughly equally costly to construct, right? Since it is just a matter of evaluating the basis functions, I can't imagine that this would be really costly, but maybe I am wrong.


Also, a different comment (which will affect either approach) is related to this that you said earlier:

Setting audit.nx = 2000 and audit.nu = 40 gives you 84,000 points (audit.nu and initgrid.nu count only the number of u's drawn from (0, 1); we then always add the end points, so really there are 42 u's being considered---let me know if we should change this). For each of those points, you have 4 default LB and UB constraints, i.e. 84,000 * 4 = 336,000 rows in the audit matrix. Your MTE specification has 1,240 parameters. The 336,000 x 1,240 audit grid is about 3.2GB in size.

I don't understand why you need to multiply by 4 here. We just agreed that the audit matrix has $(i,j)$ element corresponding to $b_{j}(x_{i}, u_{i})$. So, it doesn't depend on the constraint being evaluated at $(x_{i}, u_{i})$. This suggests that the audit matrix is 4 times as large as it needs to be.

jkcshea commented 4 years ago

 why don't you start by benchmarking how long it takes to construct a row ... All rows should be roughly equally costly to construct, right?

I tried setting audit.nx to 25, 50, 100, 200, 400, and 800 while fixing audit.nu = 10. The number of rows is then 1200, 2400, 4800, 9600, 19200, and 38400. The average time it takes to generate each row seems to indeed be about the same (0.2 milliseconds).

It also seems like having 600 spline terms in each MTR is not very different from having 600 non-spline terms. This was a surprise since I have to manually construct the design matrix for splines and their interactions.

Assuming most users will not be passing such complicated MTRs, it seems reasonable to reconstruct the audit matrix each time. For instance, in the example above, if each MTR only had 20 terms, then the audit grid is constructed in 1.8 seconds (versus the ~70 seconds it currently takes). And by default, audit.max = 5, so the grid will only be constructed a handful of times, and the user does not have to wait too long before a result is returned.

Moreover, once we switch to the structure you suggested, performing the audit should become more efficient.


I don't understand why you need to multiply by 4 here.

Yeah, you're right, I should redo things to avoid this. The reason why this is currently happening is that the code used to generate the audit matrix is the same code that is used to generate the initial grid. At the time, it seemed wise to recycle code seeing how similar the objects were, but this is very inefficient.

But in the current example, I believe this has to happen. The reason is that the initial grid and audit grid are specified to be equal. So in order to declare the upper and lower bounds for m0 and m1 in the LP solvers, I need to include each point 4 times. That is, the audit grid should never require duplication, as you said; but the initial grid always requires it.

a-torgovitsky commented 4 years ago

Are you vectorizing the construction of the audit grid? Still seems bizarre to me that it takes 70 seconds to evaluate this matrix. It's large, true, but it's just a bunch of monomials, isn't it?


It's true that, for the purposes of passing the constraint matrix to Gurobi, it will need to be fully specified even if it is redundant. I guess that type of inefficiency is something Gurobi would need to have a solution for in some sense.


More basically, is there a reason we need the initial audit grid to be so large here? @johnnybonney do you remember why you made that choice? When we were designing the audit procedure I had anticipated needing a much smaller grid.

jkcshea commented 4 years ago

Are you vectorizing the construction of the audit grid? Still seems bizarre to me that it takes 70 seconds to evaluate this matrix.

Hm, you've spotted something. I am vectorizing things, but the problem is elsewhere. I was converting the large audit grid from a matrix into a data.frame. I see why I did it, but it is easily avoidable... About a third of the time is spent on this conversion, so I can remove that.

A lot of time is also lost when combining all the submatrices comprising the enormous audit grid. Again, I'm appending blocks together using

grid <- rbind(grid, lb_for_m0)
grid <- rbind(grid, ub_for_m0)
grid <- rbind(grid, lb_for_m1)
grid <- rbind(grid, ub_for_m1)

As grid continues to grow, these operations slow down substantially. For example, the last rbind takes 10 seconds.

Here's an example:

> aaa <- matrix(rnorm(100000000), nrow = 100)
> bbb <- matrix(rnorm(100000000), nrow = 100)
> object_size(bbb)
800 MB
> t0 <- Sys.time()
> ccc <- rbind(aaa, bbb)
> Sys.time() - t0
Time difference of 7.983712 secs
a-torgovitsky commented 4 years ago

Good find! There is surely a Stack Exchange post somewhere where people benchmark/explain this.

johnnybonney commented 4 years ago

Regarding my choices for the size of the initial grid --

I don't always make the initial grid so large, but in certain applications (primarily those in which I linearly interact continuous covariates with the specification of u) I would often see the audit procedure iterate many times until almost every point in the audit grid was explicitly constrained.

Since these LP problems were taking 10-15 minutes each, I decided to add all the points to the initial grid in order to trade a 3+ hour audit procedure for a single, ~15 minute LP problem.

I should note that these problems seem to have come up more frequently in the past couple of months. I am unable to re-run specifications that I ran without problems back in November.

a-torgovitsky commented 4 years ago

That's interesting, and also slightly concerning, since it suggests that the solution might not satisfy the shape constraints off of the audit grid.

Can you clarify this?

I should note that these problems seem to have come up more frequently in the past couple of months. I am unable to re-run specifications that I ran without problems back in November.

What problems and what specifications? Hopefully we are going forward here not backward...

jkcshea commented 4 years ago

I should note that these problems seem to have come up more frequently in the past couple of months. I am unable to re-run specifications that I ran without problems back in November.

Hm, sorry about that, that really is not good. Do you know if any examples of these were posted here before? If so, then I probably have the code annotated somewhere, and can try to determine what I changed.

johnnybonney commented 4 years ago

That's interesting, and also slightly concerning, since it suggests that the solution might not satisfy the shape constraints off of the audit grid.

I should note that in the cases I have been looking at, the number of observations is low enough (less than 2000) that I can set audit.nx equal to the number of observations. I also try to increase audit.nu to the point where the bounds aren't sensitive to further increases.

What problems and what specifications?

These are the Dinkelman specifications, where the MTRs are specified as splines in u, linearly interacted with covariates:

~uSpline(...) + x1 + x1:uSpline(...) + x2 + x2:uSpline(...) + ...

I obtained a majority of the current Dinkelman results about two and a half months ago. I have recently been trying to re-run the specifications (without success). It is always possible that something changed on my computer (though I would not know what). To make sure I am not sending anyone on a wild goose chase, I will look through the history of my code to confirm I truly am running the exact same specifications as before.

Do you know if any examples of these were posted here before?

I don't think any were, no. I can post a zip file with code and data if you would like.

jkcshea commented 4 years ago

I can post a zip file with code and data if you would like.

Yes, please. Even if the memory issue is identical to what is described above, it may still be helpful

johnnybonney commented 4 years ago

OK, let me clarify -- I said earlier that I had been running the exact same specifications as a few months ago. I was incorrect, since I had changed the target parameter slightly. (I was estimating a generalized LATE and had changed the lower bound.) However, I would not expect this to change anything as far as memory goes.

That said, I am no longer getting memory issues with those specifications (although I was a few weeks ago). So, I do not have an example of code that used to run but now has memory issues. Sorry for the worry!

I do have new code that has always run into memory issues. @jkcshea I will put together and post an example of that, in case it is useful.

jkcshea commented 4 years ago

I will put together and post an example of that, in case it is useful.

Thank you, an additional example to study this issue may prove insightful.

I have an outline for how to implement @a-torgovitsky's idea of efficiently performing the audit. Hopefully, once implemented, these memory issues tied to the construction of the audit grid should become much more difficult to generate. Things should also be faster, as I can avoid some of those rbind commands. (But enormous initial grids may still be problematic.)

johnnybonney commented 4 years ago

Here is a zip folder that contains code and data to reproduce two specifications in which I run into memory issues.

(I can produce other examples if you would like them in the future, but they have significantly more moments and so take a bit longer to run.)

jkcshea commented 4 years ago

The overhaul of the audit procedure is finally done! The audit grid is no longer constructed in its entirety and stored. Instead, the audit now tests each type of shape constraint separately. For each type of constraint, a smaller audit grid is constructed, an audit is performed, the violations are recorded, and then the small audit grid is deleted. This avoids having to rbind or cbind enormous matrices, which can take several seconds.

However, the memory issues in the simulated example and the Dinkelman example persist. The reasons are that

  1. the options passed to ivmte still imply enormous initial grids, which must always be constructed in their entirety
  2. I can't get around R requiring double memory to construct objects.

Regarding the Dinkelman example: @johnnybonney mentioned that shape constraints only have to be imposed for specific combinations of covariates x and unobservables u. However, the grids are currently constructed by taking the Cartesian product of a grid for x and a grid for u. Restricting the initial grid to specific combinations of x and u will save memory and can be coded up, but this seems rather example-specific. I will let @a-torgovitsky decide on whether we want to incorporate that into the function.

a-torgovitsky commented 4 years ago

@johnnybonney mentioned that shape constraints only have to be imposed for specific combinations of covariates x and unobservables u

What are these? I couldn't find in the discussion above which constraints these were or why they only need to be imposed for specific combinations.

jkcshea commented 4 years ago

Sorry, those were in the .zip file from the previous post

a-torgovitsky commented 4 years ago

Ok, I see this comment in the file

in theory, I believe I should only need to check 1816 x 2 X-U combinations (one per parameter) to ensure that all shape constraints are satisfied. However, given the nature of the grid structure, I can only do that if I check each u-value for each x-group (1816 x 1816 grid points).

But I'm not sure I understand the background. What is it about the MTR specification that makes the grid unnecessary?

johnnybonney commented 4 years ago

These MTR specifications are the nonseparable, nonparametric splines (constant spline for each X with internal knots at the propensity score values).

If the only shape constraints on the MTRs are the upper and lower bounds, it is sufficient for each X to check that the estimated parameters for each segment of the spline are between the upper and lower bounds.

For example, in the simple case with a binary instrument that is fully supported conditional on X, there will be three parameters to check for each X group---the coefficients on 1[u <= p(x,0)], 1[u \in (p(x,0), p(x,1)]], and 1[u > p(x,1)]. This is equivalent to imposing an audit grid where audit.nu contains p(x,0), p(x,1), and 1.
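(As a tiny sketch of why so few points suffice, with hypothetical cutoffs p(x,0) = 0.3 and p(x,1) = 0.7 -- a degree-0 spline has the same basis vector at every u inside a knot interval:)

basis0 <- function(u, p0 = 0.3, p1 = 0.7)
  as.numeric(c(u <= p0, u > p0 & u <= p1, u > p1))
basis0(0.10)   # 1 0 0
basis0(0.25)   # 1 0 0 -- identical basis vector, so this audit point is redundant
basis0(0.50)   # 0 1 0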

What I was trying to point out was that, say, if p(x,0) is low (<0.01), in ivmte, I need to impose a high audit.nu to ensure that some u value below 0.01 makes it into the audit grid (and there will be many redundant points in audit.nu conditional on X, since checking 3 points is adequate). In the case I was working with, I was never able to find a solution, since the problem was unbounded if audit.nu wasn't high enough, and the audit grid became too large to handle if I increased it. (Though perhaps the recent update changes this.)

However, I realize that constant spline MTRs are a special case, so you understandably may not want to adjust the function of the grid, which is OK with me. I just wanted to point this out.

a-torgovitsky commented 4 years ago

Doh, I remember typing a response a few days ago, but I must not have submitted it...

If I understand @johnnybonney 's example, it would mean that there are redundant rows in the audit constraint matrix. That is, for a fixed x, there are only 3 different rows possible, since there are only three regions of [0,1] determined by the constant spline. @jkcshea is that not correct? If it is correct, it means we can drop all of the redundant rows, and then use the savings to find grid points (e.g. u < .01) that are not redundant.
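(In base R, dropping them is essentially a one-liner on a hypothetical constraint matrix auditMat:)

keep <- !duplicated(auditMat)                # duplicated() works row-wise on matrices
auditMat <- auditMat[keep, , drop = FALSE]   # keep one copy of each distinct b(x,u) row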

jkcshea commented 4 years ago

Yes, that sounds right. So do we want to implement this in the function? If so, that would mean there is an option that restricts the grid of u points so that we have one point in every interval created by the propensity scores. Since this example involves a constant spline, any point within each interval will serve equally well. If it is not a constant spline, then perhaps we just choose the midpoint? (A Halton sequence of length 1 suggests the midpoint.)

Alternatively, I could update the function so that, even in failed cases, the LP model is returned. The user can then drop the redundant rows in the LP model. I keep a running index of which rows in the constraint matrices correspond to which point in the grid, so I just need to output that as well.

a-torgovitsky commented 4 years ago

I don't think that the solution here is to add another option.

The problem was that the constraint/audit grids were too big. It appears that this is because they contain many redundant rows. So if these redundant rows are not included, it should be possible to set the audit/constraint grid sizes to be much larger without impacting performance. This would mean that the [0, .01] knot, for example, would be picked up just with a sufficiently large number of u points in the grid. In @johnnybonney 's example, at most three would be kept anyway.

jkcshea commented 4 years ago

Hm, this seems quite case specific. So to make sure I'm understanding correctly:

Since every spline is declared with its own set of knots, we can potentially restrict the number of points to be audited based on

  1. The knots
  2. The degree of the spline.

If the spline is of degree 0, then we only need one point from each interval created by the knots. If the spline is of degree 1, then we only need two points from each interval. For degree 2 and above, it seems like we need more points from each interval.

So for MTR specifications where

  1. the u enters only through splines
  2. all splines are interacted with some boolean expression of x (e.g. (x >= 3))

we want ivmte to restrict the constraint/audit grid accordingly for each point x, if the interacted spline is of degree 0 or 1. Is that right?

Or is this much more general than I'm realizing?

a-torgovitsky commented 4 years ago

I think it's more general, although maybe I am missing something.

For each (x,u) pair in the grid we have a vector of basis functions b(x,u). The length of this vector equals the number of MTR parameters. I conceptualize the audit/constraint grid as having rows given by b(x,u) for each (x,u) pair. (Maybe you are coding it up differently, but that's the information it contains.) The grid is multiplied against the vector of MTR coefficients to get the evaluated MTR functions at each (x,u) pair. The result is then inserted into some sequence of linear comparisons to determine whether the shape constraints are satisfied (or to impose them).
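(In matrix form, with made-up dimensions:)

B <- matrix(runif(6 * 3), nrow = 6)    # 6 grid points, 3 basis functions b_j(x, u)
theta <- c(0.2, -0.1, 0.4)             # MTR coefficients
mVals <- B %*% theta                   # MTR evaluated at each (x, u) pair
all(mVals >= 0 & mVals <= 1)           # e.g. check lower/upper bounds of 0 and 1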

The situation that @johnnybonney is raising is just an example where we have two (x,u) pairs that yield the same b(x,u) vector. That is, we have redundant rows in the audit/constraint grid. This is more likely to happen with constant splines, since for other types of splines or polynomials, b(x,u) will vary in u even within an interval. But I don't think it is specific to splines at all. It could equally well happen in terms of the x variable.

jkcshea commented 4 years ago

Ahh, I see.

I conceptualize the audit/constraint grid as having rows given by b(x,u) for each (x,u) pair.

Yes, that's exactly what's being done. This makes sense and should be straightforward to resolve.

a-torgovitsky commented 4 years ago

My guess is that you will want to remove redundant rows at various intervals rather than waiting until you have a giant matrix, but I'm not quite sure how the complexity of the row sorting/uniqueness algorithms works.

For @johnnybonney the solution after @jkcshea is done will be to just set the u portion of the audit grid to be much larger than before. This should pick up even small regions without adding computational complexity.

jkcshea commented 4 years ago

The code for removing the duplicate rows in the constraint matrix has been implemented. In the Dinkelman example @johnnybonney uploaded, the first example still cannot be run because the uSplines are additively separable, so the constraint matrix can't be reduced. I've thrown the second example onto the server. As stated in the code, it looks like it'll take some time to run, so I'll send an update once it's completed.

a-torgovitsky commented 4 years ago

Just to be clear here, are you saying that the fact that uSplines are additively separable implies that the constraint matrix can't be reduced? If so, I don't follow that.

jkcshea commented 4 years ago

Ah, I'm sorry, that is most certainly incorrect. Removal of duplicates is not contingent on how the model is specified. So long as there are duplicate rows, they are removed.

What I should've said was: In the first example, there are indeed duplicate rows, and they get removed. But the way in which the u terms enter still implies that, for every x, we need to check many, many u points. Thus, the constraint matrix is still enormous.

But in the second example, where the u terms enter as degree 0 splines and are interacted with x, only two points of u should be needed for every x. Thus, most of the rows in the constraint matrices should be eliminated. This is what happens in smaller test cases. The task on the server is still running, though.

a-torgovitsky commented 4 years ago

Got it, thanks for the clarification

jkcshea commented 4 years ago

The code on the server ran out of memory (despite assigning it 50GB...), but this may be because of how R deals with factor variables. I'm sorry I did not raise this earlier, as I had come across this before when writing the code to handle boolean expressions in the formulas.

Here is a little R script demonstrating the problem. design-example.zip

In short, R will expand all factor and boolean variables until the point of collinearity. This means that sometimes both (x == 1)TRUE and (x == 1)FALSE will be included in the design matrix, and both will be interacted with other variables. Maybe that's not what you intended, but this is what is happening in the second Dinkelman example.
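A minimal inline version of what the attached script shows, with made-up data:

df <- data.frame(x = c(1, 2, 3, 2), w = c(0.5, 0.2, 0.9, 0.4))
colnames(model.matrix(~ (x == 1):w, data = df))
## both the FALSE and TRUE levels of (x == 1) appear, each interacted with w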

There, you have 1816 observations, and for each one you include a degree-0 spline with one knot. So maybe you were expecting to have 1816 * 2 terms (plus the intercept) in each of your MTRs. Instead, there are over 7000. The reason is that (x == 1)FALSE:uSpline(<specs for person 1>) can be evaluated for everyone other than the person with (x == 1)TRUE. So not only do you have a lot more columns in your constraint matrix than expected, but also a lot more rows (since these additional rows are not duplicates of others).

The base matrix I use to construct all the submatrices in the constraint matrix is 47,000 x 7,000 for each MTR, even after removing duplicate rows. To get the full size of the initial grid, you would have to scale the row count by 4 (for the LB and UB constraints for each MTR). Since m0 and m1 are the same, you just double the column count. That would amount to something like a 200,000 x 14,000 initial grid.

If you construct the indicators as numeric variables, as in the example above, then you get the specification I think you're looking for. That is, the number of terms in each MTR is indeed 1816 * 2 (plus the intercept term). I have this running on the server now; let's see how it goes. @johnnybonney Let me know if that's what you intended. If so, and the job on the server runs successfully, I'll send you the code.
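(Continuing the small example above, the numeric-indicator version gives a single interaction column, which is what you want:)

df$x_is_1 <- as.numeric(df$x == 1)
colnames(model.matrix(~ x_is_1:w, data = df))
## only "x_is_1:w" (plus the intercept) appears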

johnnybonney commented 4 years ago

I see. Yes, that was not intentional, so that is a good catch. I wanted to linearly interact an indicator (and not fully interact a boolean) with the spline.

Thanks for the example. To make sure I understand---if there is a variable x that takes three values (0, 1, 2),

Is that right?

jkcshea commented 4 years ago

Sure, no problem. I should've mentioned this earlier, seeing how much you were using these boolean expressions; I must've just forgotten. I'm sorry for the inconvenience; hopefully it's not too difficult/late to remedy.

And to confirm your questions:

  1. Yep. So there is still one spline, and the LP problem will choose the coefficients for each of the basis functions. This spline will get scaled differently for each individual, depending on their x.

  2. Yep. So there are now effectively three splines, one for each group of agents defined by x.

  3. Yep (usually). To deal with spline interactions, I tried to use as much of R's base code as possible. So each spline gets mapped to a temporary variable so that R can interpret the formula. I then let R generate all the interactions with those temporary variables, and then map them back to splines. So if R doesn't think there are any collinearity issues with the temporary variables, then you'll get two sets of splines for a given interaction. If there are collinearities, then R will drop one of those sets of splines.

  4. Yep.

jkcshea commented 4 years ago

The code is still running on the server. Despite removing duplicate rows in the initial grid, it's still a huge LP problem. But it is running without memory issues. Gurobi output is being produced, and the dimensions of the grid are as they should be.

So in case it saves you some time, attached is the code I modified. All I added was a loop to construct those numerical dummies, and construct the formulas. Maybe you have a more efficient way to do that using tidyverse.

dinkelman_ex_alt.zip

johnnybonney commented 4 years ago

Great, thanks Josh.

To make sure I understand how to fully take advantage of the removal of redundant rows (which I think is a great addition for estimating nonparametric MTRs), consider the following example:

If that is correct, I believe I have a (more complex) example where this does not hold -- the LP problems are unbounded, despite initgrid.nx = nrow(dt) and a theoretically large enough initgrid.nu. If this is a problem, I can post it as a separate issue (since we seem to have migrated away from memory usage in the moment calculation, the original problem in this issue).