jkcshea / ivmte

An R package for implementing the method in Mogstad, Santos, and Torgovitsky (2018, Econometrica).
GNU General Public License v3.0

R crashes when checking shape constraints #171

Closed. johnnybonney closed this issue 4 years ago.

johnnybonney commented 4 years ago

I am having an issue where R crashes after the bounds are obtained by Gurobi (seen via debug = T). The last output in the log file is the solution to the upper bound problem - I do not make it far enough to see Bounds on the target parameter... from ivmte. Specifically, the last thing I see is

Solved with barrier
Solved in 123423 iterations and 1417.92 seconds
Optimal objective  4.560359814e+00

I assume that after the bounds are obtained, the audit procedure checks whether the shape constraints are satisfied, and that check is presumably what is causing the crash. I hesitate to think it is a memory issue, since I successfully ran the exact same specification with a different outcome (one that was binary, while the problematic outcome is continuous).

That said, I did try increasing the RAM on the server, all the way up to 140GB, but R is still crashing at this point with this specification. Perhaps I need to crank up the RAM even further, but that seems odd to me, so maybe you can shed a little light on what is going on here. Here are the replication files for the problem.

jkcshea commented 4 years ago

On smaller examples, the audit seems to be working, and the MTRs are obeying the constraints... In the log file, is Gurobi also successful in obtaining a lower bound?

johnnybonney commented 4 years ago

Yes, Gurobi is able to obtain a lower bound. If the run of ivmte had terminated successfully, even with debug = T, I would have expected the very next lines (and the only remaining lines of output) to be

    Violations: 0
    Audit finished.

    Bounds on the target parameter: ...

jkcshea commented 4 years ago

Yeah, that's what should happen... But that's helpful to know; it narrows down where the problem is.

jkcshea commented 4 years ago

Hm, I decided to just try running the code on the server myself, and it actually ran okay. I did crank up the memory to 200GB, though.

Two things come to mind:

  1. It is a memory problem. You can probably check this by looking at the Rsnow files (assuming you haven't changed those settings in the .sh files for submitting tasks). If the job terminated because of a lack of system resources, then the Rsnow file should say so. Then again, your constraint matrix is about 400,000 x 1,500, which is huge, but I feel like you've thrown bigger problems at the server before. So if this is a memory problem, I'm not sure what is taking up all the memory... I guess I may have to revisit #166.
  2. Acropolis is acting up, and jobs involving Gurobi are just being terminated. This seems unlikely, though, since your messages suggest Gurobi is finished...
jkcshea commented 4 years ago

Actually, I'm just going to re-run it allocating 140GB and see what happens.

a-torgovitsky commented 4 years ago

Why 400,000 x 1,500? This is for the Angrist and Evans example? Are there even 400,000 unique covariate bins?

johnnybonney commented 4 years ago

@a-torgovitsky This is for the Gelbach example. I have initgrid.nx = 10932 (the number of observations) and initgrid.nu = 200 (to constrain all of the segments of the spline). I imagine that Josh's 400,000 x 1,500 figure is for the constraint matrix itself and not the grid, and it comes from multiplying all that by the upper/lower bound constraints (I'm not always able to follow the math there).

a-torgovitsky commented 4 years ago

Ok, and what does the MTR specification look like?

johnnybonney commented 4 years ago

The MTR specification is a constant spline (with 10 knots, where the internal knots are based on the distribution of the p-score), linearly interacted with covariates (one of which is a state fixed effect, so takes on 50-ish values):

pscore_knots <- c(0.04, 0.31, 0.48, 0.63, 0.75, 0.82, 0.88, 0.99)

MTRs are then

~factor(state) + factor(state):uSplines(knots = pscore_knots, degree = 0) +
  x1 + x1:uSplines(knots = pscore_knots, degree = 0) +
  ... +
  x10 + x10:uSplines(knots = pscore_knots, degree = 0)
jkcshea commented 4 years ago

It ran okay with 140GB of memory. But then it did not when I allocated 100GB of memory. The Rsnow file says this:

-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[13110,1],0]
  Exit code:    137
--------------------------------------------------------------------------

#################
Epilogue Job Info:
Job ID: 262313.acropolis
User ID: jkcshea
Job Name: Rsnow
Resource List: procs=1,mem=100gb,walltime=2400:00:00,neednodes=batch
Resources Used: cput=01:26:18,vmem=95830552kb,walltime=01:26:28,mem=104857600kb,energy_used=0
Exit Code: 137
Exit code 137 job terminated, resource limit violation
Generally this means you need to request more RAM or CPU
Please compare your resources used/requested
#################

Which confirms it was a memory issue.

Why I was able to run it with 140GB and @johnnybonney was not isn't clear to me. And the fact that this needs more than 100GB to run is astounding to me... I'll have to keep looking.

a-torgovitsky commented 4 years ago

Yes, we need to focus on why this is taking so much memory. It seems excessive to me.

The specification has 9 knot points with a constant spline. So there should be 9 unique rows for each value of x. There are no more than ~11,000 unique values of x. So, 100,000 grid points.

How many terms in the MTR specification? Is that 1,500?

So then we have a matrix with 100,000 x 1,500 = 1.5e8 elements. I generated a matrix of that size with floats using randn, and it takes up 1.2GB according to pryr::object_size(). Suppose we need 4 of these matrices for Gurobi to impose the shape constraints. We are still under 5GB.
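
For reference, here is a rough R version of that back-of-the-envelope check (rnorm standing in for randn, sizes as above):

    # Rough memory check: a 100,000 x 1,500 matrix of doubles.
    library(pryr)
    m <- matrix(rnorm(1e5 * 1500), nrow = 1e5, ncol = 1500)
    object_size(m)  # about 1.2 GB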

jkcshea commented 4 years ago

@a-torgovitsky sorry for not confirming the calculations above earlier. Those are exactly right.

The reason for the enormous memory requirement is that I constructed the base matrices, and then removed the duplicate rows. To be memory efficient, I should have been removing duplicates as I was constructing the base matrices.

For instance, the example above has initgrid.nu = audit.nu = 200. But there is only one spline (albeit interacted with many different variables), and that spline has 8 segments. So we only really need 8 of the 200 u points in the grid, meaning 96% of the base matrix was not necessary. (This still doesn't explain the server being incapable of carrying out the estimation, as we're nowhere near using 100GB of memory---but that is a separate issue.)
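
To see the redundancy concretely, here is a minimal check (not the package's internal code) that evaluates a degree-0 spline basis on a 200-point u grid and counts the distinct rows, using the knots from the example above:

    # Evaluate the degree-0 basis on a dense u grid and count distinct rows.
    library(splines2)
    pscore_knots <- c(0.04, 0.31, 0.48, 0.63, 0.75, 0.82, 0.88, 0.99)
    u_grid <- seq(0, 1, length.out = 200)
    basis  <- unclass(bSpline(u_grid, knots = pscore_knots, degree = 0,
                              intercept = TRUE, Boundary.knots = c(0, 1)))
    nrow(basis)          # 200 rows, one per grid point
    nrow(unique(basis))  # far fewer distinct rows, roughly one per spline segment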

I'll rewrite the code that generates the base matrix so that duplicates are taken care of along the way. In case someone is interested in how I plan to do this, or knows of a way to do it more efficiently, here is the plan (a rough sketch follows at the end of this comment):

  1. For a given x, whether a row in the constraint grid is unique depends on the realizations of the u terms (polynomials of u, splines, etc.), so I'll first construct the portion of the base matrix that pertains exclusively to u and remove all duplicates there. This means I'll be undoing the interactions between x and u terms.
  2. Afterward, I will construct the remaining portion of the base matrix that pertains exclusively to x terms.
  3. I will then combine these two parts of the matrices, and construct any interactions between x and u terms declared by the user.
  4. Depending on the kinds of interactions between x and u terms declared by the user, more duplicates may arise from the previous step (e.g., as in the specification in #166). So I will remove these new duplicates.

Note that I'm constructing blocks of matrices, then combining them, and then removing duplicates. I'm doing it this way because I can exploit R's functions for quickly constructing design matrices, as well as the splines2::bSpline() function for quickly constructing spline design matrices. I also want to avoid loops (i.e., constructing the matrix row by row), seeing how large some of these initial grids are.

I have been unable to think of a way that is perfectly efficient, i.e. I cannot determine every unique row before constructing the matrix, and then construct only those unique rows. But if someone can, do let me know.
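
For concreteness, here is a rough sketch of that block-wise plan under simplifying assumptions (a single degree-0 spline and a plain x-by-u crossing); the helper name and details are hypothetical, not the package's actual internals:

    # Hypothetical sketch of the block-wise construction described above.
    library(splines2)

    build_constraint_grid <- function(x_data, u_grid, knots) {
      ## 1. u-only block: spline basis on the u grid, duplicate rows dropped.
      u_block <- unique(unclass(bSpline(u_grid, knots = knots, degree = 0,
                                        intercept = TRUE,
                                        Boundary.knots = c(0, 1))))
      ## 2. x-only block: covariate design matrix, duplicate rows dropped.
      x_block <- unique(model.matrix(~ ., data = x_data))
      ## 3. Cross the two blocks: every unique x row paired with every unique u row.
      xu <- cbind(
        x_block[rep(seq_len(nrow(x_block)), each  = nrow(u_block)), , drop = FALSE],
        u_block[rep(seq_len(nrow(u_block)), times = nrow(x_block)), , drop = FALSE]
      )
      ## 4. Forming the x:u interactions (omitted here) can reintroduce
      ##    duplicates, so a final unique() pass follows.
      unique(xu)
    }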

a-torgovitsky commented 4 years ago

Got it. Could you just make sure to benchmark the time taken by the current approach vs. the new approach? It's not clear (to me; I imagine computer scientists know) whether it is more efficient to sort (remove duplicates from) a single large matrix or to do so sequentially on smaller matrices, and if so, what the speed difference will be. So, let's just make sure we don't take a huge performance hit with this change.
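
A tiny benchmarking sketch along those lines (data and sizes made up), comparing unique() on one large matrix against block-wise deduplication followed by a final pass:

    # Hypothetical benchmark: one big unique() vs. block-wise deduplication.
    set.seed(1)
    big    <- matrix(sample(0:1, 2e5 * 20, replace = TRUE), ncol = 20)
    idx    <- split(seq_len(nrow(big)), rep(1:10, each = 2e4))
    blocks <- lapply(idx, function(i) big[i, , drop = FALSE])

    system.time(u_all    <- unique(big))
    system.time(u_blocks <- unique(do.call(rbind, lapply(blocks, unique))))
    nrow(u_all) == nrow(u_blocks)  # both approaches keep the same unique rows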

One thing that might help is to focus on smaller matrices that have the same x value but the entire u grid. Just because the u grid is what is creating potential duplicates.

jkcshea commented 4 years ago

Could you just make sure to benchmark the time taken in the current approach vs the new approach?

Sure thing.

One thing that might help is to focus on smaller matrices that have the same x value but the entire u grid. Just because the u grid is what is creating potential duplicates.

Yep, that's the plan.

jkcshea commented 4 years ago

I implemented the procedure above in a slightly different way, but it looks to be working. The runtime is also much shorter, the reason being that we immediately begin working with a much smaller u grid. For instance, for a subsample of 1,000 observations, the original code would take almost 3 minutes to construct the constraint matrix. Having now removed the redundant u points earlier on, it takes less than 20 seconds.

Memory use is also much better. As mentioned before, duplicates can arise after interacting the x and u terms. With the new code, after combining the x and u grids, there are 3,600 redundant rows that must be removed. But this is only 3.5% of the constraint matrix, whereas previously we were removing 96% of the constraint matrix. (More on what these duplicates are below.)

I can now run the Gelbach example on my machine, and get the same results as before. It takes about 11--13 minutes from beginning to end. (This is surprising since it took the server about the same amount of time to solve just the minimization problem...)


The duplicates mentioned above arise because of the endpoint in the u grid, where u = 1. For a spline of degree 0, all basis splines evaluate to 0 at u = 1. The reason is that the intervals used to define the basis splines are of the form [lb, ub), i.e. they are open on the upper bound.

To my understanding, this isn't problematic so long as there is an intercept term in the MTR. That is, u = 1 becomes like a 'baseline value', similar to how dropped dummy variables become baseline values.

However, if the user declares something like:

  m0 = ~ 0 + uSplines(degree = 0, knots = spline_knots),
  m1 = ~ 0 + uSplines(degree = 0, knots = spline_knots),
  m0.lb = 1,
  m1.lb = 1,

then they'll run into problems related to infeasibility (which we have an error message for): at u = 1, the MTRs are necessarily 0, violating the lower bound constraints.
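
A minimal check of that boundary behaviour (hypothetical knots; per the discussion above, the right-open intervals mean every basis column is 0 at u = 1):

    # Evaluate a degree-0 basis near and at the right endpoint of [0, 1].
    library(splines2)
    spline_knots <- c(0.25, 0.50, 0.75)  # hypothetical knots
    bSpline(c(0.99, 1), knots = spline_knots, degree = 0,
            intercept = TRUE, Boundary.knots = c(0, 1))
    # With no intercept term in the MTR, an all-zero row at u = 1 cannot
    # satisfy m0.lb = 1 or m1.lb = 1, hence the infeasibility.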

I just wanted to mention this in case I was overlooking a potential problem.

a-torgovitsky commented 4 years ago

Ok, that all sounds great, but I'm not sure I understand the last point.

Is that because the audit/constraint grid is set to always include u = 1 in addition to the other points chosen via Halton sequences? Is that a deliberate design choice we made at some point?

jkcshea commented 4 years ago

Is that because the audit/constraint grid is set to always include u = 1 in addition to the other points chosen via Halton sequences? Is that a deliberate design choice we made at some point?

Yep to both. Maybe it makes less sense with constant splines. But if u enters as a polynomial, or some higher-order spline, then I believe we would want to include 0 and 1 in the u-grid to try to ensure the shape constraints hold over the whole [0, 1] interval.

a-torgovitsky commented 4 years ago

Yeah I agree, it makes sense

jkcshea commented 4 years ago

We seem okay on this, so I'll close this.