On smaller examples, the audit seems to be working, and the MTRs are obeying the constraints... In the log file, is Gurobi also successful in obtaining a lower bound?
Yes, Gurobi is able to obtain a lower bound. Had the run of ivmte terminated successfully, even with debug = T, I would have expected the very next lines (and the only remaining lines of output) to be:
Violations: 0
Audit finished.
Bounds on the target parameter: ...
Yeah, that's what should happen... But that's helpful to know; it narrows down where the problem is.
Hm, I decided to just try running the code on the server, and it actually ran okay. I did crank up the memory to 200GB, though.
Two things come to mind. The first is system resources: check the Rsnow files (assuming you haven't changed those settings in the .sh files for submitting tasks). If the job terminated because of a lack of system resources, then the Rsnow file should say so.
The second is the size of the problem: your constraint matrix is about 400,000 x 1,500, which is huge, but I feel like you've thrown bigger problems at the server before.
So if this is a memory problem, I'm not sure what is taking up all the memory... I guess I may have to revisit #166. Actually, I'm just going to re-run it allocating 140GB and see what happens.
Why 400,000 x 1,500? This is for the Angrist and Evans example? Are there even 400,000 unique covariate bins?
@a-torgovitsky This is for the Gelbach example. I have initgrid.nx = 10932 (the number of observations) and initgrid.nu = 200 (to constrain all of the segments of the spline). I imagine that Josh's 400,000 x 1,500 figure is for the constraint matrix itself and not the grid, and that it comes from multiplying all that by the upper/lower bound constraints (I'm not always able to follow the math there).
Ok, and what does the MTR specification look like?
The MTR specification is a constant spline (with 10 knots, where the internal knots are based on the distribution of the p-score), linearly interacted with covariates (one of which is a state fixed effect, so takes on 50-ish values):
pscore_knots <- c(0.04, 0.31, 0.48, 0.63, 0.75, 0.82, 0.88, 0.99)
MTRs are then
~factor(state) + factor(state):uSplines(knots = pscore_knots, degree = 0) +
x1 + x1:uSplines(knots = pscore_knots, degree = 0) +
... +
x10 + x10:uSplines(knots = pscore_knots, degree = 0)
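As an aside, here is a rough sketch of how a right-hand side like that could be assembled programmatically instead of being typed out term by term (the covariate names x1, ..., x10 are placeholders for the actual variables):

# build the interacted-spline formula from a vector of covariate names;
# main effects first, then the spline interactions (equivalent to the formula above)
covariates  <- c("factor(state)", paste0("x", 1:10))
spline_term <- "uSplines(knots = pscore_knots, degree = 0)"
rhs <- paste(c(covariates, paste0(covariates, ":", spline_term)), collapse = " + ")
mtr_formula <- as.formula(paste("~", rhs))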
It ran okay with 140GB of memory.
But then it did not when I allocated 100GB of memory.
The Rsnow file says this:
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[13110,1],0]
Exit code: 137
--------------------------------------------------------------------------
#################
Epilogue Job Info:
Job ID: 262313.acropolis
User ID: jkcshea
Job Name: Rsnow
Resource List: procs=1,mem=100gb,walltime=2400:00:00,neednodes=batch
Resources Used: cput=01:26:18,vmem=95830552kb,walltime=01:26:28,mem=104857600kb,energy_used=0
Exit Code: 137
Exit code 137 job terminated, resource limit violation
Generally this means you need to request more RAM or CPU
Please compare your resources used/requested
#################
Which confirms it was a memory issue.
Why I was able to run it with 140GB while @johnnybonney was not isn't clear to me. And the fact that this needs 100GB to run is astounding to me... I'll have to keep looking.
Yes, we need to focus on why this is taking so much memory. It seems excessive to me.
The specification has 9 knot points with a constant spline. So there should be 9 unique rows for each value of x. There are no more than ~11,000 unique values of x. So, 100,000 grid points.
How many terms in the MTR specification? Is that 1,500?
So then we have a matrix with 100,000 x 1,500 = 1.5e8 elements. I generated a matrix of that size with floats using randn, and it takes up 1.2GB according to pryr::object_size(). Suppose we need 4 of these matrices for Gurobi to impose the shape constraints. We are still under 5GB.
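For what it's worth, that arithmetic is easy to check directly in R (sizes taken from the figures above):

# 100,000 rows x 1,500 columns of doubles at 8 bytes each
n_rows <- 1e5
n_cols <- 1500
n_rows * n_cols * 8 / 1e9    # about 1.2 GB per dense matrix
# e.g. pryr::object_size(matrix(rnorm(n_rows * n_cols), nrow = n_rows))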
@a-torgovitsky sorry for not confirming the calculations above earlier. Those are exactly right.
The reason for the enormous memory requirement is that I constructed the base matrices, and then removed the duplicate rows. To be memory efficient, I should have been removing duplicates as I was constructing the base matrices.
For instance, the example above has initgrid.nu = audit.nu = 200. But there is only one spline (albeit interacted with many different variables), and that spline has 8 segments. So we only really need 8 of the 200 u points in the grid, meaning 96% of the base matrix was not necessary.
(This still doesn't explain the server being incapable of carrying out the estimation, as we're nowhere near using 100GB of memory---but that is a separate issue.)
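To illustrate the point, here is a small check using splines2::bSpline with the knots from the specification above (this assumes a splines2 version that supports degree = 0): a degree-0 basis evaluated on a fine u grid collapses to only a handful of unique rows, one per segment.

library(splines2)
pscore_knots <- c(0.04, 0.31, 0.48, 0.63, 0.75, 0.82, 0.88, 0.99)
u_grid <- seq(0, 1, length.out = 200)
# evaluate the piecewise-constant basis on the fine grid
basis <- bSpline(u_grid, knots = pscore_knots, degree = 0, intercept = TRUE)
# count the unique rows (unclass() so base unique() treats it as a plain matrix)
nrow(unique(unclass(basis)))   # far fewer than 200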
I'll rewrite the code that generates the base matrix so that duplicates are taken care of along the way. In case someone is interested in how I plan to do this, or knows of a way to do this more efficiently:
1. For a given value of x, whether a row in the constraint grid is unique depends on the realizations of the u terms (polynomials of u, splines, etc.). So I'll first construct the portion of the base matrix that pertains exclusively to u, and then I'll remove all duplicates there. This means I'll be undoing the interactions between the x and u terms.
2. I'll then construct the portion of the base matrix that pertains to the x terms.
3. Next, I'll re-interact the x and u terms declared by the user.
4. After interacting the x and u terms declared by the user, more duplicates may arise from the previous step (e.g. as in the specification in #166, here). So I will remove these new duplicates.
Note that I'm constructing blocks of matrices, then combining them together, and then removing duplicates.
I'm doing it this way because I can exploit R's functions for quickly constructing design matrices, as well as the splines2::bSpline() function, which quickly constructs design matrices for splines. I also want to avoid loops (i.e. constructing the matrix row-wise), given how large some of these initial grids are.
I have been unable to think of a way that is perfectly efficient, i.e. I cannot determine every unique row before constructing the matrix, and then construct only those unique rows. But if someone can, do let me know.
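To make the plan concrete, here is a rough sketch of the deduplicate-as-you-go idea on simulated data. This is not the actual ivmte code: the data, covariates, and grid sizes are made up, and the x:u product columns are omitted for brevity.

library(splines2)
set.seed(1)
pscore_knots <- c(0.04, 0.31, 0.48, 0.63, 0.75, 0.82, 0.88, 0.99)
df <- data.frame(state = sample(state.abb, 1000, replace = TRUE),
                 x1    = rbinom(1000, 1, 0.5))

# u block: evaluate the spline basis on a fine grid, then keep only the unique rows
u_grid  <- sort(unique(c(0, 1, seq(0, 1, length.out = 200))))
u_block <- unique(unclass(bSpline(u_grid, knots = pscore_knots,
                                  degree = 0, intercept = TRUE)))

# x block: keep only the unique covariate rows
x_block <- unique(model.matrix(~ factor(state) + x1, data = df))

# pair every unique x row with every unique u row, then drop any duplicates
# that the pairing itself creates
idx_x <- rep(seq_len(nrow(x_block)), each  = nrow(u_block))
idx_u <- rep(seq_len(nrow(u_block)), times = nrow(x_block))
grid  <- unique(cbind(x_block[idx_x, , drop = FALSE],
                      u_block[idx_u, , drop = FALSE]))
dim(grid)   # small relative to 1,000 observations x 200 u points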
Got it. Could you just make sure to benchmark the time taken in the current approach vs the new approach? It's not clear (to me; I imagine computer scientists know) whether it is more efficient to sort (remove duplicates) from a single large matrix or sequentially from smaller matrices, and if so what the speed difference will be. So, let's just make sure we don't take a huge performance hit with this change.
One thing that might help is to focus on smaller matrices that have the same x value but the entire u grid. Just because the u grid is what is creating potential duplicates.
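Something along these lines could serve as the benchmark; the matrix here is just a placeholder with many duplicate rows, so only the relative timings are meaningful:

# dedup one large matrix vs. dedup block-by-block and then once more at the end
x <- matrix(sample(0:1, 1e6 * 10, replace = TRUE), ncol = 10)
system.time(u1 <- unique(x))
blocks <- split(seq_len(nrow(x)), rep(1:50, length.out = nrow(x)))
system.time(u2 <- unique(do.call(rbind, lapply(blocks, function(i)
    unique(x[i, , drop = FALSE])))))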
Could you just make sure to benchmark the time taken in the current approach vs the new approach?
Sure thing.
One thing that might help is to focus on smaller matrices that have the same x value but the entire u grid. Just because the u grid is what is creating potential duplicates.
Yep, that's the plan.
I implemented the procedure above in a slightly different way, but it looks to be working.
The runtime is also much shorter, because we immediately begin working with a much smaller u grid. For instance, for a subsample of 1,000 observations, the original code would take almost 3 minutes to construct the constraint matrix. Now that the redundant u points are removed earlier on, it takes less than 20 seconds.
Memory use is also much better.
As mentioned before, duplicates can arise after interacting the x and u terms. With the new code, after combining the x and u grids, there are 3,600 redundant rows that must be removed. But this is only 3.5% of the constraint matrix, whereas previously we were removing 96% of the constraint matrix.
(A note on what these duplicates are below.)
I can now run the Gelbach example on my machine, and get the same results as before. It takes about 11--13 minutes from beginning to end. (This is surprising since it took the server about the same amount of time to solve just the minimization problem...)
The duplicates mentioned above arise because of the endpoint of the u grid, where u = 1. For a spline of degree 0, all basis splines evaluate to 0 at u = 1. The reason is that the intervals used to define the basis splines are of the form [lb, ub), i.e. they are open at the upper bound.
To my understanding, this isn't problematic so long as there is an intercept term in the MTR. That is, the point u = 1 becomes like a 'baseline value', similar to how dropped dummy variables become baseline values.
However, if the user declares something like:
m0 = ~ 0 + uSplines(degree = 0, knots = spline_knots),
m1 = ~ 0 + uSplines(degree = 0, knots = spline_knots),
m0.lb = 1,
m1.lb = 1,
then they'll run into problems related to infeasibility (which we have an error message for).
That is, at u = 1 the MTRs are necessarily 0, violating the lower bound constraints.
I just wanted to mention this in case I was overlooking a potential problem.
Ok, that all sounds great, but I'm not sure I understand the last point.
Is that because the audit/constraint grid is set to always include u = 1 in addition to the other points chosen via Halton sequences? Is that a deliberate design choice we made at some point?
Is that because the audit/constraint grid is set to always include u = 1 in addition to the other points chosen via Halton sequences? Is that a deliberate design choice we made at some point?
Yep to both.
Maybe it makes less sense with constant splines.
But if u enters as a polynomial, or some higher-order spline, then I believe we would want to include 0 and 1 in the u-grid to try to ensure the shape constraints hold over the whole [0, 1] interval.
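For example, a u grid along those lines could look like the following (randtoolbox::halton is used purely for illustration; the package's actual grid construction may differ):

# endpoints 0 and 1 plus quasi-random interior points
library(randtoolbox)
u_grid <- sort(unique(c(0, 1, halton(200))))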
Yeah I agree, it makes sense
We seem okay on this, so I'll close this.
I am having an issue where R crashes after the bounds are obtained by Gurobi (seen via debug = T). The last output in the log file is the solution to the upper bound problem; I do not make it far enough to see "Bounds on target parameter..." from ivmte. Specifically, the last thing I see is the output for the upper bound solution. I assume that after the bounds are obtained, the audit procedure is checking whether the shape constraints are satisfied, and that could be what is causing the crash. I do hesitate to think it is a memory issue, since I successfully ran the exact same specification with a different outcome (which was binary, while the problematic outcome is continuous).
That said, I did try increasing the RAM on the server, all the way up to 140GB, but R is still crashing at this point with this specification. Perhaps I need to crank up the RAM even further, but that seems odd to me, so maybe you can shed a little light on what is going on here. Here are the replication files for the problem.