oxfordcontrol / COSMO.jl

COSMO: Accelerated ADMM-based solver for convex conic optimisation problems (LP, QP, SOCP, SDP, ExpCP, PowCP). Automatic chordal decomposition of sparse semidefinite programs.
https://oxfordcontrol.github.io/COSMO.jl/stable
Apache License 2.0
283 stars 41 forks source link

Unexpected memory allocation #32

Closed nrontsis closed 5 years ago

nrontsis commented 6 years ago

I am getting unexpectedly large memory allocations when running some problems of the Maros-Mescazos dataset.

For example, when solving PRIMALC2 (2076 non-zeros) @time reports:

Running QOCS:  7.148138 seconds (8.61 M allocations: 6.332 GiB, 7.95% gc time)

Or, when running QSCTAP2 (11417 non-zeros) I get:

Running QOCS:  1226.066138 seconds (93.82 M allocations: 1.013 TiB, 61.93% gc time)

Note that the memory allocation for LDL is very small (hundreds of KiB for PRIMALC2)

You can reproduce the results by running this script. Just change the for loop in order to run only the particular problem, e.g.:

for file in ["PRIMALC2"]
nrontsis commented 6 years ago

@migarstka could this be some sort of type-stability issue insider the solver?

nrontsis commented 6 years ago

Avoiding diagm(a)*b in the solver by replacing them with a.*b reduces PRIMALC2 memory consumption to:

Running QOCS:  5.079428 seconds (8.71 M allocations: 1.656 GiB, 4.03% gc time)
migarstka commented 6 years ago

Hey, yes i noticed these problems. It's worth investigating where the big bottlenecks are that create these big allocations

nrontsis commented 6 years ago

Is there any reason to use sparse vectors for x, s, nu and mu? I believe that even if the problem is sparse, the solution is not necessarily sparse. Switching to dense and avoiding diagm (see 0a74967) results in 3.210 GiB of memory allocations for QSCTAP2 thus reducing memory consumption by almost 350 times.

Still I wonder why it remains in the order of Gigabytes. Is the compiler able to optimize the solve function? There is a lot of manipulation of the ws struct, which I guess makes the job of the compiler harder.

Maybe we could define a function ADMM_steps(x, y, s, F, N) that will only perform N ADMM steps on the state. That is, write it in a manner that would help the compiler. The solve function would then initialize the problem, and sequentially call ADMM_steps and termination checks until convergence/infeasibility.

migarstka commented 6 years ago

Can this now be closed after adding the changes in the related merge request?

nrontsis commented 6 years ago

We haven't changed the secondOrder, freecone, box and sdcone projections so as to avoid unnecessary memory allocation.

Also I wasn't able to solve a linear system inplace. I tried with InPlaceOps.jl but it wasn't working, see here.

nrontsis commented 5 years ago

@migarstka should we close this due to #41 #40?

goulart-paul commented 5 years ago

I think probably it should be closed. There are still some memory bottlenecks, but probably not in the same places anymore. We can reopen issues for streamlining of specific functions or files.