jump-dev / SCS.jl

A Julia interface for the SCS conic programming solver
https://github.com/cvxgrp/scs
Other
81 stars 27 forks source link

add support for sparse-direct-mkl-pardiso solver #255

Closed kalmarek closed 1 year ago

kalmarek commented 2 years ago

this is a proof of concept, depends on https://github.com/JuliaPackaging/Yggdrasil/pull/4773 but works locally ;)

Besides MKLDirectSolver I've added runtime scs_version for each solver library; @odow technically it's a breaking change (there is argumentless version anymore), but it was just internal function that we didn't even test. So maybe we should ask do we actually need to query for version at runtime?

odow commented 2 years ago

I'm not a huge fan of adding this to SCS.jl. It seems like quite a heavy dependency. Can we use Requires similar to the GPU?

kalmarek commented 2 years ago

I was thinking about it as well, but it pulls just two additional jlls: MKL_jll and IntelOpenMP_jll. Disk-size though it weights ~ 700MB whereas julia is ~500MB.

But yeah, probably you're right, I didn't take into account the downloaded libs, only loadtime ;)

That's julia with JULIA_DEPOT_PATH=/tmp:

(@v1.7) pkg> add SCS
  Installing known registries into `/tmp/julia_tmp`
    Updating registry at `/tmp/julia_tmp/registries/General.toml`
   Resolving package versions...
   Installed Bzip2_jll ────────── v1.0.8+0
   Installed Preferences ──────── v1.2.5
   Installed SCS ──────────────── v1.1.1
   Installed JSON ─────────────── v0.21.3
   Installed CodecBzip2 ───────── v0.7.2
   Installed Parsers ──────────── v2.2.4
   Installed MutableArithmetics ─ v1.0.0
   Installed BenchmarkTools ───── v1.3.1
   Installed SCS_GPU_jll ──────── v3.2.0+0
   Installed SCS_jll ──────────── v3.2.0+0
   Installed OpenBLAS32_jll ───── v0.3.17+0
   Installed CodecZlib ────────── v0.7.0
   Installed Requires ─────────── v1.3.0
   Installed OrderedCollections ─ v1.4.1
   Installed TranscodingStreams ─ v0.9.6
   Installed JLLWrappers ──────── v1.4.1
   Installed MathOptInterface ─── v1.1.2
  Downloaded artifact: Bzip2
  Downloaded artifact: SCS_GPU
  Downloaded artifact: OpenBLAS32
  Downloaded artifact: SCS
    Updating `/tmp/julia_tmp/environments/v1.7/Project.toml`
  [c946c3f1] + SCS v1.1.1
    Updating `/tmp/julia_tmp/environments/v1.7/Manifest.toml`
  [6e4b80f9] + BenchmarkTools v1.3.1
  [523fee87] + CodecBzip2 v0.7.2
  [944b1d66] + CodecZlib v0.7.0
  [692b3bcd] + JLLWrappers v1.4.1
  [682c06a0] + JSON v0.21.3
  [b8f27783] + MathOptInterface v1.1.2
  [d8a4904e] + MutableArithmetics v1.0.0
  [bac558e1] + OrderedCollections v1.4.1
  [69de0a69] + Parsers v2.2.4
  [21216c6a] + Preferences v1.2.5
  [ae029012] + Requires v1.3.0
  [c946c3f1] + SCS v1.1.1
  [3bb67fe8] + TranscodingStreams v0.9.6
  [6e34b625] + Bzip2_jll v1.0.8+0
  [656ef2d0] + OpenBLAS32_jll v0.3.17+0
  [af6e375f] + SCS_GPU_jll v3.2.0+0
  [f4f2fc5b] + SCS_jll v3.2.0+0
  [0dad84c5] + ArgTools
  [56f22d72] + Artifacts
  [2a0f44e3] + Base64
  [ade2ca70] + Dates
  [f43a241f] + Downloads
  [b77e0a4c] + InteractiveUtils
  [b27032c2] + LibCURL
  [76f85450] + LibGit2
  [8f399da3] + Libdl
  [37e2e46d] + LinearAlgebra
  [56ddb016] + Logging
  [d6f4376e] + Markdown
  [a63ad114] + Mmap
  [ca575930] + NetworkOptions
  [44cfe95a] + Pkg
  [de0858da] + Printf
  [9abbd945] + Profile
  [3fa0cd96] + REPL
  [9a3f8284] + Random
  [ea8e919c] + SHA
  [9e88b42a] + Serialization
  [6462fe0b] + Sockets
  [2f01184e] + SparseArrays
  [10745b16] + Statistics
  [fa267f1f] + TOML
  [a4e569a6] + Tar
  [8dfed614] + Test
  [cf7118a7] + UUIDs
  [4ec0a83e] + Unicode
  [e66e0078] + CompilerSupportLibraries_jll
  [deac9b47] + LibCURL_jll
  [29816b5a] + LibSSH2_jll
  [c8ffd9c3] + MbedTLS_jll
  [14a3606d] + MozillaCACerts_jll
  [4536629a] + OpenBLAS_jll
  [83775a58] + Zlib_jll
  [8e850b90] + libblastrampoline_jll
  [8e850ede] + nghttp2_jll
  [3f19e933] + p7zip_jll
Precompiling project...
  23 dependencies successfully precompiled in 36 seconds

(@v1.7) pkg> add MKL_jll
   Resolving package versions...
   Installed MKL_jll ───────── v2022.0.0+0
   Installed IntelOpenMP_jll ─ v2018.0.3+2
  Downloaded artifact: IntelOpenMP
    Updating `/tmp/julia_tmp/environments/v1.7/Project.toml`
  [856f044c] + MKL_jll v2022.0.0+0
    Updating `/tmp/julia_tmp/environments/v1.7/Manifest.toml`
  [1d5cc7b8] + IntelOpenMP_jll v2018.0.3+2
  [856f044c] + MKL_jll v2022.0.0+0
  [4af54fe1] + LazyArtifacts
Precompiling project...
  2 dependencies successfully precompiled in 1 seconds (23 already precompiled)
odow commented 2 years ago

Disk-size though it weights ~ 700MB

Yes, this is what I meant by heavy.

kalmarek commented 2 years ago

That's probably a problem with 32/64-bit interface on x86 linux:


------------------------------------------------------------------
           SCS v3.2.1 - Splitting Conic Solver
    (c) Brendan O'Donoghue, Stanford University, 2012
------------------------------------------------------------------
problem:  variables n: 1, constraints m: 1
cones:    z: primal zero / dual free vars: 1
settings: eps_abs: 1.0e-04, eps_rel: 1.0e-04, eps_infeas: 1.0e-07
      alpha: 1.50, scale: 1.00e-01, adaptive_scale: 1
      max_iters: 100000, normalize: 1, rho_x: 1.00e-06
      acceleration_lookback: 10, acceleration_interval: 10
lin-sys:  sparse-direct-mkl-pardiso
      nnz(A): 1, nnz(P): 0
Error during symbolic factorization: -12Error during MKL Pardiso cleanup: -12ERROR: init_lin_sys_work failure
Test Failed at /home/runner/work/SCS.jl/SCS.jl/test/test_problems.jl:714
  Expression: solution.ret_val == 1
   Evaluated: -4 == 1
ERROR: LoadError: There was an error during testing
in expression starting at /home/runner/work/SCS.jl/SCS.jl/test/runtests.jl:22

ERROR: missing ScsWork, ScsSolution or ScsInfo input
ERROR: Package SCS errored during testing
kalmarek commented 1 year ago

Numerically DirectSolver and MKLDirectSolver behave the same, i.e. the times below are for the same number (20_000) of iterations.

On a small problem I get

problem:  variables n: 2640, constraints m: 5499
cones:    z: primal zero / dual free vars: 2860
          s: psd vars: 2639, ssize: 10
settings: eps_abs: 1.0e-10, eps_rel: 1.0e-10, eps_infeas: 1.0e-07
          alpha: 1.90, scale: 1.00e-01, adaptive_scale: 1
          max_iters: 20000, normalize: 1, rho_x: 1.00e-06
          acceleration_lookback: 50, acceleration_interval: 10
lin-sys:  sparse-direct-amd-qdldl
          nnz(A): 66208, nnz(P): 0

On a larger one

problem:  variables n: 5708, constraints m: 11938
cones:    z: primal zero / dual free vars: 6231
          s: psd vars: 5707, ssize: 20
settings: eps_abs: 1.0e-10, eps_rel: 1.0e-10, eps_infeas: 1.0e-07
          alpha: 1.90, scale: 1.00e-01, adaptive_scale: 1
          max_iters: 20000, normalize: 1, rho_x: 1.00e-06
          acceleration_lookback: 50, acceleration_interval: 10
lin-sys:  sparse-direct-mkl-pardiso
          nnz(A): 275706, nnz(P): 0

the speed-up is comparable (MKLDirectSolver is 2-3 times faster here).


These might be atypical examples for showing off MKLDirectSolver: these problems are after symmetry reduction so there's a rather small number of small psd constraints with a bunch of dense linear constraints.

The original version (with large psd constraint and lots of sparse linear constraints) of the first (small) problem is:

problem:  variables n: 93962, constraints m: 169158
cones:    z: primal zero / dual free vars: 75197
          s: psd vars: 93961, ssize: 1
settings: eps_abs: 1.0e-10, eps_rel: 1.0e-10, eps_infeas: 1.0e-07
          alpha: 1.90, scale: 1.00e-01, adaptive_scale: 1
          max_iters: 20000, normalize: 1, rho_x: 1.00e-06
          acceleration_lookback: 50, acceleration_interval: 10
lin-sys:  sparse-direct-amd-qdldl
          nnz(A): 746426, nnz(P): 0

DirectSolver runs in

so MKLDirectSolver benefits from multiple threads (OMP_NUM_THREADS) while DirectSolver doesn't. Simply by looking at htop the DirectSolver seems to waste most of the resources (the occupied cores are predominantly in sys: wait state -- maybe some problem with synchronization/communication?). MKLDirectSolver fares much better here fully utilizing all available resources.

Tbh I hoped for something much better ;) But maybe those timings/findings are useful for @bodono as well.

julia> versioninfo(verbose=true)
Julia Version 1.8.2
Commit 36034abf260 (2022-09-29 15:21 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
      "Arch Linux"
  uname: Linux 6.0.6-arch1-1 #1 SMP PREEMPT_DYNAMIC Sat, 29 Oct 2022 14:08:39 +0000 x86_64 unknown
  CPU: AMD Ryzen 7 PRO 4750U with Radeon Graphics: 
                 speed         user         nice          sys         idle          irq
       #1-16  1387 MHz     311723 s       3636 s     175687 s    1813700 s          1 s
  Memory: 30.586448669433594 GB (15411.67578125 MB free)
  Uptime: 117173.85 sec
  Load Avg:  4.83  4.28  3.24
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, znver2)
  Threads: 8 on 16 virtual cores
Environment:
  JULIA_NUM_THREADS = 8
[...]