Closed kalmarek closed 1 year ago
I'm not a huge fan of adding this to SCS.jl. It seems like quite a heavy dependency. Can we use Requires similar to the GPU?
I was thinking about it as well, but it pulls just two additional jlls: `MKL_jll` and `IntelOpenMP_jll`.
Disk-size-wise, though, it weighs ~700 MB, whereas Julia is ~500 MB.
But yeah, you're probably right; I didn't take the downloaded libs into account, only load time ;)
That's `julia` with `JULIA_DEPOT_PATH=/tmp`:
(@v1.7) pkg> add SCS
Installing known registries into `/tmp/julia_tmp`
Updating registry at `/tmp/julia_tmp/registries/General.toml`
Resolving package versions...
Installed Bzip2_jll ────────── v1.0.8+0
Installed Preferences ──────── v1.2.5
Installed SCS ──────────────── v1.1.1
Installed JSON ─────────────── v0.21.3
Installed CodecBzip2 ───────── v0.7.2
Installed Parsers ──────────── v2.2.4
Installed MutableArithmetics ─ v1.0.0
Installed BenchmarkTools ───── v1.3.1
Installed SCS_GPU_jll ──────── v3.2.0+0
Installed SCS_jll ──────────── v3.2.0+0
Installed OpenBLAS32_jll ───── v0.3.17+0
Installed CodecZlib ────────── v0.7.0
Installed Requires ─────────── v1.3.0
Installed OrderedCollections ─ v1.4.1
Installed TranscodingStreams ─ v0.9.6
Installed JLLWrappers ──────── v1.4.1
Installed MathOptInterface ─── v1.1.2
Downloaded artifact: Bzip2
Downloaded artifact: SCS_GPU
Downloaded artifact: OpenBLAS32
Downloaded artifact: SCS
Updating `/tmp/julia_tmp/environments/v1.7/Project.toml`
[c946c3f1] + SCS v1.1.1
Updating `/tmp/julia_tmp/environments/v1.7/Manifest.toml`
[6e4b80f9] + BenchmarkTools v1.3.1
[523fee87] + CodecBzip2 v0.7.2
[944b1d66] + CodecZlib v0.7.0
[692b3bcd] + JLLWrappers v1.4.1
[682c06a0] + JSON v0.21.3
[b8f27783] + MathOptInterface v1.1.2
[d8a4904e] + MutableArithmetics v1.0.0
[bac558e1] + OrderedCollections v1.4.1
[69de0a69] + Parsers v2.2.4
[21216c6a] + Preferences v1.2.5
[ae029012] + Requires v1.3.0
[c946c3f1] + SCS v1.1.1
[3bb67fe8] + TranscodingStreams v0.9.6
[6e34b625] + Bzip2_jll v1.0.8+0
[656ef2d0] + OpenBLAS32_jll v0.3.17+0
[af6e375f] + SCS_GPU_jll v3.2.0+0
[f4f2fc5b] + SCS_jll v3.2.0+0
[0dad84c5] + ArgTools
[56f22d72] + Artifacts
[2a0f44e3] + Base64
[ade2ca70] + Dates
[f43a241f] + Downloads
[b77e0a4c] + InteractiveUtils
[b27032c2] + LibCURL
[76f85450] + LibGit2
[8f399da3] + Libdl
[37e2e46d] + LinearAlgebra
[56ddb016] + Logging
[d6f4376e] + Markdown
[a63ad114] + Mmap
[ca575930] + NetworkOptions
[44cfe95a] + Pkg
[de0858da] + Printf
[9abbd945] + Profile
[3fa0cd96] + REPL
[9a3f8284] + Random
[ea8e919c] + SHA
[9e88b42a] + Serialization
[6462fe0b] + Sockets
[2f01184e] + SparseArrays
[10745b16] + Statistics
[fa267f1f] + TOML
[a4e569a6] + Tar
[8dfed614] + Test
[cf7118a7] + UUIDs
[4ec0a83e] + Unicode
[e66e0078] + CompilerSupportLibraries_jll
[deac9b47] + LibCURL_jll
[29816b5a] + LibSSH2_jll
[c8ffd9c3] + MbedTLS_jll
[14a3606d] + MozillaCACerts_jll
[4536629a] + OpenBLAS_jll
[83775a58] + Zlib_jll
[8e850b90] + libblastrampoline_jll
[8e850ede] + nghttp2_jll
[3f19e933] + p7zip_jll
Precompiling project...
23 dependencies successfully precompiled in 36 seconds
(@v1.7) pkg> add MKL_jll
Resolving package versions...
Installed MKL_jll ───────── v2022.0.0+0
Installed IntelOpenMP_jll ─ v2018.0.3+2
Downloaded artifact: IntelOpenMP
Updating `/tmp/julia_tmp/environments/v1.7/Project.toml`
[856f044c] + MKL_jll v2022.0.0+0
Updating `/tmp/julia_tmp/environments/v1.7/Manifest.toml`
[1d5cc7b8] + IntelOpenMP_jll v2018.0.3+2
[856f044c] + MKL_jll v2022.0.0+0
[4af54fe1] + LazyArtifacts
Precompiling project...
2 dependencies successfully precompiled in 1 seconds (23 already precompiled)
> Disk-size though it weights ~ 700MB
Yes, this is what I meant by heavy.
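For anyone who wants to reproduce the disk-size comparison, here is a minimal sketch that sums file sizes under a directory using only Julia's standard library. The helper name `dir_size_bytes` is made up for this comment, and pointing it at the `artifacts` folder of the first depot is an assumption about where the downloaded libraries end up on a given machine.

```julia
# Sum the sizes of all regular files below `root`.
# `walkdir` does not descend into symlinked directories by default.
function dir_size_bytes(root::AbstractString)
    total = 0
    for (dir, _, files) in walkdir(root)
        for f in files
            path = joinpath(dir, f)
            # Skip symlinks so a library linked under several names is counted once.
            islink(path) || (total += filesize(path))
        end
    end
    return total
end

# Hypothetical usage: size of the artifacts folder in the first depot, if present.
depot = isempty(DEPOT_PATH) ? nothing : first(DEPOT_PATH)
if depot !== nothing && isdir(joinpath(depot, "artifacts"))
    mib = dir_size_bytes(joinpath(depot, "artifacts")) / 2^20
    println("artifacts: ", round(mib; digits = 1), " MiB")
end
```

Running this once before and once after `add MKL_jll` in a fresh depot should show how much the extra artifacts contribute.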
That's probably a problem with the 32/64-bit interface on x86 Linux:
------------------------------------------------------------------
SCS v3.2.1 - Splitting Conic Solver
(c) Brendan O'Donoghue, Stanford University, 2012
------------------------------------------------------------------
problem: variables n: 1, constraints m: 1
cones: z: primal zero / dual free vars: 1
settings: eps_abs: 1.0e-04, eps_rel: 1.0e-04, eps_infeas: 1.0e-07
alpha: 1.50, scale: 1.00e-01, adaptive_scale: 1
max_iters: 100000, normalize: 1, rho_x: 1.00e-06
acceleration_lookback: 10, acceleration_interval: 10
lin-sys: sparse-direct-mkl-pardiso
nnz(A): 1, nnz(P): 0
Error during symbolic factorization: -12
Error during MKL Pardiso cleanup: -12
ERROR: init_lin_sys_work failure
Test Failed at /home/runner/work/SCS.jl/SCS.jl/test/test_problems.jl:714
Expression: solution.ret_val == 1
Evaluated: -4 == 1
ERROR: LoadError: There was an error during testing
in expression starting at /home/runner/work/SCS.jl/SCS.jl/test/runtests.jl:22
ERROR: missing ScsWork, ScsSolution or ScsInfo input
ERROR: Package SCS errored during testing
Numerically, `DirectSolver` and `MKLDirectSolver` behave the same, i.e. the times below are for the same number (`20_000`) of iterations.
On a small problem I get
problem: variables n: 2640, constraints m: 5499
cones: z: primal zero / dual free vars: 2860
s: psd vars: 2639, ssize: 10
settings: eps_abs: 1.0e-10, eps_rel: 1.0e-10, eps_infeas: 1.0e-07
alpha: 1.90, scale: 1.00e-01, adaptive_scale: 1
max_iters: 20000, normalize: 1, rho_x: 1.00e-06
acceleration_lookback: 50, acceleration_interval: 10
lin-sys: sparse-direct-amd-qdldl
nnz(A): 66208, nnz(P): 0
43.913933 seconds (179.25 k allocations: 15.281 MiB)
vs
20.055237 seconds (179.25 k allocations: 15.281 MiB)
i.e. a 2.19× speed-up. On a larger one
problem: variables n: 5708, constraints m: 11938
cones: z: primal zero / dual free vars: 6231
s: psd vars: 5707, ssize: 20
settings: eps_abs: 1.0e-10, eps_rel: 1.0e-10, eps_infeas: 1.0e-07
alpha: 1.90, scale: 1.00e-01, adaptive_scale: 1
max_iters: 20000, normalize: 1, rho_x: 1.00e-06
acceleration_lookback: 50, acceleration_interval: 10
lin-sys: sparse-direct-mkl-pardiso
nnz(A): 275706, nnz(P): 0
the speed-up is comparable (`MKLDirectSolver` is 2-3 times faster here).
These might be atypical examples for showing off `MKLDirectSolver`: these problems are after symmetry reduction, so there is a rather small number of small psd constraints with a bunch of dense linear constraints.
The original version (with a large psd constraint and lots of sparse linear constraints) of the first (small) problem is:
problem: variables n: 93962, constraints m: 169158
cones: z: primal zero / dual free vars: 75197
s: psd vars: 93961, ssize: 1
settings: eps_abs: 1.0e-10, eps_rel: 1.0e-10, eps_infeas: 1.0e-07
alpha: 1.90, scale: 1.00e-01, adaptive_scale: 1
max_iters: 20000, normalize: 1, rho_x: 1.00e-06
acceleration_lookback: 50, acceleration_interval: 10
lin-sys: sparse-direct-amd-qdldl
nnz(A): 746426, nnz(P): 0
`DirectSolver` runs in
1296 seconds (1 thread),
1452 seconds (4 threads)
vs `MKLDirectSolver`
1578 seconds (1 thread),
1012 seconds (4 threads)
so `MKLDirectSolver` benefits from multiple threads (`OMP_NUM_THREADS`) while `DirectSolver` doesn't. Simply by looking at `htop`, `DirectSolver` seems to waste most of the resources (the occupied cores are predominantly in the sys: wait state; maybe some problem with synchronization/communication?). `MKLDirectSolver` fares much better here, fully utilizing all available resources.
Tbh I hoped for something much better ;) But maybe those timings/findings are useful for @bodono as well.
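To make the ratios explicit, here is a small sketch of the arithmetic behind the timings quoted above. The variable names are made up for illustration, and it assumes the 43.9 s run on the small problem is `DirectSolver` and the 20.1 s run is `MKLDirectSolver`, which is how I read the comment.

```julia
# Speed-up of `new` relative to `base`: values above 1 mean `new` is faster.
speedup(base, new) = base / new

# Wall-clock times in seconds, copied from the timings quoted above.
small    = (direct = 43.913933, mkl = 20.055237)   # symmetry-reduced problem
original = (direct = (t1 = 1296.0, t4 = 1452.0),   # unreduced problem,
            mkl    = (t1 = 1578.0, t4 = 1012.0))   # 1 vs 4 threads

# MKL vs direct on the small problem.
println(round(speedup(small.direct, small.mkl); digits = 2))              # 2.19

# MKL vs direct on the original problem, single-threaded and with 4 threads.
println(round(speedup(original.direct.t1, original.mkl.t1); digits = 2))  # 0.82
println(round(speedup(original.direct.t4, original.mkl.t4); digits = 2))  # 1.43

# Thread scaling (1 -> 4 threads) for each solver.
println(round(speedup(original.mkl.t1, original.mkl.t4); digits = 2))       # 1.56
println(round(speedup(original.direct.t1, original.direct.t4); digits = 2)) # 0.89
```

So on the original problem `MKLDirectSolver` is slower single-threaded, gains about 1.56× from four threads, and ends up roughly 1.43× faster than `DirectSolver` at four threads.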
julia> versioninfo(verbose=true)
Julia Version 1.8.2
Commit 36034abf260 (2022-09-29 15:21 UTC)
Platform Info:
OS: Linux (x86_64-linux-gnu)
"Arch Linux"
uname: Linux 6.0.6-arch1-1 #1 SMP PREEMPT_DYNAMIC Sat, 29 Oct 2022 14:08:39 +0000 x86_64 unknown
CPU: AMD Ryzen 7 PRO 4750U with Radeon Graphics:
speed user nice sys idle irq
#1-16 1387 MHz 311723 s 3636 s 175687 s 1813700 s 1 s
Memory: 30.586448669433594 GB (15411.67578125 MB free)
Uptime: 117173.85 sec
Load Avg: 4.83 4.28 3.24
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-13.0.1 (ORCJIT, znver2)
Threads: 8 on 16 virtual cores
Environment:
JULIA_NUM_THREADS = 8
[...]
This is a proof of concept; it depends on https://github.com/JuliaPackaging/Yggdrasil/pull/4773 but works locally ;)
Besides `MKLDirectSolver`, I've added a runtime `scs_version` for each solver library; @odow, technically it's a breaking change (there is no argumentless version anymore), but it was just an internal function that we didn't even test. So maybe we should ask: do we actually need to query the version at runtime?