Strumpack vs Cholmod - GitHub issues

Yes, we can see a significant benefit with the testPoisson3d example. For instance, compare the exact sparse direct LU solver (in single precision):

OMP_NUM_THREADS=8 ./testPoisson3d 100 --sp_disable_gpu --sp_compression none
...
#   - factor time = 27.9501
...
#   - factor memory = 8238.88 MB
...
REFINEMENT it. 0    res =      6482.84  rel.res =            1  bw.error =            1
REFINEMENT it. 1    res =    0.0019654  rel.res =   3.0317e-07  bw.error =  2.76639e-06
...
#   - solve time = 0.299797

to the solver with block low-rank (BLR) compression enabled:

OMP_NUM_THREADS=8 ./testPoisson3d 100 --sp_disable_gpu --sp_compression blr
...
#   - factor time = 10.2616
...
#   - factor memory = 2544.16 MB
...
#   - factor memory/nonzeros = 30.8799 % of multifrontal
...
GMRES it. 0 res =      1000.88  rel.res =            1   restart!
GMRES it. 1 res =      6.63635  rel.res =   0.00663049
GMRES it. 2 res =      1.68189  rel.res =   0.00168041
GMRES it. 3 res =     0.566236  rel.res =  0.000565736
GMRES it. 4 res =     0.194374  rel.res =  0.000194202
GMRES it. 5 res =    0.0576096  rel.res =  5.75586e-05
...
#   - solve time = 0.794273

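The factorization time was reduced from about 28 seconds to about 10 seconds. The solve time went up from 0.30 to 0.79 seconds, because the compressed factorization is only approximate and is used as a preconditioner for GMRES instead of as an exact solver with one step of refinement. Overall, the compression-enabled preconditioner is still faster than the direct solver, and its factors require only ~30% of the memory of the exact factors.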
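To make the comparison concrete, here is a small sketch that recomputes the ratios from the numbers in the two logs above. The dictionary keys and the `total_time` helper are just illustrative names, not part of STRUMPACK.

```python
# Timings (seconds) and factor memory (MB) copied from the two logs above.
exact = {"factor_s": 27.9501, "solve_s": 0.299797, "memory_mb": 8238.88}
blr = {"factor_s": 10.2616, "solve_s": 0.794273, "memory_mb": 2544.16}


def total_time(run):
    """Factorization plus solve time for one run (hypothetical helper)."""
    return run["factor_s"] + run["solve_s"]


# Speedup of the BLR run over the exact run, and relative factor memory.
factor_speedup = exact["factor_s"] / blr["factor_s"]
total_speedup = total_time(exact) / total_time(blr)
memory_fraction = blr["memory_mb"] / exact["memory_mb"]

print(f"factor speedup:         {factor_speedup:.2f}x")
print(f"factor + solve speedup: {total_speedup:.2f}x")
print(f"BLR factor memory:      {100 * memory_fraction:.1f}% of exact")
```

The memory ratio agrees with the "factor memory/nonzeros" line that the BLR run prints itself. Reordering and other setup time is not shown in the logs, so it is left out here.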

For LLt (Cholesky instead of LU) you would expect a speedup of at most 2x. For BLR, the speedups can be bigger for larger problems.
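The factor of two presumably comes from the standard dense operation counts, which also apply to the dense frontal matrices inside a multifrontal sparse solver. As a rough sketch, ignoring lower-order terms and assuming both factorizations run at similar efficiency:

```latex
\[
\mathrm{flops}_{LU}(n) \approx \tfrac{2}{3}\, n^{3}, \qquad
\mathrm{flops}_{LL^{T}}(n) \approx \tfrac{1}{3}\, n^{3}, \qquad
\frac{\mathrm{flops}_{LU}(n)}{\mathrm{flops}_{LL^{T}}(n)} \approx 2 .
\]
```

Here n is the dimension of a dense matrix. The ratio is an upper bound on the gain from exploiting symmetry alone, whereas the gain from compression depends on the problem and its size.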

pghysels / STRUMPACK

Strumpack vs Cholmod #75