Open ChessMastery opened 1 week ago
Could someone please reproduce these tests on Unix and report the factorization time?
Here is a C++ reader for SuiteSparse Harwell-Boeing format files that converts unsymmetric real matrices into the CSR format used in SuperLU: HBnonsym_to_CSRnonsym.txt.
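For readers without the attachment: Harwell-Boeing files store the matrix column by column (compressed-column form, with 1-based Fortran indices), so producing CSR arrays amounts to a storage transpose. The following is a minimal generic sketch of that step, not the code from the attached reader. The `Compressed` struct and the function name `csc_to_csr` are hypothetical, and the input is assumed to have already been shifted to 0-based indices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Compressed storage: `ptr` has (major dimension + 1) entries, `idx` holds the
// minor index of each nonzero, `val` its value. Indices are 0-based here;
// Harwell-Boeing files are 1-based, so shift them before calling.
struct Compressed {
    std::vector<int> ptr;
    std::vector<int> idx;
    std::vector<double> val;
};

// Convert an n_rows x n_cols matrix from compressed-column to compressed-row form.
Compressed csc_to_csr(const Compressed& csc, int n_rows, int n_cols) {
    assert(csc.ptr.size() == static_cast<std::size_t>(n_cols) + 1);
    const int nnz = csc.ptr[n_cols];

    Compressed csr;
    csr.ptr.assign(n_rows + 1, 0);
    csr.idx.resize(nnz);
    csr.val.resize(nnz);

    // Count the nonzeros in each row.
    for (int k = 0; k < nnz; ++k) ++csr.ptr[csc.idx[k] + 1];
    // A prefix sum turns the counts into row start offsets.
    for (int i = 0; i < n_rows; ++i) csr.ptr[i + 1] += csr.ptr[i];

    // Scatter each column's entries into their rows; `next` is the next free
    // slot in each row. Visiting columns in order keeps each row sorted by column.
    std::vector<int> next(csr.ptr.begin(), csr.ptr.end() - 1);
    for (int j = 0; j < n_cols; ++j) {
        for (int k = csc.ptr[j]; k < csc.ptr[j + 1]; ++k) {
            const int dst = next[csc.idx[k]]++;
            csr.idx[dst] = j;
            csr.val[dst] = csc.val[k];
        }
    }
    return csr;
}
```

The conversion runs in time proportional to the number of nonzeros plus the matrix dimensions and needs only one temporary array of row offsets.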
Did you set OMP_NUM_THREADS=2 (or higher)?
No, I did not set this variable. As far as I can see in the Intel profiler's "Threading analysis" tool, if I do not set it, 8 OpenMP threads are created (the omp_get_max_threads value) regardless of the nprocs value, but only nprocs threads actually do any work. If I restrict the number of OpenMP threads by setting this variable, the performance does not improve.

Here are some tests for the wang3 matrix (all solved with a residual of 1e-15). mkl_set_num_threads_local(1) is used.
1st picture: OpenMP thread activity when OMP_NUM_THREADS is not set and nprocs=3.
2nd picture: OpenMP thread activity when OMP_NUM_THREADS=3 and nprocs=3, with sequential BLAS linked.
3rd picture: OpenMP thread activity when OMP_NUM_THREADS=1 and nprocs=3, with sequential BLAS linked.

Such behaviour does not happen often, but I would like to find out whether it is normal for SuperLU_mt or a bug caused by my building or configuring the SuperLU_mt library incorrectly.
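One quick sanity check for a setup like this is to compare the number of solver threads times the number of BLAS threads per solver thread against the hardware thread count. The sketch below is only an illustration under that assumption: the helper names `env_threads`, `total_threads` and `oversubscribed` are hypothetical, and only a plain integer value of the environment variable is handled (a comma-separated OMP_NUM_THREADS list for nested parallelism falls back to the default).

```cpp
#include <cassert>
#include <cstdlib>

// Read a positive integer from an environment variable such as OMP_NUM_THREADS.
// Returns `fallback` if the variable is unset or is not a plain positive integer.
int env_threads(const char* name, int fallback) {
    const char* s = std::getenv(name);
    if (!s) return fallback;
    char* end = nullptr;
    const long v = std::strtol(s, &end, 10);
    const bool ok = (end != s) && (*end == '\0') && v > 0 && v <= 65536;
    return ok ? static_cast<int>(v) : fallback;
}

// Worst-case number of busy threads if every solver thread calls a BLAS
// routine that itself runs on `blas_threads` threads.
int total_threads(int nprocs, int blas_threads) {
    assert(nprocs > 0 && blas_threads > 0);
    return nprocs * blas_threads;
}

// True if that worst case exceeds the available hardware threads.
bool oversubscribed(int nprocs, int blas_threads, int hw_threads) {
    return total_threads(nprocs, blas_threads) > hw_threads;
}
```

With sequential BLAS the second factor is 1, so `oversubscribed(3, 1, 8)` is false; the hardware thread count can be taken from `std::thread::hardware_concurrency()`.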
main.txt

I built the package on Windows 10 (Intel(R) Core(TM) i5-8265U CPU, 8 GB RAM) with the command
cmake -LAH -G "Ninja" -B build_slu -S superlu_mt -DPLAT="_OPENMP" -DBUILD_SHARED_LIBS=OFF -Denable_tests=OFF -Denable_examples=OFF
I tested it on several matrices and got a slowdown on some of them. The calling program is in the attached file. Toolchain: Intel MKL BLAS/LAPACK, OpenMP, Intel oneAPI C/C++ compiler. I linked the METIS ordering the same way as it is done in the sequential version: https://github.com/ChessMastery/superlu_mt.

raefsky4 from the SuiteSparse collection (unsymmetric real): nprocs=1: 1 sec, nprocs=4: 70 sec. Options: permc_spec=MMD_AT_PLUS_A, pivoting_threshold=1.0 (default for the pdgssv driver). This is the factorization and solution time.
power197k from SuiteSparse: nprocs=1: 25 sec, nprocs=4: 30 sec. Options: ordering=METIS_AT_PLUS_A, pivoting_threshold=1.0.
wang3 from SuiteSparse: nprocs=1: 1.7 sec, nprocs=4: 5 sec. Options: ordering=METIS_AT_PLUS_A, pivoting_threshold=1.0.

Is such behaviour normal for SuperLU_mt, or is it not supposed to happen?
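Since the times above combine factorization and solution, anyone reproducing the runs may find it useful to time the phases separately. A minimal wall-clock timer sketch follows; the helper name `seconds` is hypothetical, and the callable passed in stands for whatever wraps the actual factorization or solve step in the calling program.

```cpp
#include <chrono>
#include <utility>

// Run `work` once and return the elapsed wall-clock time in seconds.
// steady_clock is used because it is monotonic and unaffected by clock changes.
template <class F>
double seconds(F&& work) {
    const auto t0 = std::chrono::steady_clock::now();
    std::forward<F>(work)();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}
```

Wall-clock time is the relevant measure for multithreaded runs, because CPU time summed over threads can grow even when the run finishes sooner.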