xiaoyeli / superlu_dist

Distributed memory, MPI based SuperLU
https://portal.nersc.gov/project/sparse/superlu/

Is SuperLU_DIST OpenMP threading performance better than SuperLU_MT? #64

Open yuao-rgb opened 4 years ago

yuao-rgb commented 4 years ago

I am using SuperLU_MT (version 3.1), which is supported by Amesos2, to compute a preconditioner. SuperLU_MT shows good parallel performance in the LU factorization. However, when I use the LU factors as the preconditioner while solving the linear system (e.g. with Trilinos Belos as the GMRES solver), the parallel efficiency is very low. In the release notes of SuperLU_DIST, I found some notes about threading performance improvements, so I am trying to replace SuperLU_MT with SuperLU_DIST. Could you please give us some suggestions, or let us know whether the OpenMP parallel implementation in SuperLU_DIST (version 6.3.1) differs from the one in SuperLU_MT (version 3.1)? Could I use SuperLU_DIST with MPI disabled so that it works like SuperLU_MT?

xiaoyeli commented 4 years ago

For SuperLU_MT, unfortunately, the triangular solve is not parallelized. Since our recent efforts have focused on SuperLU_DIST, we have no plans to add a parallel triangular solve to _MT.
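A serial triangular solve can dominate a preconditioned GMRES run because the factorization happens once, while the triangular solves are repeated in every iteration. The following is a rough Amdahl's-law sketch of that effect. All timings and the iteration count are hypothetical values chosen for illustration, not measurements of SuperLU_MT, and other GMRES costs (matrix-vector products, orthogonalization) are ignored.

```python
def amdahl_speedup(parallel_fraction, threads):
    """Overall speedup when only `parallel_fraction` of the runtime scales with threads."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / threads)


# Hypothetical single-thread timings in seconds (illustration only).
factor_time = 10.0         # LU factorization: done once, assumed to scale with threads
solve_time_per_iter = 0.1  # one preconditioner application (triangular solves): serial
gmres_iterations = 200     # each iteration applies the preconditioner once

total_solve_time = solve_time_per_iter * gmres_iterations
parallel_fraction = factor_time / (factor_time + total_solve_time)

for threads in (1, 4, 16):
    print(threads, "threads -> speedup", round(amdahl_speedup(parallel_fraction, threads), 2))

# Upper bound on the speedup, no matter how many threads are used.
print("limit:", round(1.0 / (1.0 - parallel_fraction), 2))
```

With these assumed numbers, two thirds of the single-thread runtime sits in the serial solves, so the overall speedup cannot exceed 1.5 however well the factorization scales.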

Yes, you can use _DIST on 1 node, with OpenMP threading. It works quite well.

One caveat: _MT uses partial pivoting, while _DIST uses static pivoting. Numerically, _MT is more stable, but unless your systems are really ill-conditioned, you should not notice much difference.
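To illustrate the pivoting difference, here is a minimal dense sketch in pure Python. It is not SuperLU code: `solve_lu` and `max_error` are hypothetical helpers, the tiny-pivot threshold of sqrt(machine epsilon) times the matrix norm is an assumption modeled on the usual description of static pivoting, and the row pre-permutation that a real static-pivoting solver applies before factorizing is omitted. The test matrix is chosen so that a zero pivot appears during elimination even though the original diagonal is all ones.

```python
import math


def solve_lu(A, b, pivoting):
    """Solve A x = b by dense Gaussian elimination.

    pivoting="partial": swap in the largest-magnitude entry of each column.
    pivoting="static":  never swap rows; replace tiny pivots by a small threshold.
    """
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]     # augmented copy of [A | b]
    norm = max(sum(abs(v) for v in row) for row in A)  # infinity norm of A
    tiny = math.sqrt(2.0 ** -52) * norm                # assumed replacement threshold
    for k in range(n):
        if pivoting == "partial":
            p = max(range(k, n), key=lambda i: abs(M[i][k]))
            M[k], M[p] = M[p], M[k]
        elif abs(M[k][k]) < tiny:
            M[k][k] = math.copysign(tiny, M[k][k])     # perturb instead of swapping
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):                       # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def max_error(x, exact):
    return max(abs(xi - ei) for xi, ei in zip(x, exact))


A = [[1.0, 1.0, 0.0],
     [1.0, 1.0, 1.0],
     [0.0, 1.0, 1.0]]
exact = [1.0, 1.0, 1.0]
b = [sum(a * e for a, e in zip(row, exact)) for row in A]

x_partial = solve_lu(A, b, "partial")
x_static = solve_lu(A, b, "static")

# One step of iterative refinement, reusing the same static-pivoting solve.
r = [bi - sum(a * xj for a, xj in zip(row, x_static)) for row, bi in zip(A, b)]
d = solve_lu(A, r, "static")
x_refined = [xj + dj for xj, dj in zip(x_static, d)]

print("partial pivoting error:", max_error(x_partial, exact))
print("static pivoting error: ", max_error(x_static, exact))
print("static + refinement:   ", max_error(x_refined, exact))
```

In this sketch the static-pivoting error is on the order of the replacement threshold, and a single refinement step brings it back close to the partial-pivoting result. For well-conditioned systems this is one reason the two approaches tend to give similar answers in practice.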