hypre-space / hypre

Parallel solvers for sparse linear systems featuring multigrid methods.
https://www.llnl.gov/casc/hypre/

Hypre scalability #201

Open pdml-director opened 3 years ago

pdml-director commented 3 years ago

Hello HYPRE team,

I have been using HYPRE for over 6 years now in my particle-in-cell (PIC) simulations, where the Poisson equation is solved with HYPRE. I am trying to scale up my PIC simulation so that it can use from >100 up to 1,000-10,000 CPUs. However, I have noticed for a while that HYPRE does not scale beyond one node; the internode communication seems to take up a significant amount of time. I usually use GMRES with SMG/PFMG as a preconditioner.
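For reference, the solver combination I described (GMRES with a PFMG preconditioner through the Struct interface) is set up roughly as sketched below; the matrix/vector names, tolerances, and iteration limits are placeholders rather than my actual application code:

 #include "HYPRE_struct_ls.h"

 /* A, b, x are assumed to be an already-assembled HYPRE_StructMatrix
    and HYPRE_StructVector pair for the Poisson problem. */
 HYPRE_StructSolver solver, precond;

 /* PFMG preconditioner: a single V-cycle per outer iteration */
 HYPRE_StructPFMGCreate(MPI_COMM_WORLD, &precond);
 HYPRE_StructPFMGSetMaxIter(precond, 1);
 HYPRE_StructPFMGSetTol(precond, 0.0);
 HYPRE_StructPFMGSetZeroGuess(precond);

 /* GMRES outer solver with PFMG attached as the preconditioner */
 HYPRE_StructGMRESCreate(MPI_COMM_WORLD, &solver);
 HYPRE_StructGMRESSetTol(solver, 1.0e-06);
 HYPRE_StructGMRESSetMaxIter(solver, 100);
 HYPRE_StructGMRESSetPrecond(solver, HYPRE_StructPFMGSolve,
                             HYPRE_StructPFMGSetup, precond);

 HYPRE_StructGMRESSetup(solver, A, b, x);
 HYPRE_StructGMRESSolve(solver, A, b, x);

 HYPRE_StructGMRESDestroy(solver);
 HYPRE_StructPFMGDestroy(precond);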

Any feedback would be appreciated.

Best, Ken

rfalgout commented 3 years ago

Hi @pdml-director . Going from one to multiple nodes will always cause a communication hit. For multigrid methods, that hit is higher than for many other coupled computations because of the O(N) complexity of the algorithm. The important thing is how fast/slow it is (time to solution) compared to other solvers. You should see pretty flat weak scaling performance if you continue to add nodes. You may also need to experiment with the number of tasks used per node. We've had situations in the past where it was better to not use all of the CPUs on a node, for example. There are also communication routing tricks that can improve performance such as described here:

https://www.sciencedirect.com/science/article/pii/S0743731519302321

These node-aware communication ideas are being added to new implementations of MPI. I'm not sure if this is relevant in your situation, but I thought it was worth pointing out. Also, note that PFMG scales much better than SMG with regard to memory, so if you can get away with using PFMG, it's preferable.

Hope this helps!

-Rob

pdml-director commented 3 years ago

Hi Rob, thank you for your prompt reply. Yes, I am using PFMG, and I have noticed that using all CPUs on a node hurts hypre's scalability. For instance, on Comet (XSEDE), one node has 24 cores, but using 12 cores is a bit faster than using all 24. I am running a coupled PIC-Poisson simulation, where the PIC part can use up to 500 processors, but the Poisson solver (an elliptic PDE) has been limiting how far the code can scale, since I have to restrict the Poisson solve to fewer processors than one node. I'm okay for now, but it is getting to the point where I need to scale up the Poisson solver. It seems like there is no straightforward way to scale HYPRE up across multiple nodes? Thank you,

-Ken

rfalgout commented 3 years ago

Hi Ken. We have run PFMG on at least hundreds of thousands of cores and thousands of nodes, so there is no issue scaling it up; it's designed to run at massive scales. When you say "scale up", are you talking about strong scaling or weak scaling? Strong scaling rolls off more quickly for multigrid methods than for many other computations, but multigrid is often still the fastest way to solve a Poisson-like problem (low parallel efficiency does not mean slow time-to-solution). Could you provide us with more details on timings, problem sizes, etc. to demonstrate what you mean here? Thanks!

-Rob

pdml-director commented 3 years ago

Hi Rob, I have been considering strong scaling. I am only using 200x500 cells for the Poisson solve, but for the particles I try to launch at least 100-200 particles per cell. Is there an email address at which I can consult you and your team? -Ken

rfalgout commented 3 years ago

Just email me and I'll bring others in as needed. Thanks! -Rob

Ilyusis commented 3 years ago

Hi Rob, hi Ken,

I was about to open an issue asking the same question about strong scaling for the PFMG solver. We have the same kind of cases (256x500 cells) as Ken, and we have seen that, going from 1 node to 15 nodes (28 cores per node), the Poisson solver speed-up was only about 1.5-1.8x.

Is this an expected result, or did we implement something in the wrong way?

Thanks in advance for your help, Thomas

rfalgout commented 3 years ago

Hi Thomas,

Yes, this sounds plausible to me. This is a really small amount of computation per core, so communication latency is probably dominating. I'm not sure there is much that can be done here. Out of curiosity, is your Poisson solve isotropic? Also, what parameters do you use in PFMG (smoother type, skip option, RAP type, etc.)? Thanks!
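As a rough back-of-the-envelope check using your numbers: 256 x 500 = 128,000 cells spread over 15 nodes x 28 cores = 420 cores is only about 300 cells per core, and on the coarser PFMG levels each core holds just a handful of unknowns, so the cycle time there is essentially all communication latency.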

-Rob

Ilyusis commented 3 years ago

Hi Rob,

Thanks for your quick answer. My Poisson problem is anisotropic, and there is periodicity in one of the two directions. For PFMG, I use the following parameters:

 HYPRE_StructPFMGSetTol(solver, 1.0e-06);      /* convergence tolerance */
 HYPRE_StructPFMGSetMaxIter(solver, 5000);     /* maximum number of V-cycles */
 HYPRE_StructPFMGSetRAPType(solver, 0);        /* Galerkin coarse-grid operators */
 HYPRE_StructPFMGSetRelaxType(solver, 1);      /* weighted Jacobi smoother */
 HYPRE_StructPFMGSetNumPreRelax(solver, 1);    /* one pre-smoothing sweep */
 HYPRE_StructPFMGSetNumPostRelax(solver, 1);   /* one post-smoothing sweep */
 HYPRE_StructPFMGSetSkipRelax(solver, 0);      /* never skip relaxation */
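For completeness, since the periodic direction may matter for the coarsening: the periodicity is declared on the struct grid before assembly, roughly as below (the extents and the choice of periodic direction are illustrative here, not my exact setup):

 /* 2D struct grid, periodic in x only; a 0 in the periodic array means
    "not periodic", otherwise the entry is the period in that direction. */
 HYPRE_StructGrid grid;
 HYPRE_Int ilower[2] = {0, 0}, iupper[2] = {255, 499};
 HYPRE_Int periodic[2] = {256, 0};

 HYPRE_StructGridCreate(MPI_COMM_WORLD, 2, &grid);
 HYPRE_StructGridSetExtents(grid, ilower, iupper);
 HYPRE_StructGridSetPeriodic(grid, periodic);
 HYPRE_StructGridAssemble(grid);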

-Thomas

pdml-director commented 3 years ago

Hi Thomas and Rob,

I do have the same condition as Thomas in my PIC simulations.

We use PFMG as a preconditioner to GMRES (solverid = 31) for most of our calculations. Recently we saw that PFMG-preconditioned CG (solverid = 11) might be faster in some cases (5-point-stencil Poisson).
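In case a concrete comparison helps, switching the outer Krylov solver is a small change in the Struct interface; a rough sketch of the PFMG-preconditioned CG variant (tolerances and iteration counts are illustrative, not our production settings):

 /* PFMG preconditioner configured as a single V-cycle, as before */
 HYPRE_StructSolver pcg, precond;
 HYPRE_StructPFMGCreate(MPI_COMM_WORLD, &precond);
 HYPRE_StructPFMGSetMaxIter(precond, 1);
 HYPRE_StructPFMGSetTol(precond, 0.0);
 HYPRE_StructPFMGSetZeroGuess(precond);

 /* CG (PCG) outer solver; appropriate when the discrete operator is
    symmetric positive definite, as for the 5-point Poisson stencil */
 HYPRE_StructPCGCreate(MPI_COMM_WORLD, &pcg);
 HYPRE_StructPCGSetTol(pcg, 1.0e-06);
 HYPRE_StructPCGSetMaxIter(pcg, 100);
 HYPRE_StructPCGSetTwoNorm(pcg, 1);   /* use the two-norm in convergence tests */
 HYPRE_StructPCGSetPrecond(pcg, HYPRE_StructPFMGSolve,
                           HYPRE_StructPFMGSetup, precond);

 HYPRE_StructPCGSetup(pcg, A, b, x);
 HYPRE_StructPCGSolve(pcg, A, b, x);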

Thanks, Ken

Ilyusis commented 3 years ago

Hi Ken,

Thanks for your input. Indeed, some other solvers can be a bit more efficient depending on the case and core count (for small cases, we have seen that SMG was more efficient than PFMG), but overall PFMG seems to be the most efficient, with the speed-up unfortunately capped at around 1.5-2x.

Cheers, Thomas