idaholab / moose

Multiphysics Object Oriented Simulation Environment
https://www.mooseframework.org
GNU Lesser General Public License v2.1

See how large a problem we can solve with NS using field split #24548

Closed: lindsayad closed this issue 1 year ago

lindsayad commented 1 year ago

Reason

We have HPC people who want to see whether they can use NS (Pronghorn) to solve a problem with 70 million cells. Their mesh is complex, so to start we will see how big a simple channel problem we can solve using field split.

Design

Take a 3D channel and see how big we can go.
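
As a rough illustration of this plan, here is a minimal sketch of a generated 3D channel whose size is scaled by increasing the element counts; the dimensions and counts below are placeholders, not the actual study inputs:

```
[Mesh]
  [channel]
    type = GeneratedMeshGenerator
    dim = 3
    xmax = 10   # channel length (illustrative)
    ymax = 1    # channel height
    zmax = 1    # channel depth
    nx = 100    # scale these counts up to grow the problem
    ny = 10
    nz = 10
  []
  # Distribute the mesh so it is not replicated on every rank as the element count grows
  parallel_type = distributed
[]
```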

Impact

Test what we're capable of doing in NS

lindsayad commented 1 year ago

Notes on a medium-sized 3D channel flow problem with a Reynolds number of 1, taken to anticipate which FSP options to use for the big kahuna:

| Re | dofs | cpus | -pc_fieldsplit_schur_fact_type | -pc_fieldsplit_schur_precondition | solve time (s) |
| --- | --- | --- | --- | --- | --- |
| 1 | 655360 | 32 | full | a11 | 50.525 |
| 1 | 655360 | 32 | lower | a11 | 39.341 |
| 1 | 655360 | 32 | upper | a11 | 37.435 |
| 1 | 655360 | 32 | diag | a11 | 66.413 |
| 1 | 655360 | 32 | full | selfp | 47.303 |
| 1 | 655360 | 32 | lower | selfp | 37.095 |
| 1 | 655360 | 32 | upper | selfp | 35.902 |
| 1 | 655360 | 32 | diag | selfp | 61.088 |

Conclusions:

- The upper Schur factorization gives the fastest solve here, with lower close behind; full is noticeably slower and diag is the slowest.
- The selfp Schur-complement preconditioner is consistently a bit faster than a11 for this problem.
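
For reference, a minimal sketch of how one row of the sweep above (upper factorization with the selfp Schur preconditioner) would be passed through a MOOSE field-split block; this mirrors the fuller input posted later in the thread, and the variable names in the sub-splits are placeholders:

```
[Preconditioning]
  [FSP]
    type = FSP
    topsplit = 'up'
    [up]
      splitting = 'u p'   # velocity / pressure Schur split
      splitting_type = schur
      # One row of the sweep: 'upper' factorization, 'selfp' Schur-complement preconditioner
      petsc_options_iname = '-pc_fieldsplit_schur_fact_type -pc_fieldsplit_schur_precondition'
      petsc_options_value = 'upper selfp'
    []
    [u]
      vars = 'vel_x vel_y vel_z'   # placeholder velocity variable names
    []
    [p]
      vars = 'pressure'
    []
  []
[]
```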

lindsayad commented 1 year ago

I want to investigate whether we are solving a form that is symmetric with respect to A01 and A10, and if not, whether switching to a symmetric form improves convergence.

lindsayad commented 1 year ago

My general experience:

Transferring this comment to https://github.com/idaholab/moose/discussions/24809; any additional findings will go there.

lindsayad commented 1 year ago

Adding a note that I had posted on Slack but never added here:

I barely stayed within the memory limits of my workstation, but I was able to solve a 1.3 million cell 3D channel flow problem (5.2 million dofs) on 64 cpus in 338 seconds (90 of those seconds spent in setup) with field split. I will need to move to the HPC to see how much bigger I can go.

grmnptr commented 1 year ago

How many nonlinear iterations did it take to solve your problem (asking to compare with SIMPLE which needs 400 to solve a problem with similar size, Re=220)?

lindsayad commented 1 year ago

I don't remember exactly; somewhere between 3 and 6. Note that the 248-second solve reported above (the 338 s total minus the 90 s of setup) was for a problem with Reynolds number ~1.

lindsayad commented 1 year ago

Testing out some things: a 696.907 second solve for 7,002,001 dofs with 63 processes, using a distributed mesh and the hybrid discretization in https://github.com/idaholab/moose/pull/23986, for the lid-driven problem with Re=1 and the following solver parameters:

```
[Problem]
  type = NavierStokesProblem
  # Extra tagged matrices ('mass', 'L') that NavierStokesProblem makes available to the preconditioner
  mass_matrix = 'mass'
  L_matrix = 'L'
  extra_tag_matrices = 'mass L'
[]

[Preconditioning]
  active = FSP
  [FSP]
    type = FSP
    topsplit = 'up'
    [up]
      # Schur field split over velocity and pressure
      splitting = 'u p'
      splitting_type = schur
      petsc_options_iname = '-pc_fieldsplit_schur_fact_type  -pc_fieldsplit_schur_precondition -ksp_gmres_restart -ksp_type -ksp_pc_side -ksp_rtol'
      petsc_options_value = 'full                            self                              300                fgmres    right        1e-4'
    []
    [u]
      # Velocity (A) block: BoomerAMG-preconditioned GMRES
      vars = 'u v'
      petsc_options = '-ksp_monitor'
      petsc_options_iname = '-pc_type -pc_hypre_type -ksp_type -ksp_rtol -ksp_gmres_restart -ksp_pc_side'
      petsc_options_value = 'hypre    boomeramg      gmres     1e-2      300                right'
    []
    [p]
      # Pressure (Schur complement) block: LSC-preconditioned FGMRES
      vars = 'pressure'
      petsc_options = '-pc_lsc_scale_diag -ksp_monitor -lsc_ksp_monitor'
      petsc_options_iname = '-ksp_type -ksp_gmres_restart -ksp_rtol -pc_type -ksp_pc_side -lsc_pc_type -lsc_ksp_type -lsc_ksp_pc_side -lsc_ksp_rtol'
      petsc_options_value = 'fgmres    300                1e-2      lsc      right        hypre        gmres         right            1e-1'
    []
  []
[]
```

lindsayad commented 1 year ago

With identical solve options, Q2Q1 Taylor-Hood elements (5,768,003 dofs, Re=1, distributed mesh) give a solve time of 170.904 s. The appreciable difference between the Q2Q1 and the hybrid CG-DG performance comes from the time spent in the A-block solves: hypre BoomerAMG was taking about 3-4 iterations for Q2Q1 and ~10 iterations with hybrid CG-DG.
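
For context, a minimal sketch of how Q2Q1 Taylor-Hood variables are declared in a MOOSE input (second-order continuous velocities, first-order continuous pressure); the variable names match the splits above, but the block is otherwise illustrative rather than the actual study input:

```
[Variables]
  [u]
    family = LAGRANGE
    order = SECOND   # Q2 velocity component
  []
  [v]
    family = LAGRANGE
    order = SECOND   # Q2 velocity component
  []
  [pressure]
    family = LAGRANGE
    order = FIRST    # Q1 pressure
  []
[]
```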

lindsayad commented 1 year ago

Copying from Slack notes: according to @grmnptr, OpenFOAM solves a 3 million dof problem in 273 seconds on a single process.

lindsayad commented 1 year ago

Solving Q2Q1 at Re=1 with 3 million dofs on a single process gives a solve time of 742 seconds, so a factor of ~2.7 slower than the OpenFOAM number above.

lindsayad commented 1 year ago

I do not know the setup of the OpenFOAM case, so we are likely not making an apples-to-apples comparison.

lindsayad commented 1 year ago

Using Q2Q1 elements, I was able to solve a 70 million dof problem (70,034,582 to be exact; a 2789×2789 element grid in 2 dimensions) using 3,504 procs on Sawtooth in 278.778 seconds with Re=1. Reading the title post again, the prospective users wanted to do 70 million cells, so assuming that's 3D, that would be 280 million dofs. Time to request another interactive session!
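
That dof count is consistent with a 2789×2789 grid of Q2Q1 elements: the two Q2 velocity components contribute 2·(2·2789+1)² = 62,250,482 dofs and the Q1 pressure contributes (2789+1)² = 7,784,100, for a total of 70,034,582. A sketch of the corresponding generated mesh, assuming QUAD9 elements to carry the second-order velocities:

```
[Mesh]
  [gen]
    type = GeneratedMeshGenerator
    dim = 2
    nx = 2789
    ny = 2789
    elem_type = QUAD9   # second-order elements for the Q2 velocity variables
  []
  parallel_type = distributed   # a mesh this large should not be replicated on every rank
[]
```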

lindsayad commented 1 year ago

Going up to 280 million dofs on Sawtooth, I got crashes and messages possibly related to MPI, and I am not motivated to dig into that right now. So for the moment my record is 70 million dofs. I don't really see this as a MOOSE issue at this point, so I'm closing; we can always re-open if someone wants to.