idaholab / moose

Multiphysics Object Oriented Simulation Environment
https://www.mooseframework.org

Issue with the efficiency of Distributed mesh with mesh adaptivity #18191

Open fdkong opened 3 years ago

fdkong commented 3 years ago

Discussed in https://github.com/idaholab/moose/discussions/18163

Originally posted by **simopier** June 24, 2021

When performing large 3D phase field simulations, the computational cost can quickly become prohibitive. For that reason, I have been trying to leverage some of the features MOOSE offers to reduce these costs, two of them being the use of a Distributed mesh instead of a Replicated mesh, and the use of mesh adaptivity. However, I do not get good performance when I try to combine the two, especially when my phase field simulation includes elasticity. After discussing this with @roystgnr, @amjokisaari, and @jiangwen84, I figured I would provide examples of the types of systems I am trying to simulate and document the performance issues I was observing.

I am performing phase field simulations with elasticity, and below are the different combinations of options that I have used:

1. Using Distributed mesh or Replicated mesh.
2. Using mesh adaptivity (2 levels) or not.
3. When using Distributed mesh, I also tried the `part_package = ptscotch` option.

I recorded the active time for each of these 6 simulations using the postprocessor

```
[./activetime]
  type = PerfGraphData
  data_type = TOTAL
  section_name = Root
[../]
```

The results are provided in the table below. I made sure to run simulations large enough to have a significant number of nonlinear DOFs (> 2M) so that Distributed mesh is relevant. Note that when using mesh adaptivity (MA), I simulated a larger domain to still have a large number of DOFs.

Results show that without mesh adaptivity (left figure), the active time does decrease when using the Distributed mesh rather than the Replicated mesh. However, the trend is reversed when using mesh adaptivity (right figure): with mesh adaptivity, using Distributed mesh is more time consuming than using Replicated mesh. I am not sure what causes this, but it limits the advantage of Distributed mesh.

| Without mesh adaptivity | With mesh adaptivity (larger domain) |
|-------------------------|--------------------------------------|
| 0_El (attached figure)  | 0_El_MA (attached figure)            |

Attached to this post are the input files, an example of the .pbs file I have been using to run these on HPC (Falcon), and the output files. Note that I have run these simulations with the `-snes_view -log_view` options, as well as with

```
[./pgraph]
  type = PerfGraphOutput
  execute_on = 'initial final' # Default is "final"
  level = 3                    # Default is 1
  heaviest_branch = true       # Default is false
  heaviest_sections = 7        # Default is 0
[]
```

to provide performance data. Let me know if I need to provide more information or run additional tests.

[Moose_discussion_simopier_distributed_mesh.zip](https://github.com/idaholab/moose/files/6710578/Moose_discussion_simopier_distributed_mesh.zip)
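*Editor's note:* the sketch below is not simopier's phase-field/elasticity input (that is in the attached zip), but a minimal, self-contained illustration of how the pieces described above fit together in a MOOSE input file: a distributed mesh, the ptscotch partitioner, two levels of h-adaptivity, and the PerfGraph postprocessor/output. A simple transient diffusion problem stands in for the actual physics, and the mesh size, marker thresholds, and time stepping are placeholder values.

```
[Mesh]
  [gen]
    type = GeneratedMeshGenerator
    dim = 2
    nx = 40
    ny = 40
  []
  # Use a distributed (parallel-decomposed) mesh instead of a replicated one
  parallel_type = distributed
  [Partitioner]
    type = PetscExternalPartitioner
    part_package = ptscotch
  []
[]

[Variables]
  [u]
  []
[]

[Kernels]
  [time]
    type = TimeDerivative
    variable = u
  []
  [diff]
    type = Diffusion
    variable = u
  []
[]

[BCs]
  [left]
    type = DirichletBC
    variable = u
    boundary = left
    value = 0
  []
  [right]
    type = DirichletBC
    variable = u
    boundary = right
    value = 1
  []
[]

[Adaptivity]
  # Two levels of h-refinement, as in the comparison above
  max_h_level = 2
  marker = efm
  [Indicators]
    [gji]
      type = GradientJumpIndicator
      variable = u
    []
  []
  [Markers]
    [efm]
      type = ErrorFractionMarker
      indicator = gji
      refine = 0.5
      coarsen = 0.05
    []
  []
[]

[Postprocessors]
  # Total active time, as recorded in the comparison above
  [activetime]
    type = PerfGraphData
    data_type = TOTAL
    section_name = Root
  []
[]

[Executioner]
  type = Transient
  solve_type = NEWTON
  num_steps = 5
  dt = 0.1
[]

[Outputs]
  exodus = true
  [pgraph]
    type = PerfGraphOutput
    execute_on = 'initial final'
    level = 3
    heaviest_branch = true
    heaviest_sections = 7
  []
[]
```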
roystgnr commented 3 years ago

This ought to already be partly fixed via the new libMesh submodule update incorporating https://github.com/libMesh/libmesh/pull/2942 , and once I figure out the weird p-coarsening bug that's messing with our distributed mesh testing, we should see another huge fix coming from https://github.com/libMesh/libmesh/pull/2957 . There'll still be room for further improvement after that, though, particularly since I've been focusing on the even-more-embarrassing case of slow uniform refinement and haven't yet looked for optimizations that would only apply to the adaptive case.

lindsayad commented 1 year ago

I ran `moose/modules/navier_stokes/test/tests/finite_volume/ins/lid-driven/lid-driven.i` with 5 uniform refinements to produce the graphs below, which show that refinement for a replicated mesh is about 3x faster than for a distributed mesh.
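*Editor's note:* for context, the relevant mesh settings for such a replicated-vs-distributed uniform-refinement comparison look roughly like the sketch below. This is an illustration of the standard `[Mesh]` parameters, not the actual contents of `lid-driven.i` or the exact options used for the graphs.

```
[Mesh]
  # ... existing mesh generators from lid-driven.i ...
  uniform_refine = 5           # five rounds of uniform h-refinement at setup
  parallel_type = replicated   # change to 'distributed' for the second timing run
[]
```

Alternatively, MOOSE's `--distributed-mesh` command-line flag forces a distributed mesh without editing the input file.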

**replicated**

[Figure: replicated-uniform-refine — PerfGraph timing for uniform refinement on a replicated mesh]

**distributed**

[Figure: distributed-uniform-refine — PerfGraph timing for uniform refinement on a distributed mesh]