libMesh / libmesh

libMesh github repository
http://libmesh.github.io
GNU Lesser General Public License v2.1

Slow AMR Example #1813

Open pbauman opened 5 years ago

pbauman commented 5 years ago

I've attached an example program (main.cpp.txt, sigh) and an input file (data.txt). This is a modified version of an example originally sent by Simone Rossi and @boyceg. Even running in serial (using SuperLU for the linear solve via PETSc), AMR is about 3x slower than the uniform grid case, where we limit the maximum AMR h level to match the uniform grid. @roystgnr did profiling in parallel. My serial performance log is below: the AMR run took about 60s, while the uniform grid run with no AMR takes about 20s (on my Ivy Bridge workstation, opt mode in libMesh, etc.).

The key difference, AFAICT, between this example and the transient AMR example we have for convection-diffusion is that the layer here is "thicker", so more elements end up in the layer. Even though n_dofs is 4-8x lower than in the uniform case, the AMR overhead is larger. And as we can see in my perf log below, the vast majority of the time (about 80% of active time, counting subroutines) is in the error indicator, JumpErrorEstimator::estimate_error().

 -----------------------------------------------------------------------------------------------------------------
| libMesh Performance: Alive time=83.5089, Active time=62.6058                                                    |
 -----------------------------------------------------------------------------------------------------------------
| Event                              nCalls     Total Time  Avg Time    Total Time  Avg Time    % of Active Time  |
|                                               w/o Sub     w/o Sub     With Sub    With Sub    w/o S    With S   |
|-----------------------------------------------------------------------------------------------------------------|
|                                                                                                                 |
|                                                                                                                 |
| DefaultCoupling                                                                                                 |
|   operator()                       340063     1.0531      0.000003    1.0531      0.000003    1.68     1.68     |
|                                                                                                                 |
| DofMap                                                                                                          |
|   add_neighbors_to_send_list()     203        0.4441      0.002188    1.8305      0.009017    0.71     2.92     |
|   build_sparsity()                 102        0.9858      0.009665    2.2599      0.022156    1.57     3.61     |
|   create_dof_constraints()         203        0.5158      0.002541    0.8808      0.004339    0.82     1.41     |
|   distribute_dofs()                203        0.1346      0.000663    2.4496      0.012067    0.22     3.91     |
|   dof_indices()                    6433321    4.7028      0.000001    4.7028      0.000001    7.51     7.51     |
|   enforce_constraints_exactly()    303        0.0226      0.000075    0.0226      0.000075    0.04     0.04     |
|   old_dof_indices()                2927136    2.0929      0.000001    2.0929      0.000001    3.34     3.34     |
|   prepare_send_list()              305        0.0002      0.000001    0.0002      0.000001    0.00     0.00     |
|   reinit()                         203        0.4840      0.002384    0.4840      0.002384    0.77     0.77     |
|                                                                                                                 |
| EquationSystems                                                                                                 |
|   build_parallel_solution_vector() 102        0.3183      0.003121    0.7718      0.007566    0.51     1.23     |
|   build_solution_vector()          102        0.0007      0.000007    0.7725      0.007574    0.00     1.23     |
|                                                                                                                 |
| ExodusII_IO                                                                                                     |
|   write_nodal_data()               102        0.4216      0.004134    0.4216      0.004134    0.67     0.67     |
|                                                                                                                 |
| FE                                                                                                              |
|   compute_shape_functions()        3062396    3.1606      0.000001    3.1606      0.000001    5.05     5.05     |
|   init_shape_functions()           2727677    3.0296      0.000001    3.0296      0.000001    4.84     4.84     |
|   inverse_map()                    5621554    6.0588      0.000001    6.0588      0.000001    9.68     9.68     |
|                                                                                                                 |
| FEMap                                                                                                           |
|   compute_affine_map()             3062396    3.5707      0.000001    3.5707      0.000001    5.70     5.70     |
|   compute_face_map()               2000296    4.7277      0.000002    11.5808     0.000006    7.55     18.50    |
|   init_face_shape_functions()      101        0.0002      0.000002    0.0002      0.000002    0.00     0.00     |
|   init_reference_to_physical_map() 2727677    2.8163      0.000001    2.8163      0.000001    4.50     4.50     |
|                                                                                                                 |
| GenericProjector                                                                                                |
|   copy_dofs                        927612     1.2718      0.000001    8.0017      0.000009    2.03     12.78    |
|   operator()                       304        2.1025      0.006916    15.0393     0.049472    3.36     24.02    |
|   project_edges                    58591      0.0434      0.000001    0.0434      0.000001    0.07     0.07     |
|   project_interior                 58591      0.0504      0.000001    0.0504      0.000001    0.08     0.08     |
|   project_nodes                    58591      0.1687      0.000003    1.4670      0.000025    0.27     2.34     |
|   project_sides                    58591      0.0444      0.000001    0.0444      0.000001    0.07     0.07     |
|                                                                                                                 |
| JumpErrorEstimator                                                                                              |
|   estimate_error()                 101        13.6399     0.135048    49.7929     0.492999    21.79    79.53    |
|                                                                                                                 |
| Mesh                                                                                                            |
|   all_first_order()                101        0.1339      0.001326    0.1339      0.001326    0.21     0.21     |
|   contract()                       101        0.0124      0.000123    0.0234      0.000232    0.02     0.04     |
|   find_neighbors()                 305        1.7916      0.005874    1.7916      0.005874    2.86     2.86     |
|   renumber_nodes_and_elem()        307        0.0260      0.000085    0.0260      0.000085    0.04     0.04     |
|                                                                                                                 |
| MeshBase                                                                                                        |
|   prepare_for_use()                305        0.0817      0.000268    2.1416      0.007022    0.13     3.42     |
|                                                                                                                 |
| MeshOutput                                                                                                      |
|   write_equation_systems()         102        0.0005      0.000005    1.1948      0.011714    0.00     1.91     |
|                                                                                                                 |
| MeshRefinement                                                                                                  |
|   _coarsen_elements()              202        0.0236      0.000117    0.0236      0.000117    0.04     0.04     |
|   _refine_elements()               206        0.1315      0.000638    0.3503      0.001700    0.21     0.56     |
|   add_node()                       119744     0.0982      0.000001    0.0982      0.000001    0.16     0.16     |
|   make_coarsening_compatible()     286        0.4517      0.001579    0.4517      0.001579    0.72     0.72     |
|   make_flags_parallel_consistent() 202        0.0593      0.000293    0.0593      0.000293    0.09     0.09     |
|   make_refinement_compatible()     286        0.0224      0.000078    0.0224      0.000078    0.04     0.04     |
|                                                                                                                 |
| MeshTools                                                                                                       |
|   correct_node_proc_ids()          101        0.2252      0.002230    0.2254      0.002232    0.36     0.36     |
|                                                                                                                 |
| MeshTools::Generation                                                                                           |
|   build_cube()                     1          0.0003      0.000258    0.0003      0.000258    0.00     0.00     |
|                                                                                                                 |
| OldSolutionBase                                                                                                 |
|   check_old_context(c)             927612     1.9407      0.000002    4.4020      0.000005    3.10     7.03     |
|   check_old_context(c,p)           53508      0.1064      0.000002    0.2203      0.000004    0.17     0.35     |
|                                                                                                                 |
| OldSolutionValue                                                                                                |
|   Number eval_at_node()            168828     0.1425      0.000001    1.1955      0.000007    0.23     1.91     |
|   eval_at_point()                  53508      0.2484      0.000005    1.0200      0.000019    0.40     1.63     |
|   eval_old_dofs()                  927612     1.1824      0.000001    6.1572      0.000007    1.89     9.83     |
|                                                                                                                 |
| Parallel                                                                                                        |
|   allgather()                      203        0.0001      0.000001    0.0001      0.000001    0.00     0.00     |
|                                                                                                                 |
| Parallel::Request                                                                                               |
|   wait()                           101        0.0001      0.000001    0.0001      0.000001    0.00     0.00     |
|                                                                                                                 |
| Partitioner                                                                                                     |
|   set_node_processor_ids()         205        0.0864      0.000422    0.0864      0.000422    0.14     0.14     |
|   single_partition_range()         204        0.0272      0.000133    0.0272      0.000133    0.04     0.04     |
|                                                                                                                 |
| PetscLinearSolver                                                                                               |
|   solve()                          101        1.7375      0.017203    1.7375      0.017203    2.78     2.78     |
|                                                                                                                 |
| StatisticsVector                                                                                                |
|   maximum()                        101        0.0005      0.000005    0.0005      0.000005    0.00     0.00     |
|                                                                                                                 |
| System                                                                                                          |
|   assemble()                       101        0.9161      0.009070    2.4014      0.023776    1.46     3.84     |
|   project_fem_vector()             1          0.0005      0.000487    0.2208      0.220797    0.00     0.35     |
|   project_vector(FunctionBase)     1          0.0000      0.000002    0.2208      0.220800    0.00     0.35     |
|   project_vector(old,new)          303        0.7228      0.002386    16.9398     0.055907    1.15     27.06    |
|                                                                                                                 |
| TopologyMap                                                                                                     |
|   init()                           206        0.4652      0.002258    0.4652      0.002258    0.74     0.74     |
|                                                                                                                 |
| UnstructuredMesh                                                                                                |
|   copy_nodes_and_elements()        101        0.1093      0.001082    0.9234      0.009143    0.17     1.47     |
 -----------------------------------------------------------------------------------------------------------------
| Totals:                            3.232e+07  62.6058                                         100.00            |
 -----------------------------------------------------------------------------------------------------------------
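The "% of Active Time" columns are just each event's seconds divided by the active time in the log header. A minimal sketch for checking a few entries by hand follows; the constant and function names are made up for illustration, and the numbers are copied from the log above.

```cpp
#include <cassert>

// Seconds copied from the performance log above.
constexpr double active_time      = 62.6058;  // "Active time" in the log header
constexpr double est_err_self     = 13.6399;  // estimate_error(), Total Time w/o Sub
constexpr double est_err_with_sub = 49.7929;  // estimate_error(), Total Time With Sub
constexpr double petsc_solve      = 1.7375;   // PetscLinearSolver::solve()

// Share of active time, in percent (hypothetical helper, not a libMesh call).
double percent_of_active(double seconds, double active)
{
  assert(active > 0.0);
  return 100.0 * seconds / active;
}
```

Calling percent_of_active(est_err_with_sub, active_time) gives roughly 79.5, against roughly 2.8 for petsc_solve, which is the gap between the error indicator and the linear solve in this run.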

data.txt main.cpp.txt

pbauman commented 5 years ago

Capturing comments by @roystgnr below from an email thread (note he ran in parallel, so some of the issues he points out apply to the parallel case, and the percentages are from his profile rather than the log above):

Things that stand out (note that these are often overlapping), ordered from easy to hard to improve:

1.5% in repartitioning-related stuff is just me being an idiot; that needs to short-circuit in serial. Boris brought up a couple places with similar issues recently.

5.5% in distribute_dofs(), and 5.5% for the combination of dof_indices() and old_dof_indices(), and much of that can probably be shaved off soon with the optimizations I was discussing with Derek, and doing so will disproportionately benefit AMR.

13% within JumpErrorEstimator::estimate_error() that isn't in a subroutine profile is astonishingly high, and I've never tried optimizing that, so there may be easy opportunities I haven't seen.

10% in project_vector(old,new) and 10% in prepare_for_use(); those are still high-priority optimization targets, but I don't know of any low-hanging fruit.

8% in Mesh::find_neighbors()... I think I can actually get rid of most of that, but only with a major redesign. Likewise with the 2% in TopologyMap::init().

13% in inverse_map(), 15% in compute_face_map(): the "map on one element, inverse map on the other" process for matching up jump evaluations and projection integration might be replaceable by a more direct calculation, but not easily.
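To make the "map on one element, inverse map on the other" step concrete, here is a toy sketch, assuming axis-aligned rectangular elements so that both maps are affine and have closed forms. RectElem, Point2 and side_point_on_neighbor are hypothetical names for illustration, not libMesh types or API, and libMesh's own routines handle general element types rather than this special case.

```cpp
#include <cassert>

struct Point2 { double x, y; };

// Axis-aligned rectangle [x0, x0+hx] x [y0, y0+hy], mapped from the
// reference square [-1,1]^2. Hypothetical type, for illustration only.
struct RectElem
{
  double x0, y0, hx, hy;

  // Reference -> physical coordinates (affine).
  Point2 map(Point2 ref) const
  {
    return { x0 + 0.5 * (ref.x + 1.0) * hx,
             y0 + 0.5 * (ref.y + 1.0) * hy };
  }

  // Physical -> reference coordinates; closed form only because
  // the map above is affine.
  Point2 inverse_map(Point2 phys) const
  {
    assert(hx > 0.0 && hy > 0.0);
    return { 2.0 * (phys.x - x0) / hx - 1.0,
             2.0 * (phys.y - y0) / hy - 1.0 };
  }
};

// Take a side quadrature point given in the reference coordinates of
// `fine`, map it to physical space, then inverse-map it into the
// reference coordinates of the neighboring element `coarse`.
Point2 side_point_on_neighbor(const RectElem & fine,
                              const RectElem & coarse,
                              Point2 ref_on_fine)
{
  return coarse.inverse_map(fine.map(ref_on_fine));
}
```

For a coarse element RectElem{0, 0, 1, 1} with a refined neighbor RectElem{1, 0, 0.5, 0.5}, the midpoint of the fine element's left side, (-1, 0) in its own reference coordinates, lands at (1, -0.5) in the coarse element's reference coordinates. With hanging nodes the two reference points differ like this on every refined side, so this pair of maps is evaluated for each side quadrature point.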

roystgnr commented 5 years ago

My run was actually on one core too! https://ark.intel.com/products/81709/Intel-Xeon-Processor-E5-2670-v3-30M-Cache-2_30-GHz is a beast.

pbauman commented 5 years ago

My run was actually on one core too!

Oh, oops!

https://ark.intel.com/products/81709/Intel-Xeon-Processor-E5-2670-v3-30M-Cache-2_30-GHz

Oh damn, nice!