idaholab / moose

Multiphysics Object Oriented Simulation Environment
https://www.mooseframework.org

MOOSE got slower after the last libmesh update #4980

Closed: jwpeterson closed this issue 9 years ago

jwpeterson commented 9 years ago

@jasondhales reported this, and it seems to be real based on ./run_tests -s -t. Here are some timing numbers from my workstation, both after (first column) and before (second column) the latest libmesh update:

auxkernels/bounds.test....................................... [SCALED] [6.473s]   [4.457s]   
auxkernels/parsed_aux.test................................... [SCALED] [7.288s]   [5.037s]   
bcs/penalty_dirichlet_bc.test_function_penalty_dirichlet_bc.. [SCALED] [26.631s]  [20.486s]  
dampers/constant_damper.testdamper........................... [SCALED] [8.374s]   [5.140s]   
functions/generic_function_material.test..................... [SCALED] [158.576s] [115.217s] 
geomsearch/3d_penetration_locator.test....................... [SCALED] [6.929s]   [6.378s]   
geomsearch/3d_penetration_locator.3d_tet..................... [SCALED] [1.139s]   [0.982s]   
ics/constant_ic.test......................................... [SCALED] [3.930s]   [2.589s]   
ics/constant_ic.subdomain_test............................... [SCALED] [109.740s] [59.968s]  
ics/dependency.test.......................................... [SCALED] [0.272s]   [0.219s]   
indicators/analytical_indicator.test......................... [SCALED] [1.087s]   [0.637s]   
indicators/gradient_jump_indicator.test...................... [SCALED] [1.021s]   [0.646s]   
indicators/laplacian_jump_indicator.test..................... [SCALED] [7.344s]   [6.161s]   
kernels/2d_diffusion.testdirichlet........................... [SCALED] [0.195s]   [0.145s]   
kernels/2d_diffusion.testneumann............................. [SCALED] [4.043s]   [2.629s]   
kernels/adv_diff_reaction_transient.test..................... [SCALED] [12.581s]  [6.782s]   
kernels/block_kernel.test.................................... [SCALED] [73.426s]  [53.998s]  
kernels/block_kernel.testvars................................ [SCALED] [1.739s]   [1.233s]   
kernels/coupled_kernel_grad.test_coupled_kernel_grad......... [SCALED] [55.498s]  [49.967s]  
kernels/kernel_precompute.test............................... [SCALED] [2.547s]   [1.523s]   
markers/box_marker.test...................................... [SCALED] [1.251s]   [0.765s]   
markers/box_marker.adapt_test................................ [SCALED] [3.330s]   [2.245s]   
markers/combo_marker.test.................................... [SCALED] [1.401s]   [0.858s]   
markers/dont_mark.test....................................... [SCALED] [1.464s]   [0.912s]   
markers/error_fraction_marker.test........................... [SCALED] [1.111s]   [0.675s]   
markers/error_tolerance_marker.test.......................... [SCALED] [1.111s]   [0.677s]   
markers/error_tolerance_marker.adapt_test.................... [SCALED] [5.201s]   [3.250s]   
markers/q_point_marker.test.................................. [SCALED] [0.013s]   [0.013s]   
markers/value_range_marker.test.............................. [SCALED] [1.321s]   [0.810s]   
markers/value_threshold_marker.test.......................... [SCALED] [1.320s]   [0.819s]   
materials/material.adv_mat_couple_test....................... [SCALED] [1.623s]   [1.028s]   
materials/material.coupled_material_test..................... [SCALED] [11.164s]  [7.327s]   
materials/material.dg_test................................... [SCALED] [6.982s]   [5.532s]   
materials/material.adv_mat_couple_test2...................... [SCALED] [1.591s]   [1.086s]   
materials/material.three_coupled_mat_test.................... [SCALED] [11.269s]  [7.257s]    
materials/material.test...................................... [SCALED] [0.864s]   [0.551s]    
materials/types.test......................................... [SCALED] [0.174s]   [0.129s]    
mesh/centroid_partitioner.centroid_partitioner_test.......... [SCALED] [10.109s]  [5.989s]    
misc/save_in.test............................................ [SCALED] [0.285s]   [0.242s]    
postprocessors/block_nodal_pps.test.......................... [SCALED] [18.527s]  [11.930s]   
problems/custom_fe_problem.test.............................. [SCALED] [2.584s]   [1.564s]    
problems/mixed_coord.test.................................... [SCALED] [26.642s]  [17.096s]   
variables/multiblock_restricted_var.test..................... [SCALED] [73.623s]  [53.303s]   

The first column is slower pretty much across the board. I'm not sure what's causing it yet; I need to bisect on the libmesh side...

roystgnr commented 9 years ago

That looks pretty devastating. Let me know if you need help bisecting.

jwpeterson commented 9 years ago

Bisection confirmed that the slowdown occurs due to libmesh/libmesh@3720bea, which is the commit where the improved second derivatives were merged. MOOSE is still "fast" on the commit right before that.

The thing is, most of the tests above should not actually need second derivatives to be computed (they involve only FIRST, LAGRANGE or CONSTANT, MONOMIAL finite elements on affine grids), so I'd like to find out where and why we're actually calling map_d2phi in MOOSE!
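
Roughly, this is how the request pattern works on the libMesh side. The sketch below is for illustration only (the function and its arguments are made up, not MOOSE code); the key point is that a single get_d2phi() call anywhere flags the FE object to compute second derivatives, so that's the call to hunt down:

#include "libmesh/elem.h"
#include "libmesh/fe_base.h"
#include "libmesh/quadrature_gauss.h"

using namespace libMesh;

// Reinit an FE object on one element, pre-requesting only values and
// gradients. As long as nothing ever calls get_d2phi() on this object,
// the second-derivative (and map_d2phi) machinery should stay off.
void reinit_first_order_only (const Elem * elem,
                              unsigned int dim,
                              const FEType & fe_type)
{
  auto fe = FEBase::build(dim, fe_type);
  QGauss qrule (dim, fe_type.default_quadrature_order());
  fe->attach_quadrature_rule (&qrule);

  // Request exactly what we need before reinit().
  const std::vector<std::vector<Real>> & phi = fe->get_phi();
  const std::vector<std::vector<RealGradient>> & dphi = fe->get_dphi();

  // NOT requested: fe->get_d2phi(). One stray call to it anywhere is
  // enough to turn on second-derivative computation for this object.

  fe->reinit (elem);

  (void)phi; (void)dphi; // unused in this sketch
}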

friedmud commented 9 years ago

I've tried to stomp out the second derivative computation before... but it keeps coming back. I'm interested to know what's causing it...


permcody commented 9 years ago

air quotes """fast"""

permcody commented 9 years ago

Why are you always computing second derivatives? Just go!

We need a RunException test where we create a first-order problem and ask libMesh for second derivatives. If that doesn't error out, we'll know we screwed something up.
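
A rough sketch of what that spec could look like (the input file name and error string here are hypothetical; RunException and expect_err are the existing TestHarness pieces):

[Tests]
  [./no_second_derivs]
    # Hypothetical input: a first-order problem whose kernel asks
    # libMesh for second derivatives it never enabled.
    type = RunException
    input = 'first_order_second_derivs.i'
    expect_err = 'second derivatives'
  [../]
[]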

jwpeterson commented 9 years ago

It doesn't seem like MOOSE is computing second derivatives too much after all... rather, I slowed down compute_affine_map() "just a little bit", and "just a little bit" is not acceptable for a function called millions of times...

For this test case:

cd $MOOSE_DIR/test/tests/ics/constant_ic
../../../moose_test-opt -i subdomain_constant_ic_test.i -r 5

Before second derivatives were computed correctly:

 -----------------------------------------------------------------------------------------------------------------
| Event                              nCalls     Total Time  Avg Time    Total Time  Avg Time    % of Active Time  |
|                                               w/o Sub     w/o Sub     With Sub    With Sub    w/o S    With S   |
|-----------------------------------------------------------------------------------------------------------------|
|                                                                                                                 |
|                                                                                                                 |
| FEMap                                                                                                           |
|   compute_affine_map()             19382274   6.6113      0.000000    6.6113      0.000000    11.34    11.34    |

After second derivatives were computed correctly:

 -----------------------------------------------------------------------------------------------------------------
| libMesh Performance: Alive time=106.535, Active time=101                                                        |
 -----------------------------------------------------------------------------------------------------------------
| Event                              nCalls     Total Time  Avg Time    Total Time  Avg Time    % of Active Time  |
|                                               w/o Sub     w/o Sub     With Sub    With Sub    w/o S    With S   |
|-----------------------------------------------------------------------------------------------------------------|
|                                                                                                                 |
|                                                                                                                 |
| FEMap                                                                                                           |
|   compute_affine_map()             19349506   47.7421     0.000002    47.7421     0.000002    47.27    47.27    |

(the slightly different call counts are due to using different versions of PETSc in the two separate MOOSE builds, but you get the idea...)
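
For scale: 6.6113 s over 19,382,274 calls is about 0.34 microseconds per call before, while 47.7421 s over 19,349,506 calls is about 2.5 microseconds per call after, i.e. compute_affine_map() got roughly 7x more expensive per call.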

Anyway, I think I can fix this fairly easily by doing a better job of skipping the extra computations for affine elements (as @roystgnr originally suggested!).
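
The idea is roughly this (a sketch of the optimization, not the actual patch; jacobian_at is a hypothetical stand-in for the real per-point map computation): on an affine element the map Jacobian is constant, so compute it once and reuse it at every quadrature point.

#include <vector>

#include "libmesh/elem.h"
#include "libmesh/point.h"

using namespace libMesh;

// Hypothetical helper standing in for the real per-point map computation.
Real jacobian_at (const Elem * elem, const Point & p);

// On affine elements the Jacobian is constant, so one evaluation serves
// every quadrature point; only non-affine elements pay the full price.
void compute_JxW (const Elem * elem,
                  const std::vector<Point> & qpoints,
                  const std::vector<Real> & qweights,
                  std::vector<Real> & JxW)
{
  JxW.resize(qpoints.size());

  if (elem->has_affine_map())
    {
      const Real J = jacobian_at(elem, qpoints[0]); // compute once...
      for (std::size_t p = 0; p != qpoints.size(); ++p)
        JxW[p] = J * qweights[p];                   // ...reuse everywhere
      return;
    }

  // General path: redo the map computation at every quadrature point.
  for (std::size_t p = 0; p != qpoints.size(); ++p)
    JxW[p] = jacobian_at(elem, qpoints[p]) * qweights[p];
}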

permcody commented 9 years ago

"Anyway, I think I can fix this fairly easily by not doing a better job of skipping the extra computations for affine elements"

Don't think I didn't catch that extra "not" in there before you edited the message... So the parameter should be something like not_skip_extra_computations? :smile:

oh! and make sure you default it to false for maximum negatives!

jwpeterson commented 9 years ago

Hahaha. Only if you add it to the "requirements" document.

jwpeterson commented 9 years ago

Fixed in b54d009.