compare solve_type = NEWTON with PJFNK

lw4992 commented 9 years ago

I used /moose/test/tests/kernels/adv_diff_reaction_transient to compare the computational efficiency of the NEWTON and PJFNK solve types. Running moose_test-opt -i adv_diff_reaction_transient_test.i Executioner/solve_type=PJFNK prints the following to the console:

Time Step  0, time = 0
                dt = 0

Time Step  1, time = 0.2
                dt = 0.2
 0 Nonlinear |R| = 6.391592e+01
      0 Linear |R| = 6.391592e+01
      1 Linear |R| = 8.350076e-06
 1 Nonlinear |R| = 7.073971e-07
      0 Linear |R| = 7.073971e-07
      1 Linear |R| = 3.099020e-13
 2 Nonlinear |R| = 9.103467e-14
 Solve Converged!

Time Step  2, time = 0.4
                dt = 0.2
 0 Nonlinear |R| = 2.339452e+01
      0 Linear |R| = 2.339452e+01
      1 Linear |R| = 1.119106e-05
 1 Nonlinear |R| = 1.150864e-07
      0 Linear |R| = 1.150864e-07
      1 Linear |R| = 4.090835e-14
 2 Nonlinear |R| = 5.444037e-14
 Solve Converged!

Time Step  3, time = 0.6
                dt = 0.2
 0 Nonlinear |R| = 7.963440e+01
      0 Linear |R| = 7.963440e+01
      1 Linear |R| = 2.838193e-05
 1 Nonlinear |R| = 1.458947e-06
      0 Linear |R| = 1.458947e-06
      1 Linear |R| = 5.144125e-13
 2 Nonlinear |R| = 6.122575e-14
 Solve Converged!

Time Step  4, time = 0.8
                dt = 0.2
 0 Nonlinear |R| = 2.583997e+01
      0 Linear |R| = 2.583997e+01
      1 Linear |R| = 9.900848e-06
 1 Nonlinear |R| = 2.232539e-07
      0 Linear |R| = 2.232539e-07
      1 Linear |R| = 1.049252e-13
 2 Nonlinear |R| = 8.557090e-14
 Solve Converged!

Time Step  5, time = 1
                dt = 0.2
 0 Nonlinear |R| = 6.366882e+01
      0 Linear |R| = 6.366882e+01
      1 Linear |R| = 3.115191e-05
 1 Nonlinear |R| = 5.181732e-07
      0 Linear |R| = 5.181732e-07
      1 Linear |R| = 2.437415e-14
 2 Nonlinear |R| = 2.226702e-15
 Solve Converged!

 ------------------------------------------------------------------------------------------------------------
| Moose Test Performance: Alive time=0.398871, Active time=0.354004                                          |
 ------------------------------------------------------------------------------------------------------------
| Event                         nCalls     Total Time  Avg Time    Total Time  Avg Time    % of Active Time  |
|                                          w/o Sub     w/o Sub     With Sub    With Sub    w/o S    With S   |
|------------------------------------------------------------------------------------------------------------|
|                                                                                                            |
|                                                                                                            |
| Exodus                                                                                                     |
|   output()                    6          0.0059      0.000980    0.0059      0.000980    1.66     1.66     |
|                                                                                                            |
| Solve                                                                                                      |
|   ComputeResidualThread       50         0.2665      0.005330    0.2665      0.005330    75.28    75.28    |
|   computeDiracContributions() 60         0.0000      0.000001    0.0000      0.000001    0.01     0.01     |
|   compute_dampers()           10         0.0000      0.000001    0.0000      0.000001    0.00     0.00     |
|   compute_jacobian()          10         0.0524      0.005237    0.0524      0.005238    14.79    14.80    |
|   compute_residual()          50         0.0031      0.000061    0.2701      0.005401    0.86     76.29    |
|   compute_user_objects()      142        0.0001      0.000001    0.0001      0.000001    0.02     0.02     |
|   residual.close3()           50         0.0002      0.000005    0.0002      0.000005    0.07     0.07     |
|   residual.close4()           50         0.0002      0.000005    0.0002      0.000005    0.07     0.07     |
|   solve()                     5          0.0256      0.005122    0.3481      0.069625    7.23     98.34    |
 ------------------------------------------------------------------------------------------------------------
| Totals:                       433        0.3540                                          100.00            |
 ------------------------------------------------------------------------------------------------------------

Running moose_test-opt -i adv_diff_reaction_transient_test.i Executioner/solve_type=NEWTON instead prints:

Time Step  0, time = 0
                dt = 0

Time Step  1, time = 0.2
                dt = 0.2
 0 Nonlinear |R| = 6.391592e+01
      0 Linear |R| = 6.391592e+01
      1 Linear |R| = 3.133631e-13
 1 Nonlinear |R| = 3.244574e-13
 Solve Converged!

Time Step  2, time = 0.4
                dt = 0.2
 0 Nonlinear |R| = 2.339452e+01
      0 Linear |R| = 2.339452e+01
      1 Linear |R| = 1.119108e-13
 1 Nonlinear |R| = 1.310815e-13
 Solve Converged!

Time Step  3, time = 0.6
                dt = 0.2
 0 Nonlinear |R| = 7.963440e+01
      0 Linear |R| = 7.963440e+01
      1 Linear |R| = 3.840390e-13
 1 Nonlinear |R| = 4.189900e-13
 Solve Converged!

Time Step  4, time = 0.8
                dt = 0.2
 0 Nonlinear |R| = 2.583997e+01
      0 Linear |R| = 2.583997e+01
      1 Linear |R| = 1.286855e-13
 1 Nonlinear |R| = 1.649020e-13
 Solve Converged!

Time Step  5, time = 1
                dt = 0.2
 0 Nonlinear |R| = 6.366882e+01
      0 Linear |R| = 6.366882e+01
      1 Linear |R| = 2.942028e-13
 1 Nonlinear |R| = 3.147148e-13
 Solve Converged!

 ------------------------------------------------------------------------------------------------------------
| Moose Test Performance: Alive time=0.172566, Active time=0.132116                                          |
 ------------------------------------------------------------------------------------------------------------
| Event                         nCalls     Total Time  Avg Time    Total Time  Avg Time    % of Active Time  |
|                                          w/o Sub     w/o Sub     With Sub    With Sub    w/o S    With S   |
|------------------------------------------------------------------------------------------------------------|
|                                                                                                            |
|                                                                                                            |
| Exodus                                                                                                     |
|   output()                    6          0.0058      0.000968    0.0058      0.000968    4.40     4.40     |
|                                                                                                            |
| Solve                                                                                                      |
|   ComputeResidualThread       15         0.0798      0.005322    0.0798      0.005322    60.42    60.42    |
|   computeDiracContributions() 20         0.0000      0.000001    0.0000      0.000001    0.01     0.01     |
|   compute_dampers()           5          0.0000      0.000001    0.0000      0.000001    0.00     0.00     |
|   compute_jacobian()          5          0.0257      0.005145    0.0257      0.005146    19.47    19.47    |
|   compute_residual()          15         0.0010      0.000064    0.0809      0.005395    0.73     61.25    |
|   compute_user_objects()      62         0.0000      0.000000    0.0000      0.000000    0.02     0.02     |
|   residual.close3()           15         0.0001      0.000005    0.0001      0.000005    0.05     0.05     |
|   residual.close4()           15         0.0001      0.000004    0.0001      0.000004    0.05     0.05     |
|   solve()                     5          0.0196      0.003924    0.1263      0.025261    14.85    95.60    |
 ------------------------------------------------------------------------------------------------------------
| Totals:                       163        0.1321                                          100.00            |
 ------------------------------------------------------------------------------------------------------------

Comparing the two runs, PJFNK is roughly three times slower than NEWTON: ComputeResidualThread was called 50 times with PJFNK but only 15 times with NEWTON. I have two questions:

With PJFNK, the residual should be updated on each linear iteration. The console output shows only 20 linear steps (5 time steps with 4 linear steps each). Why was ComputeResidualThread called 50 times with PJFNK?
Also, with the NEWTON solve type, the residual needs to be updated on every nonlinear iteration. Why does the performance log show 15 calls when there were only 5 nonlinear steps?

friedmud commented 9 years ago

One thing about this problem is that it is linear... so it should only take one nonlinear iteration. The reason why it doesn't (with JFNK) is because our defaults are set up for our normal case: solving nonlinear problems using inexact Newton (where we don't fully converge the linear solver). To get better solve history out of JFNK you should change l_tol in your Executioner block to something tighter than your nl_rel_tol... maybe something like 1e-10. That way you will fully converge the linear solver each nonlinear step... so it should only take one of those to solve each timestep.
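To make the l_tol advice concrete, here is a rough sketch of the worst case for a linear problem: assume each nonlinear step reduces the residual only by the factor at which the linear solver is allowed to stop. The function name and the tolerance values used with it are illustrative assumptions, not the settings of this test or the MOOSE defaults.

```cpp
#include <cassert>

// Worst-case model for a linear problem solved with inexact Newton:
// each nonlinear step shrinks the relative residual by l_tol, because the
// linear solver stops as soon as it reaches that relative tolerance.
// Returns how many nonlinear steps are needed to get below nl_rel_tol.
int nonlinearItsNeeded(double l_tol, double nl_rel_tol)
{
  assert(l_tol > 0.0 && l_tol < 1.0);
  assert(nl_rel_tol > 0.0 && nl_rel_tol < 1.0);

  double rel = 1.0; // current residual relative to the initial one
  int its = 0;
  while (rel > nl_rel_tol)
  {
    rel *= l_tol; // one inexactly solved Newton step
    ++its;
  }
  return its;
}
```

With a loose l_tol such as 1e-5 and an nl_rel_tol of 1e-8, this model needs two nonlinear steps; tightening l_tol to 1e-10 brings it down to one, which is the effect described above.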

Also: Note that it's incredibly unfair to compare JFNK to NEWTON for a single equation, linear problem. JFNK really shines when you have many equations and they're highly nonlinear. Check out one of our papers here that shows the tradeoff point (for one given physics: phase field): http://www.sciencedirect.com/science/article/pii/S0021999112007243

Ok - now for your actual questions:

With JFNK residuals are computed at these times:

At the beginning of each timestep MOOSE itself does one residual evaluation that we use for various purposes (for instance, under certain circumstances we use it for convergence criteria)
Each nonlinear iteration requires a residual evaluation for convergence. Note that you get one at the "end" once convergence is reached too.
Each linear iteration requires a residual evaluation for the application of the finite-differenced Jacobian/vector product
Each nonlinear iteration also requires one Jacobian/vector product to apply "line search"... this means one more residual evaluation per nonlinear step. This is only required when the nonlinear step is finishing.

So... in the case you showed you have:

1 residual evaluation at the beginning of each timestep
3 residual evaluations per timestep for nonlinear iterations (one at the beginning of each of the two nonlinear iterations and then a final one to check for convergence)
4 residual evaluations per timestep, one for each of the linear iterations
2 residual evaluations per timestep for line search application (at the end of each of the two nonlinear iterations)

That gives 10 per timestep. 50 total.

With NEWTON you eliminate the residual evaluations for Jacobian/vector products... that includes the ones for linear iterations AND the ones for line search application. That means you only have:

1 at the beginning of the timestep
2 for nonlinear iterations (1 at the beginning of the nonlinear iteration and 1 final one to check for convergence)

That gives you 3 per timestep... 15 total.
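The two tallies above can be written down as a small sketch. The helper names are made up for illustration and are not MOOSE or PETSc API; the terms simply follow the breakdown in this reply.

```cpp
#include <cassert>

// Illustrative bookkeeping only; these helpers are not MOOSE or PETSc API.

// Residual evaluations in one PJFNK timestep, following the breakdown above.
constexpr int pjfnkResidualEvals(int nonlinear_its, int linear_residuals)
{
  return 1                     // evaluation MOOSE does at the start of the timestep
         + (nonlinear_its + 1) // one per nonlinear iteration plus the final convergence check
         + linear_residuals    // finite-differenced Jacobian/vector products
         + nonlinear_its;      // line search at the end of each nonlinear iteration
}

// NEWTON drops both Jacobian/vector-product terms.
constexpr int newtonResidualEvals(int nonlinear_its)
{
  return 1 + (nonlinear_its + 1);
}

// Scale a per-timestep count to the whole run.
int totalOverSteps(int per_step, int num_steps)
{
  assert(per_step >= 0 && num_steps >= 0);
  return per_step * num_steps;
}
```

With the numbers from the logs, pjfnkResidualEvals(2, 4) is 10 and newtonResidualEvals(1) is 3, so five timesteps give 50 and 15, matching the ComputeResidualThread rows in the two performance tables.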

Derek

friedmud commented 9 years ago

Note: if you want to see the effect of the Jacobian/vector product causing an extra residual evaluation you can use Executioner/line_search=none to turn off line search... and you should get 40 residual evaluations for your first case (you eliminate 2 per timestep).

Derek

lw4992 commented 9 years ago

Thanks @friedmud for the patient explanation. I have two more questions about computational efficiency:

At the beginning of the timestep, ComputeResidual is called once to compute the nonlinear residual, and it is called once more at the end of the timestep to check nonlinear convergence. Does this mean that ComputeResidual is called 3 times, but only one of those calls is used to update the residual for the algebraic solver? If so, is it possible to define a cheaper custom convergence criterion, since ComputeResidual is so expensive? I found two virtual functions in FEProblem.C, checkNonlinearConvergence and checkLinearConvergence. Are they used to control convergence of the linear and nonlinear iterations?
The Material class inherits from SetupInterface, but validParams<Material> does not include the parameters of SetupInterface:

template<>
InputParameters validParams<Material>()
{
  InputParameters params = validParams<MooseObject>();
  params += validParams<BlockRestrictable>();
  params += validParams<BoundaryRestrictable>();

  params.addParam<bool>("use_displaced_mesh", false, "Whether or not this object should use the displaced mesh for computation.  Note that in the case this is true but no displacements are provided in the Mesh block the undisplaced mesh will still be used.");

  // Outputs
  params += validParams<OutputInterface>();
  params.set<std::vector<OutputName> >("outputs") =  std::vector<OutputName>(1, "none");
  params.addParam<std::vector<std::string> >("output_properties", "List of material properties, from this material, to output (outputs must also be defined to an output type)");

  params.addParamNamesToGroup("outputs output_properties", "Outputs");
  params.addParamNamesToGroup("use_displaced_mesh", "Advanced");
  params.registerBase("Material");

  return params;
}

I take this to mean that Material::computeProperties is executed on every linear iteration, following the default execute_on of SetupInterface. In the MOOSE style of one kernel per variable, a Material class is usually used to store and update coupled information, and each variable's kernel then gets that information from the Material through getMaterialProperty when computing its residual. My question is: with the NEWTON solve_type, Kernel::ComputeResidual should be called only on nonlinear steps, but Material::computeProperties updates itself on every linear step, which may cause redundant calculation. Is this an issue? Would it be appropriate to expose an execute_on option for Material?

friedmud commented 9 years ago

We could turn off the one at the beginning of the timestep. Usually we don't care because our problems are large nonlinear problems that take hundreds (typically 100-200) linear iterations with JFNK so adding one more residual evaluation at the beginning simply doesn't matter. Do you really have a problem where you can show that this is adding a significant portion of your solve time? Do you have profiling runs that show it? This feels like over-optimization to me. I suspect that your time would be better spent optimizing your preconditioner (which is usually closer to 50% of the solve time vs. 1/200th).

But it is possible.

You can't "turn off" the one at the end. It's not an extra one just to check for convergence. It is actually the "next" nonlinear residual that would be used for the next nonlinear iteration... since we need to compute it for that purpose anyway we might as well take a moment to take a norm of it to see if we're converged and stop iterating if we are. How else would you know when to stop? Note: MOOSE does not do this... this is part of the normal PETSc solve.
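A toy Newton loop shows why that final evaluation is not extra work. This is a scalar sketch with made-up names, not the PETSc implementation: the residual computed after each update is both the convergence check and the starting residual for the next iteration.

```cpp
#include <cassert>
#include <cmath>

// Toy scalar problem r(x) = a*x - b, standing in for a linear PDE residual.
struct LinearProblem
{
  double a;
  double b;
  int residual_evals; // how many times residual() has been called

  double residual(double x)
  {
    ++residual_evals;
    return a * x - b;
  }
};

// Plain Newton iteration; returns the number of nonlinear iterations taken.
int newtonSolve(LinearProblem & p, double & x, double abs_tol, int max_its)
{
  assert(p.a != 0.0);

  double r = p.residual(x); // residual for nonlinear iteration 0
  int its = 0;
  while (std::fabs(r) > abs_tol && its < max_its)
  {
    x -= r / p.a;      // exact Jacobian solve: dx = -r / J
    r = p.residual(x); // next iteration's residual, reused as the convergence check
    ++its;
  }
  return its;
}
```

For a linear problem this takes one iteration and two residual evaluations; adding the one MOOSE does at the start of the timestep gives the 3 per timestep seen with NEWTON.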

Materials don't work that way. They are not "executed" at all... which is why they don't take "execute_on". They are "computed" whenever the properties are needed. What happens is that when we're executing a set of objects (like Kernels, BCs, Postprocessors, etc.) the Materials that produce material properties on the current subdomain (block) are asked to compute their properties as part of that computation.

So, if you don't compute a residual (ie, compute Kernels, BCs, etc.) then you're not computing Materials at all. Therefore in pure NEWTON we don't compute Materials for each linear iteration.
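The on-demand behavior can be sketched with hypothetical stand-in classes (ToyMaterial and ToyProblem are not MOOSE classes). The sketch also assumes that assembling the Jacobian requests the properties, which is not stated in this thread. The point is that linear iterations on an already assembled Jacobian never touch the material.

```cpp
#include <cassert>

// Hypothetical stand-in for a material with a single property.
struct ToyMaterial
{
  int compute_calls; // how often the properties were (re)computed
  double k;          // the one "material property"

  void computeProperties()
  {
    ++compute_calls;
    k = 2.0;
  }
};

// Hypothetical stand-in for the problem: r(u) = k*u - source.
struct ToyProblem
{
  ToyMaterial material;
  double source;

  double computeResidual(double u)
  {
    material.computeProperties(); // computed because the residual needs it
    return material.k * u - source;
  }

  double computeJacobian()
  {
    material.computeProperties(); // assumption: Jacobian assembly needs it too
    return material.k;
  }
};

// One NEWTON-style step: assemble r and J once, then iterate on stored values only.
double newtonStep(ToyProblem & p, double u, int linear_its)
{
  assert(linear_its >= 1);
  const double r = p.computeResidual(u);
  const double J = p.computeJacobian();

  double du = 0.0;
  for (int i = 0; i < linear_its; ++i)
    du += (-r - J * du) / J; // Richardson update using only the stored J and r
  return u + du;
}
```

However many linear iterations are run, compute_calls stays at 2 for the step, one for the residual and one for the Jacobian.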

permcody commented 9 years ago

@lw4992 - Hopefully Derek's explanation is satisfactory for you. If you have further questions or need further explanation we should continue this thread on the mailing list where more users can benefit from discussion.

idaholab / moose

compare solve_type = NEWTON with PJFNK #5341