idaholab / moose

Multiphysics Object Oriented Simulation Environment
https://www.mooseframework.org
GNU Lesser General Public License v2.1

Solver behavior #4767

Closed: jasondhales closed this issue 9 years ago

jasondhales commented 9 years ago

I have a BISON analysis that shows odd solver behavior. Eventually, the simulation reaches a point where the first nonlinear step for a new timestep results in a failed solve with MOOSE_DIVERGED_LINE_SEARCH. The time step is halved, and the next solve is successful. This happens again and again over many timesteps. The strange thing is that the NonlinearSystem _initial_residual is very different (many orders of magnitude) between the first, failed solve and the second successful solve. For the successful solve, the _initial_residual is much higher, making convergence easier.

I hacked several files to output information. I also hacked the TimeStepper to force a failed solve to cause no reduction in the time step size. Given that the time step size is identical following a failed solve, you would expect the following solve also to fail. This does not happen. The subsequent solve has a larger _initial_residual, and the solve is successful.

In the terminal output below, the line that begins with "JDH DEBUG: NS::solve:" shows the norm of the initial residual, the current solution, the disp_x, disp_y, and temp portions of _sys.rhs, and dt. These are computed in NonlinearSystem::solve() immediately after _initial_residual is set. The output shows that the initial residual is about 8 orders of magnitude higher after the failed solve, even though dt is identical. For some reason, the disp_x and disp_y norms change but the temp norm does not. However, if you look at the '|residual|_2' norms from MOOSE, along with the '0 Nonlinear |R|' norm MOOSE outputs, they are the same between the failed solve and the subsequent one, which I don't understand.

Does anyone know why this is happening and how it can be fixed?

Time Step 128, time = 8.46486e+07, dt = 45

JDH DEBUG: NS::solve: initRes: 1.24659e-11, cs: 91164.9, x: 9.53473e-12, y: 8.01648e-12, t: 4.74109e-13, dt: 45

|residual|_2 of individual variables:
  disp_x: 0.00110822
  disp_y: 0.000639612
  temp:   4.74109e-13

 0 Nonlinear |R| = 1.279550e-03

JDH DEBUG: 0, DLS: 0, fnorm: 0.00127955, initRes: 1.24659e-11, rtol: 0.0005, lastNLRNorm: 0.00127955, divThr: 2.49319e-08

      0 Linear |R| = 1.279550e-03
      1 Linear |R| = 1.006161e-03
      2 Linear |R| = 3.486754e-04
      3 Linear |R| = 6.979361e-05
      4 Linear |R| = 3.442148e-05
      5 Linear |R| = 9.987243e-06
      6 Linear |R| = 4.776626e-06
      7 Linear |R| = 3.426650e-06
      8 Linear |R| = 2.744815e-06
      9 Linear |R| = 1.933792e-06
     10 Linear |R| = 9.483082e-07

|residual|_2 of individual variables:
  disp_x: 0.0190795
  disp_y: 0.0110118
  temp:   4.75727e-13

 1 Nonlinear |R| = 2.202923e-02

JDH DEBUG: Nonlinear solve was blowing up!
JDH DEBUG: -6, DLS: 1, fnorm: 0.0220292, initRes: 1.24659e-11, rtol: 0.0005, lastNLRNorm: 0.0220292, divThr: 2.49319e-08

Solve Did NOT Converge!

JDH DEBUG: **** SAME DT ***

Time Step 128, time = 8.46486e+07, dt = 45

JDH DEBUG: NS::solve: initRes: 0.00127297, cs: 91164.9, x: 0.00110251, y: 0.000636321, t: 4.74109e-13, dt: 45

|residual|_2 of individual variables:
  disp_x: 0.00110822
  disp_y: 0.000639612
  temp:   4.74109e-13

 0 Nonlinear |R| = 1.279550e-03   < This solve is successful >

permcody commented 9 years ago

Without really diving in, my guess would be that something is being lagged somewhere in the system, and that the extra failed solve is computing those objects, which might bring the variables they affect current for use in the next solve. You said that the x and y norms change, as highlighted in your debug output above, but the printout of the norms just below it doesn't show this. Perhaps you are printing values slightly early, before the vectors are closed? I'm not sure why we are seeing that discrepancy.

Cody


jasondhales commented 9 years ago

I've copied below the code I used to print the norms (slightly modified from what generated the original output). It is true that the MOOSE norms just below mine don't change. Also, for the first, failing solve, the norms I print are very different from those MOOSE prints, which I also don't understand. It seems like something is wrong with that first residual evaluation.

if (_fe_problem.solverParams()._type != Moose::ST_LINEAR)
{
  // Calculate the initial residual for use in the convergence criterion.
  _computing_initial_residual = true;
  _fe_problem.computeResidual(_sys, *_current_solution, *_sys.rhs);
  _computing_initial_residual = false;
  _sys.rhs->close();
  _initial_residual = _sys.rhs->l2_norm();

  // Debug output: initial residual, solution norm, per-variable residual norms, and dt
  std::cout << "JDH DEBUG: NS::solve: initRes: " << _initial_residual << ", cs: " << _current_solution->l2_norm()
            << ", " << _sys.variable_name(0) << ": " << _sys.calculate_norm(*_sys.rhs, 0, DISCRETE_L2)
            << ", " << _sys.variable_name(1) << ": " << _sys.calculate_norm(*_sys.rhs, 1, DISCRETE_L2)
            << ", " << _sys.variable_name(2) << ": " << _sys.calculate_norm(*_sys.rhs, 2, DISCRETE_L2)
            << ", dt: " << _fe_problem.dt() << std::endl;
}
friedmud commented 9 years ago

I agree with @permcody: it feels like something is being lagged.

Can you try turning things off to figure out what's causing the problem? Maybe use the restart system to create a checkpoint at the point of failure, then restart with less stuff turned on until you find the thing that's causing grief?

friedmud commented 9 years ago

BTW: I'm not saying it's your fault or anything... but to dig in, we're going to need a direction!

bwspenc commented 9 years ago

@jasondhales A while back I figured out everything that we typically lag in BISON. I'm looking at my model where I changed them, and they are: plenum pressure, average interior temperature, fission gas released, gas volume. Try adding this to all of those blocks: execute_on = 'initial nonlinear timestep_end'
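
As a concrete sketch of what that would look like for one of those blocks in the input file (the object and boundary names below are placeholders for illustration, not taken from an actual BISON input):

[Postprocessors]
  [./gas_volume]                                    # placeholder name for one of the lagged quantities
    type = InternalVolume                           # illustrative choice; use whatever object computes it in your input
    boundary = 'plenum_surface'                     # placeholder boundary name
    execute_on = 'initial nonlinear timestep_end'   # the flags Ben suggests
  [../]
[]

The same execute_on line would go on each of the lagged blocks listed above.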

permcody commented 9 years ago

If that fixes it, it might mean we need to adjust defaults in MOOSE. This is definitely something we need to look at.

jwpeterson commented 9 years ago

This also reminds me of some issues we had with PresetDirichletBCs and initial residuals in the past...

jasondhales commented 9 years ago

It was PlenumPressure. I set it to run at every opportunity, as well as all the PPs it needs. The initial residual is now consistent. Thanks for the help!

permcody commented 9 years ago

Jason, since you just lost a day to this bug, is there anything we can do at the framework level to help you figure this out sooner? Ideally, we'd be able to tell you when you are using a lagged value, but that feature always seems to be in the future. It might be feasible to track which objects couple to which objects so that we could at least report, through a debugging flag or something, the places where lagging might occur. Let us know if you come up with something you'd like to see in the framework.

jasondhales commented 9 years ago

One idea is to add 'residual' or 'all' to the list of execute_on options -- something that would cause the PP to run whenever a residual is calculated. This wouldn't prevent someone like me from tripping over a set of dependent calculations, but it would simplify avoiding them.


friedmud commented 9 years ago

'residual' used to be an option... is it not now?


permcody commented 9 years ago

No, that was part of the renaming Andrew did. Residual -> Linear and Jacobian -> Nonlinear. Jason suggested that we consider rolling up automatically or offering that as an explicit option.
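
As a concrete illustration of the renaming (a sketch only; the postprocessor, variable, and boundary names below are placeholders, not from Jason's input): the old execute_on = residual corresponds to today's linear flag, so "run at every opportunity" amounts to listing all of the current flags:

[Postprocessors]
  [./avg_plenum_temp]                    # placeholder for one of the postprocessors PlenumPressure depends on
    type = SideAverageValue              # illustrative type only
    variable = temp
    boundary = 'plenum_surface'          # placeholder boundary name
    # 'linear' is the renamed 'residual' flag (runs on every residual evaluation);
    # 'nonlinear' is the renamed 'jacobian' flag
    execute_on = 'initial linear nonlinear timestep_begin timestep_end'
  [../]
[]

With 'linear' in the list, the postprocessor is refreshed on every residual evaluation, so the first residual of a new solve presumably can't see a stale value.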


friedmud commented 9 years ago

I see: that is something we used to do. "residual" ones were executed everywhere (jacobian, timestep, timestep_end, etc.). I guess we changed that a while back too...

Would be nice to have an explicit option for that though...
