Add PETSc Option -snes_divergence_tolerance to Moose

frombs commented 5 years ago

Reason

A SNES divergence tolerance option was added to PETSc in version 3.8.0. See the descriptions here: https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESSetFromOptions.html and https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESSetDivergenceTolerance.html

This is a useful feature for hard-to-solve nonlinear systems, such as multivariable problems like the split Cahn-Hilliard phase field model. If the microstructure evolves substantially between time steps and the time step is too large, the nonlinear residuals may intermittently diverge instead of converging. When the residual values become large, PETSc becomes bottlenecked and it can take hours for the nl_max_its threshold to be reached so that the time step can be reduced.

Below is a graph that demonstrates the issue. Wall time is plotted on the x-axis and the time to complete each time step on the y-axis. As the graph shows, the solver bottlenecked 36 hours into the simulation when the large divergence occurred, and it took over 3 hours to recover. A number of the smaller peaks were also caused by the same issue, but the residuals stabilized before growing too large.

SNES_DIVERGENCE_PLOT

With the -snes_divergence_tolerance option active, PETSc monitors the residuals at each nonlinear step, and the time step is automatically cut back if divergence is detected. The default divergence tolerance in PETSc is 1e4, but the divergence check can also be disabled by setting the tolerance to -1. Note: this is not the same tolerance as -ksp_divtol, which is used to prevent divergence within the linear solver.
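As a rough illustration of the check described above, here is a minimal C++ sketch. It assumes the test compares the current nonlinear residual norm against the initial residual norm scaled by the tolerance, and that a negative tolerance disables the test. The names `hasDiverged` and `firstDivergedIteration` are hypothetical; this is not PETSc source code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of PETSc, libMesh, or MOOSE.
// Returns true when the current residual norm exceeds divtol times the
// initial residual norm. A negative divtol (e.g. -1) disables the check.
bool hasDiverged(double fnorm, double fnorm0, double divtol)
{
  if (divtol < 0.0)
    return false;
  return fnorm > divtol * fnorm0;
}

// Returns the index of the first nonlinear iteration at which divergence
// would be flagged, or -1 if the residual history never trips the check.
// residual_norms[0] is taken as the initial residual norm.
int firstDivergedIteration(const std::vector<double> & residual_norms, double divtol)
{
  if (residual_norms.empty())
    return -1;
  const double fnorm0 = residual_norms.front();
  for (std::size_t i = 1; i < residual_norms.size(); ++i)
    if (hasDiverged(residual_norms[i], fnorm0, divtol))
      return static_cast<int>(i);
  return -1;
}
```

With the default tolerance of 1e4, a residual history that grows by more than four orders of magnitude relative to its starting value would be flagged at the first iteration that crosses that ratio, rather than running on until nl_max_its is reached.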

Design

The -snes_divergence_tolerance option has been implemented in the commit below. Additionally, the following PDF contains console output showing that the fix improves the stability of the solve and reduces wall time by cutting the time step before the divergence gets out of hand.

SNES_DIVERGENCE_OUTPUT.pdf

Impact

The change will speed up simulations for hard-to-solve nonlinear systems by enforcing a cut in time step when divergence of the residual is detected. A small update to petsc_nonlinear_solver.C in libMesh will also be needed to set the divergence tolerance.
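The time step cut mentioned above can be pictured with a small C++ sketch. Everything in it is an assumption made for illustration: the `SolveStatus` enum, the `advanceOneStep` driver, and the cutback factor are hypothetical and do not reflect how MOOSE's time steppers are actually implemented.

```cpp
#include <functional>

// Hypothetical outcome of one nonlinear solve attempt.
enum class SolveStatus { Converged, Diverged };

// Hypothetical driver: try a step of size dt; if the nonlinear solve reports
// divergence, multiply dt by `cutback` and try again. Returns the accepted
// step size, or 0.0 if dt falls below dt_min without a converged solve.
double advanceOneStep(const std::function<SolveStatus(double)> & solve,
                      double dt,
                      double cutback,
                      double dt_min)
{
  while (dt >= dt_min)
  {
    if (solve(dt) == SolveStatus::Converged)
      return dt;
    dt *= cutback; // divergence detected early, so retry with a smaller step
  }
  return 0.0;
}
```

The point of the divergence tolerance in this picture is that the solve returns `Diverged` after a few iterations instead of grinding through every allowed nonlinear iteration before the step is retried.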

permcody commented 5 years ago

Interesting... @fdkong - will you take a look at this?

@frombs - Are you planning to make a PR out of this? You'll need to rebase when you do - it looks like you picked up a few commits from the devel branch, which is cluttering up your diff.

frombs commented 5 years ago

Yes, I would like to submit a PR and can rebase if you want to proceed. This was the only solution I found for the issue reported on the Moose Users Group and may be helpful to other users. I am currently re-running the simulation with the new feature and the results are looking good (red plot in graph below).

SNES_DTOL_NEW

fdkong commented 5 years ago

Thanks @frombs. It is a very good feature. We are looking forward to your PR.

frombs commented 5 years ago

The default tolerance can be set in two ways. Option 1 is to use the default PETSc tolerance of 1e4, and option 2 is to set the tolerance to -1, which disables the tolerance check completely unless the user adds the -snes_divergence_tolerance option in their input file. Which do you prefer?

fdkong commented 5 years ago

I would like to follow the PETSc default.

frombs commented 5 years ago

@fdkong, I need to make a small change to libmesh to set the default tolerance. Do I do this in the libmesh submodule or do I need to create a libmesh fork and make the changes directly to libmesh? Also, how do I link the libmesh changes to the PR? Thanks for your help.

permcody commented 5 years ago

Do I do this in the libmesh submodule or do I need to create a libmesh fork and make the changes directly to libmesh?

This isn't quite the right question but I understand what you are asking. It doesn't really matter if you make changes in a submodule or a root repository for any git repository. So you pick!

The question about using a fork or not depends on your privilege level: you can either push a branch to a repository or not. If you can, which is rare, then you can open a PR from a local branch in the upstream repository. If you can't (normally the case), then you have to fork and create a PR from your fork. BTW - libmesh is the normal case.

Also, how do I link the libmesh changes to the PR?

The really cool thing about submodules is that we respect them in PR testing. You'll have to wait until your change is accepted into libMesh. Once it is, you will push up a PR with whatever changes you need to MOOSE AND a submodule update to a commit containing the changes in whatever submodules are affected (e.g. libMesh in this case). When CIVET pulls your PR, it'll see that there's a new version of libMesh so it'll build it and then it'll build MOOSE. In this way you can test out libMesh changes before they are merged.

frombs commented 5 years ago

Thanks @permcody. So to clarify, I need to wait to create a PR for issue #13991 until the changes are accepted in libmesh?

permcody commented 5 years ago

Well you are welcome to make a change for MOOSE but if it requires the libMesh change then the tests will fail. You can mark it WIP and push it up though, up to you. Then when libMesh has been accepted you can update the submodule and test everything out.

frombs commented 5 years ago

Here is a link to the libmesh change: Libmesh#2253

frombs commented 5 years ago

@fdkong, I created the input parameter nl_div_tol in FEProblemSolve.C as you suggested. Should I do the same for SlepcSupport.C?

fdkong commented 5 years ago

@fdkong, I created the input parameter nl_div_tol in FEProblemSolve.C as you suggested. Should I do the same for SlepcSupport.C?

You do not need to take care of SlepcSupport.C unless you are using the eigenvalue solver right now. I have a plan to refactor the eigenvalue solver.

frombs commented 5 years ago

@fdkong, can you review my latest changes in 60face1 and in libmesh?

Also, in regard to nl_div_tol in FEProblemSolve.C, how do I correlate this tolerance being set in the MOOSE input file with libMesh and PETSc? In other words, how does PETSc know that we are overriding the default tolerance in the input file?

fdkong commented 5 years ago

@fdkong, can you review my latest changes in 60face1 and in libmesh?

Also, in regard to nl_div_tol in FEProblemSolve.C, how do I correlate this tolerance being set in the MOOSE input file with libMesh and PETSc? In other words, how does PETSc know that we are overriding the default tolerance in the input file?

In FEProblemSolve.C, you need to have something like:

  params.addParam<Real>("nl_div_tol", 1.0e+4, "Nonlinear Divergence Tolerance");

  es.parameters.set<Real>("nonlinear solver divergence tolerance") =
      getParam<Real>("nl_div_tol");

In libMesh, in NonlinearImplicitSystem:

  const double div_tol =
      double(es.parameters.get<Real>("nonlinear solver divergence tolerance"));

  nonlinear_solver->divergence_tolerance = div_tol;

Hopefully this helps.
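To make the hand-off in the snippets above concrete, here is a self-contained C++ sketch of the same pattern using stand-in types. `Parameters`, `MockNonlinearSolver`, `setFromInput`, and `configureSolver` are hypothetical names invented for illustration; only the string key and the `divergence_tolerance` member name are taken from the snippets above. The idea is that the application stores the user's value under an agreed string key, and the library reads that key when it configures its solver object. The solver wrapper would then presumably pass the value on to PETSc, for example through SNESSetDivergenceTolerance, the routine linked in the issue description.

```cpp
#include <map>
#include <string>

// Stand-in for a named-parameter container such as es.parameters.
using Parameters = std::map<std::string, double>;

// Stand-in for the nonlinear solver wrapper; the default mirrors PETSc's 1e4.
struct MockNonlinearSolver
{
  double divergence_tolerance = 1.0e4;
};

// Application side (hypothetical): store the input-file value under the key.
void setFromInput(Parameters & params, double nl_div_tol)
{
  params["nonlinear solver divergence tolerance"] = nl_div_tol;
}

// Library side (hypothetical): read the key if present, else keep the default.
void configureSolver(const Parameters & params, MockNonlinearSolver & solver)
{
  const auto it = params.find("nonlinear solver divergence tolerance");
  if (it != params.end())
    solver.divergence_tolerance = it->second;
}
```

In this sketch the override is visible to the solver only because both sides agree on the exact key string, which is why the key in the two snippets above has to match character for character.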

frombs commented 5 years ago

@fdkong, is there a test directory in MOOSE where nonlinear solver parameters are tested? If not, where do you want it to go?

fdkong commented 5 years ago

We do not have any right now because the tolerances are actually exercised by almost every test.

You could go ahead and add it under executioners for now. If needed, we will move it in the future. You may add a new directory.
