Closed frombs closed 4 years ago
Interesting... @fdkong - will you take a look at this?
@frombs - Are you planning to make a PR out of this? You'll need to rebase when you do - it looks like you picked up a few commits from the devel branch, which is cluttering up your diff.
Yes, I would like to submit a PR and can rebase if you want to proceed. This was the only solution I found for the issue reported on the Moose Users Group and may be helpful to other users. I am currently re-running the simulation with the new feature and the results are looking good (red plot in graph below).
Thanks @frombs. It is a very good feature. We are looking forward to your PR.
The default tolerance can be set two ways. Option 1 is to use the default PETSc tolerance of 1e4, and option 2 is to set the tolerance to -1, which disables the tolerance check completely unless the user adds the -snes_divergence_tolerance option in their input file. Which do you prefer?
I would like to follow the PETSc default.
@fdkong, I need to make a small change to libmesh to set the default tolerance. Do I do this in the libmesh submodule or do I need to create a libmesh fork and make the changes directly to libmesh? Also, how do I link the libmesh changes to the PR? Thanks for your help.
Do I do this in the libmesh submodule or do I need to create a libmesh fork and make the changes directly to libmesh?
This isn't quite the right question but I understand what you are asking. It doesn't really matter if you make changes in a submodule or a root repository for any git repository. So you pick!
The question about using a fork or not depends on your privilege level: you can either push a branch to a repository or you can't. If you can, which is rare, then you can open a PR from a local branch in the upstream repository. If you can't (normally the case), then you have to fork and create a PR from your fork. BTW - libmesh is the normal case.
Also, how do I link the libmesh changes to the PR?
The really cool thing about submodules is that we respect them in PR testing. You'll have to wait until your change is accepted into libMesh. Once it is, you will push up a PR with whatever changes you need to MOOSE AND a submodule update to a commit containing the changes in whatever submodules are affected (e.g. libMesh in this case). When CIVET pulls your PR, it'll see that there's a new version of libMesh so it'll build it and then it'll build MOOSE. In this way you can test out libMesh changes before they are merged.
Thanks @permcody. So to clarify, I need to wait to create a PR for issue #13991 until the changes are accepted in libmesh?
Well you are welcome to make a change for MOOSE but if it requires the libMesh change then the tests will fail. You can mark it WIP and push it up though, up to you. Then when libMesh has been accepted you can update the submodule and test everything out.
Here is a link to the libmesh change: Libmesh#2253
@fdkong, I created the input parameter nl_div_tol in FEProblemSolve.C as you suggested. Should I do the same for SlepcSupport.C?
You do not need to take care of SlepcSupport.C unless you are using the eigenvalue solver right now. I have a plan to refactor the eigenvalue solver.
@fdkong, can you review my latest changes in 60face1 and in libmesh?
Also, in regards to nl_div_tol in FEProblemSolve.C, how do I correlate this tolerance being set in the Moose input file with libmesh and PETSc? In other words, how does PETSc know that we are overriding the default tolerance in the input file?
In FEProblemSolve.C, you need to have something like:
params.addParam<Real>("nl_div_tol", 1.0e+4, "Nonlinear Divergence Tolerance");
es.parameters.set<Real>("nonlinear solver divergence tolerance") =
getParam<Real>("nl_div_tol");
In libMesh, in NonlinearImplicitSystem:
const double div_tol =
double(es.parameters.get<Real>("nonlinear solver divergence tolerance"));
nonlinear_solver->divergence_tolerance = div_tol;
Hopefully this helps.
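The hand-off described above can be pictured with a small stand-alone sketch. It does not use MOOSE or libMesh: `ParameterStore` and `MockNonlinearSolver` are hypothetical stand-ins for the string-keyed parameter store on the equation systems object and for the nonlinear solver wrapper, and both function names are made up for illustration. Only the key string and the `divergence_tolerance` member name come from the snippet above.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the string-keyed parameter store (not the libMesh class).
using ParameterStore = std::map<std::string, double>;

// Hypothetical stand-in for the nonlinear solver wrapper; 1e4 mirrors the PETSc default.
struct MockNonlinearSolver
{
  double divergence_tolerance = 1.0e4;
};

// "MOOSE side": publish the value read from the input file under a shared key.
void storeDivergenceTolerance(ParameterStore & params, double nl_div_tol)
{
  params["nonlinear solver divergence tolerance"] = nl_div_tol;
}

// "libMesh side": look the value up by the same key and hand it to the solver.
void applyDivergenceTolerance(const ParameterStore & params, MockNonlinearSolver & solver)
{
  const auto it = params.find("nonlinear solver divergence tolerance");
  assert(it != params.end() && "the tolerance must be stored before it is applied");
  solver.divergence_tolerance = it->second;
}
```

The point of the pattern is that the two sides only need to agree on the key string. The real solver wrapper would then presumably pass the stored value on to PETSc through SNESSetDivergenceTolerance, the routine documented at the link in the Reason section.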
@fdkong, Is there a test directory in Moose where non-linear solver parameters are tested? If not, where do you want it to go?
We do not have any right now because the tolerances are already exercised by almost every test.
You could go ahead and add it to executioners for now. If needed, we will move it around in the future. You may also add a new directory.
Reason
A SNES divergence tolerance option was added to PETSc in version 3.8.0. See the descriptions here: https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESSetFromOptions.html https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESSetDivergenceTolerance.html
This is a useful feature for hard-to-solve nonlinear systems such as multivariable problems like the split Cahn-Hilliard phase field model. Intermittently during a simulation, if there is substantial evolution within the microstructure between time steps, the non-linear residuals may begin to diverge instead of converge if the time step is too large. When the residual values become large, PETSc becomes bottlenecked and it can take hours for the nl_max_its threshold to be reached so the time step can be reduced.
Below is a graph that demonstrates the issue. Wall time is plotted on the x-axis and the time to complete each time step is represented by the y-axis. As the graph shows, the solver bottlenecked 36 hours into the simulation when the large divergence occurred and it took over 3 hours to recover. A number of the smaller peaks were also caused by the same issue but the residuals stabilized without the tolerance getting too large.
With the -snes_divergence_tolerance option active, PETSc monitors the residuals at each non-linear step and will automatically cut back the time step if divergence is detected. The default divergence tolerance in PETSc is 1e4 but the divergence check can also be disabled by setting the tolerance equal to -1. Note: this is not the same tolerance as -ksp_divtol, which is used to prevent divergence within the linear solver.
Design
The -snes_divergence_tolerance option has been implemented in the commit below. Additionally, the following PDF contains console output showing that the fix improves the stability of the solve and reduces wall time by cutting the time step before the divergence gets out of hand.
SNES_DIVERGENCE_OUTPUT.pdf
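To make the tolerance semantics concrete, here is a minimal stand-alone sketch of the kind of check being described. It assumes the rule is "flag divergence once the current residual norm exceeds the tolerance times the initial residual norm", with a non-positive tolerance (such as -1) switching the check off. The function names are hypothetical and this is not PETSc code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed rule: diverged when current_norm > divtol * initial_norm.
// A non-positive divtol (for example -1) disables the check.
bool hasDiverged(double initial_norm, double current_norm, double divtol)
{
  assert(initial_norm >= 0.0 && current_norm >= 0.0);
  return divtol > 0.0 && current_norm > divtol * initial_norm;
}

// Returns the index of the first nonlinear iteration flagged as diverged,
// or -1 if the residual history never trips the check.
int firstDivergedIteration(const std::vector<double> & residual_norms, double divtol)
{
  if (residual_norms.empty())
    return -1;

  const double initial_norm = residual_norms.front();
  for (std::size_t i = 1; i < residual_norms.size(); ++i)
    if (hasDiverged(initial_norm, residual_norms[i], divtol))
      return static_cast<int>(i);

  return -1;
}
```

With a tolerance of 1e4 and an initial residual norm of 1, a history such as 1, 10, 1e3, 5e4 would be flagged at the fourth entry, so the step can be abandoned there instead of running on until nl_max_its is reached. With a tolerance of -1 the same history is never flagged.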
Impact
The change will speed up simulations for hard-to-solve nonlinear systems by enforcing a cut in time step when divergence of the residual is detected. A small update to petsc_nonlinear_solver.C in libMesh will also be needed to set the divergence tolerance.