idaholab / moose

Multiphysics Object Oriented Simulation Environment
https://www.mooseframework.org
GNU Lesser General Public License v2.1

Residual based automatic scaling #14397

Closed lindsayad closed 4 years ago

lindsayad commented 4 years ago

Reason

Convergence checks, both linear and nonlinear, are based on residuals. In multiphysics simulations with poor relative residual scaling, a linear solve may be considered converged even when it hasn't sufficiently solved for one variable, resulting in a bad Newton step for that variable. This can eventually result in a failed nonlinear solve (in the case of ReferenceResidualProblem, for example) because the "less important" variable never converges. Alternatively, with the traditional convergence check in FEProblem, a nonlinear solve may be considered converged even when the "less important" variable hasn't moved at all.

Design

Very similar to automatic scaling based on the Jacobian/preconditioning matrix, but using the residual values. Moreover, we may want the default to be to compute it on every time step, since loads may change dramatically in these multiphysics simulations.
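As a rough sketch of the proposal (illustrative Python; the dof layout, variable names, and choice of per-block max-norm are assumptions for illustration, not MOOSE's actual API):

```python
# Sketch of residual-based automatic scaling: one factor per variable,
# chosen so that variable's largest residual entry becomes O(1).
# The dof layout and variable names here are illustrative.

def compute_residual_scaling(residual, var_dofs):
    """residual: list of floats; var_dofs: {var_name: [dof indices]}."""
    factors = {}
    for var, dofs in var_dofs.items():
        max_entry = max(abs(residual[i]) for i in dofs)
        # Guard against an identically-zero residual block.
        factors[var] = 1.0 / max_entry if max_entry > 0.0 else 1.0
    return factors

def apply_scaling(residual, var_dofs, factors):
    scaled = list(residual)
    for var, dofs in var_dofs.items():
        for i in dofs:
            scaled[i] *= factors[var]
    return scaled
```

With a displacement block around 1e6 and a temperature block around 1e-3, the computed factors bring both blocks to the same O(1) scale before the convergence check sees them.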

Impact

Hopefully improve solve convergence for some multi-physics applications

lindsayad commented 4 years ago

@bwspenc @novasr @friedmud @fdkong

fdkong commented 4 years ago

@lindsayad Actually we could do both if we want. Do you think it is reasonable?

lindsayad commented 4 years ago

What do you mean? Have the option to do both? Or do both within the same simulation? I would struggle to see how we can use them both in one simulation

lindsayad commented 4 years ago

Would you scale the residual in a custom convergence check (both linear and non-linear) if you want to do both?

fdkong commented 4 years ago

OK. Residual scaling helps bring the nonlinear residual to a nice scale, which helps the convergence check.

'Jacobian scaling' will improve the condition number of the matrix.

If we want to apply both, then 'Jacobian scaling' can be considered as a linear preconditioner in this context.

But I am not sure whether we would get any benefit from doing this.

tophmatthews commented 4 years ago

I struggle to see why this matters for reference residual, since each variable is compared against itself.

dschwen commented 4 years ago

I think it might matter for Eisenstat-Walker, which doesn't know anything about the scaling. Alex mentioned that ideally both the Jacobian and the residual should be close to unity.

lindsayad commented 4 years ago

It could matter for any linear solve: one could envision a situation where, even though the linear residual has been cranked down by orders of magnitude, examining the sub-vector of that linear residual corresponding to an individual variable might reveal that its norm has barely changed at all...

lindsayad commented 4 years ago

If we want to apply both, then 'Jacobian scaling' can be considered as a linear preconditioner in this context.

Ok, you want to do this?

bwspenc commented 4 years ago

If we want to apply both, then 'Jacobian scaling' can be considered as a linear preconditioner in this context.

So would it be applied on top of whatever other preconditioner was computed? I don't think that would work because the preconditioner was already computed to be correct for the physics, and scaling it would make it wrong.

bwspenc commented 4 years ago

I like this idea, but I'm a little skeptical that it will help. It seems like we just need to accept that the residuals of the individual solution variables can have widely varying scaling, and make all of the convergence checks work robustly when there is a wide disparity. Can we add a PETSc callback similar to the one we have now for checking nonlinear convergence, but for linear convergence? That wouldn't help us with Eisenstat-Walker, but I think it would completely cover us for the standard PJFNK algorithm.
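The per-variable linear check being suggested can be sketched in plain Python (illustrative only; the names, dof layout, and tolerance are assumptions, and in practice this would be a PETSc convergence-test callback):

```python
import math

# Sketch of a per-variable linear convergence test: declare the linear
# solve converged only once EVERY variable's residual sub-vector has
# dropped by rtol relative to its own initial norm.

def subnorm(vec, dofs):
    """l2 norm of the entries of vec belonging to one variable."""
    return math.sqrt(sum(vec[i] ** 2 for i in dofs))

def per_variable_converged(r, r0, var_dofs, rtol=1.0e-5):
    """r: current linear residual, r0: initial linear residual."""
    return all(subnorm(r, dofs) <= rtol * subnorm(r0, dofs)
               for var, dofs in var_dofs.items()
               if subnorm(r0, dofs) > 0.0)
```

This is exactly the scenario described earlier in the thread: a global norm can drop by orders of magnitude while one variable's block stalls, and a check like this would catch it.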

fdkong commented 4 years ago

So would it be applied on top of whatever other preconditioner was computed? I don't think that would work because the preconditioner was already computed to be correct for the physics, and scaling it would make it wrong.

It will definitely work. The question is whether it is worthwhile to do that.

It will be applied at the linear level.

The simplest way is to use a PCCOMPOSITE https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/PCCOMPOSITE.html

But I would like to do a native implementation on the MOOSE side
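Conceptually, the composite idea looks like the following (illustrative Python, not the PETSc API; the stage ordering is an assumption in the spirit of a multiplicative PCCOMPOSITE):

```python
# Treat the Jacobian-based (diagonal) scaling as one stage of a
# composite preconditioner, applied after whatever PC the physics
# already uses. Names here are illustrative.

def make_composite_pc(apply_physics_pc, diag_scaling):
    """Return a callable computing y = D * (M^{-1} r):
    the physics preconditioner first, then the diagonal scaling."""
    def apply(r):
        y = apply_physics_pc(r)
        return [d * yi for d, yi in zip(diag_scaling, y)]
    return apply
```

Because the scaling is folded into the preconditioner rather than into the operator, the physics PC stays "correct for the physics" and the scaling only reshapes the preconditioned residual the Krylov method sees.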

bwspenc commented 4 years ago

Oh, I guess the other place this matters is for the PJFNK epsilon. If it really would work to apply an additional scaling as @fdkong says it would, then it seems worth doing.

lindsayad commented 4 years ago

Epsilon calculation is based on the norm of the solution vector and the norm of whatever vector is involved in the matrix-vector product to be approximated. At least that is what I see in the calculation of the difference parameter for Walker-Pernice and Dennis-Schnabel. So I'm not sure that residual scaling will matter for the selection of that parameter.

So Fande how would this be done in a native MOOSE way? I understand conceptually the PC composite way.
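For reference, a sketch of the Walker-Pernice parameter as PETSc's MATMFFD_WP man page describes it (this is one reading of that page; treat the exact form, and the default error_rel of roughly sqrt(machine epsilon), as assumptions). It depends only on ||u|| and ||a||, consistent with the observation above:

```python
import math

def wp_epsilon(u, a, error_rel=1.0e-8):
    """Walker-Pernice differencing parameter, roughly
    h = error_rel * sqrt(1 + ||u||) / ||a||,
    for approximating J(u)*a by (F(u + h a) - F(u)) / h."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_a = math.sqrt(sum(x * x for x in a))
    return error_rel * math.sqrt(1.0 + norm_u) / norm_a
```

Only the norms enter, so per-variable scale disparities inside u and a are invisible to the parameter selection, which is the concern raised below.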


lindsayad commented 4 years ago

It will definitely work. The question if it is worthwhile to do that?

Why wouldn't it be worthwhile? The current automatic scaling implementation has been nothing but beneficial in single physics cases so I think it would be silly to abandon it simply because we are encountering some convergence issues in this single BISON run.

friedmud commented 4 years ago

This is definitely NOT simply a convergence criteria issue. Improper scaling of the residual can lead to precision loss and non-convergence. This is especially true in JFNK.

@lindsayad that "vector in the matrix-vector product" is the residual...

Regardless of how epsilon is computed... having widely varying entries in the residual vector will mean there will be precision loss in the finite-difference... one way or the other.
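The precision-loss point can be seen in a contrived two-component example (plain Python, made-up numbers): with solution entries sixteen orders of magnitude apart, a single perturbation sized from the global norms destroys the derivative of the small component.

```python
import math

# Toy illustration: F_i(u) = u_i^2, so the exact Jacobian action is
# (J v)_i = 2 u_i v_i. One globally-sized perturbation cannot serve
# entries of wildly different magnitude.

def F(u):
    return [x * x for x in u]

def fd_jacobian_action(u, v, eps):
    Fu = F(u)
    Fp = F([ui + eps * vi for ui, vi in zip(u, v)])
    return [(fp - fu) / eps for fp, fu in zip(Fp, Fu)]

u = [1.0e8, 1.0e-8]      # entries 16 orders of magnitude apart
v = [1.0, 1.0]
norm_u = math.sqrt(sum(x * x for x in u))
eps = 1.0e-8 * norm_u    # "global" perturbation, ~1.0 here

fd = fd_jacobian_action(u, v, eps)
exact = [2.0 * ui * vi for ui, vi in zip(u, v)]
rel_err = [abs(f - e) / abs(e) for f, e in zip(fd, exact)]
# rel_err[0] is tiny; rel_err[1] is enormous -- the small-magnitude
# component's derivative is completely lost in the difference.
```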

Why can't we simply combine the two using a linear parameter? 0 for pure Jacobian scaling, 1 for pure residual scaling... and anything in-between for a mix of the two?

I'm really interested in seeing some BISON results with ranges of manual scaling applied to solid mechanics...

friedmud commented 4 years ago

Wait - @lindsayad ... @fdkong and I are looking at PETSc... something is definitely not right here. Will report back.

friedmud commented 4 years ago

@fdkong and I are going to talk to Barry about this. To us... the formulas for the differencing parameter look wrong. In their current form they could definitely be causing (possibly huge) precision loss.

We will report back soon...

lindsayad commented 4 years ago

@lindsayad that "vector in the matrix-vector product" is the residual...

Indeed (or recursive application of the Jacobian action to it). Good point! 👍

Why can't we simply combine the two using a linear parameter? 0 for pure Jacobian scaling, 1 for pure residual scaling... and anything in-between for a mix of the two?

Sure yea that sounds good.

To us... the formulas for the differencing parameter look wrong. In their current form they could definitely be causing (possibly huge) precision loss.

They don't match the formulas on the PETSc manual pages?

dschwen commented 4 years ago

Are we sure we want linear mixing of those factors? It seems to me that we might want logarithmic mixing (i.e. weighted average of the exponents in the scientific notation)
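The two mixing rules being discussed can be written down in a few lines (illustrative Python; the factor names and the single mixing parameter alpha are assumptions from the proposal above, with alpha = 0 meaning pure Jacobian scaling and alpha = 1 pure residual scaling):

```python
import math

# Two ways to blend a Jacobian-based factor with a residual-based
# factor for one variable.

def mix_linear(jac_factor, res_factor, alpha):
    """Arithmetic blend: dominated by the larger factor."""
    return (1.0 - alpha) * jac_factor + alpha * res_factor

def mix_log(jac_factor, res_factor, alpha):
    """Weighted average of the exponents, i.e. a weighted
    geometric mean -- blends on a log scale."""
    return math.exp((1.0 - alpha) * math.log(jac_factor)
                    + alpha * math.log(res_factor))
```

With factors 1e-6 and 1e-2, the linear mix at alpha = 0.5 is about 5e-3, essentially the larger factor, while the logarithmic mix is 1e-4, halfway between on a log scale; that difference is the point being made here.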

lindsayad commented 4 years ago

Ah yes definitely


fdkong commented 4 years ago

It will definitely work. The question if it is worthwhile to do that?

Why wouldn't it be worthwhile? The current automatic scaling implementation has been nothing but beneficial in single physics cases so I think it would be silly to abandon it simply because we are encountering some convergence issues in this single BISON run.

It is unclear how much a Jacobian scaling will help after a residual scaling has been done.

That is why I asked whether it is worthwhile to implement both at the same time.

Anyway, it is not hard to implement.

1) scale Jacobian in the Jacobian evaluation, and at the same time, the residual should be adjusted accordingly. The residual can be retrieved from SNES.

2) After the linear solver is done, we scale the system back in linesearch_precheck so that the nonlinear residual honors the residual scaling.
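The two steps rely on the fact that row-scaling J and R by the same diagonal leaves the exact Newton step unchanged, so after the (scaled) linear solve only the residual needs to be restored for the convergence check. A tiny sanity check (illustrative Python; with an iterative solver the scaling changes what "converged" means, which is the point of doing it):

```python
# 2x2 check that row-scaling J and R together leaves the exact Newton
# step dx = -J^{-1} R unchanged.

def solve2x2(A, b):
    """Solve A x = b for a 2x2 matrix by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    x0 = (b[0] * A[1][1] - b[1] * A[0][1]) / det
    x1 = (A[0][0] * b[1] - A[1][0] * b[0]) / det
    return [x0, x1]

J = [[2.0e6, 1.0], [3.0, 4.0e-6]]   # badly scaled rows
R = [1.0e6, 2.0e-6]
S = [1.0e-6, 1.0e6]                 # per-row scaling factors

dx = solve2x2(J, [-r for r in R])
Js = [[S[i] * J[i][j] for j in range(2)] for i in range(2)]
Rs = [S[i] * R[i] for i in range(2)]
dx_scaled = solve2x2(Js, [-r for r in Rs])
# dx and dx_scaled agree up to roundoff.
```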

lindsayad commented 4 years ago

What you’re proposing certainly makes sense for convergence checks.

But for PJFNK we have to consider what @friedmud has been harping on...the residual will factor into the selection of the differencing parameter and the accuracy of the approximation of the Jacobian action. It may not matter that the PMat is well scaled if we’re getting garbage from the approximation of the matrix-vector product. For PJFNK, residual scaling may very well yield a better linear solve and Newton step than PMat scaling. Or perhaps there’s a crossover point, motivating the manual scaling sweep that @friedmud has asked for.

It will also be easy to do this sweep automatically. This issue is my first priority after I put up my RANFS PR.


lindsayad commented 4 years ago

But I guess that’s what you were saying...a Jacobian scaling may not make sense after a residual scaling.

I would be really curious though to try pure residual scaling and then do some kind of “Jacobian” scaling in a PCCOMPOSITE.


fdkong commented 4 years ago

But for PJFNK we have to consider what @friedmud has been harping on...the residual will factor into selection of the differencing parameter and the accuracy of the approximation or the Jacobian action.

If Pmat is in the right scale, then the residual function used for computing the Jacobian action should also be in the right scale, since Pmat is an approximation of the Jacobian matrix.

That being said: the residual callback from GMRES should use the scaling factor from PMat as these residual evaluations are used to compute derivatives.

And then convergence checks use the residual scaling.

lindsayad commented 4 years ago

If Pmat is in the right scale, then the residual function used for computing the Jacobian action should also be in the right scale, since Pmat is an approximation of the Jacobian matrix.

But that BISON run perfectly illustrates that the residual and Pmat may have dramatically different scales

friedmud commented 4 years ago

Ok - after studying this closer with @fdkong ... there are 3 things that need to be handled.

  1. The solution values for all of the physics need to be near 1
  2. The residual values for all of the physics need to be near 1
  3. The Jacobian should be scaled to improve the condition number (making diagonal entries near 1 is a good start).

(1) is new (to me)... but is critical. If the solution vector has wildly varying ranges in it... then there is no possibility of finding a perturbation that will be accurate for all of the entries.

This goes back to @lindsayad's suggestion of non-dimensionalization. That's still not the answer... but I am starting to believe that we need to be a bit more careful about choosing units to make the values in the solution vector closer to "1".

Actually: all of this is making me rethink JFNK in general... we have some new ideas in AD... and maybe AD with Newton is just better because it doesn't suffer from these issues (not all of them anyway - the Jacobian scaling is still important).
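The unit-choice point about item (1) can be illustrated with BISON-like magnitudes (illustrative Python; the variable names and reference values are made up for the example): dividing each variable by a characteristic reference value brings the whole solution vector near 1.

```python
# Illustrative nondimensionalization: pick one reference value per
# variable so the scaled solution entries are O(1). Reference values
# here are made-up, BISON-like magnitudes (T ~ 1000 K, disp ~ 1e-2 m).

refs = {"temp": 1000.0, "disp_x": 1.0e-2}

def nondimensionalize(solution, refs):
    """solution: {var: [values]} -> {var: [value / ref]}"""
    return {var: [v / refs[var] for v in vals]
            for var, vals in solution.items()}
```

After this, a single finite-difference perturbation can be meaningful for every entry, which is exactly what item (1) asks for.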

lindsayad commented 4 years ago

This goes back to @lindsayad 's suggestion of non-dimensionalization. That's still not the answer... but I am starting to believe that we need to be a bit more careful about choosing units to make the values in the solution vector closer to "1".

This can yield the same result in practice. I've done this in TMAP.

Actually: all of this is making me rethink JFNK in general... we have some new ideas in AD... and maybe AD with Newton is just better because it doesn't suffer from these issues (not all of them anyway - the Jacobian scaling is still important).

Yay for NEWTON! Although PJFNK will still have to be an important player in many applications. Putting AD into libMesh will help along the road to NEWTON although I worry about the computational expense. I feel like there will be decisions to make between the computational expense of AD NEWTON vs the computational expense (in terms of added non-linear iterations)/lack of robustness of a poorly scaled PJFNK vs the developer/user effort required to construct a well-scaled PJFNK.

For the current BISON problem, the relative unit scaling is not great (T ~ 1000, disp ~ 1e-2) for finding a good perturbation, but I still think it's worth some effort to see whether hybrid residual-Jacobian scaling (or pure residual scaling) can improve things. I'm working on the code right now. It shouldn't take me too long to get a PR up, maybe by the end of the day.

novasr commented 4 years ago

@elementx54 and @acasagran, fyi