ratt-ru / meqtrees

A library for implementing radio astronomical Measurement Equations
http://meqtrees.net

Implement improved LM solver strategy suggested by WNB #376

Open gijzelaerr opened 10 years ago

gijzelaerr commented 10 years ago
at 2006-01-31 11:11:09 Jan Noordam reported:

Implement improved LM solver strategy suggested by WNB

gijzelaerr commented 10 years ago

Original comment thread migrated from bugzilla

at 2006-01-31 11:11:09 Jan Noordam replied:

WNB has outlined his ideas about solving strategy in a series of recent emails (which should be copied here, of course). This is just my summary, and a few comments, so that we have at least some record of these things.

A reminder of some background:

-) The solver is linear if the LM parameter is zero, and non-linear otherwise.

-) When non-linear, the solver assumes a parabolic shape for the minimum of the chisq surface.

-) The magnitude of the LM parameter determines the step-size in solution space. Initially, it is set to a 'large' value, and it is decreased (or increased) as the solver feels that it is getting closer to the minimum.

-) A non-zero LM parameter assures that the matrix is positive-definite, so it will always invert. In practice, finite machine precision will cause errors. This is avoided by having a non-zero colinearity factor, which will cause a FAIL when the vectors get more parallel than a certain small angle. This happens if we try to solve with too little information.

-) Using SVD will avoid this condition because, if necessary, it generates some extra condition equations that supply the missing information. This may be detected by looking at the matrix rank, which is then smaller than full. NB: The extra processing cost of using SVD is negligible.

-) The alternative to SVD is to supply the extra condition equations explicitly (using extra condeqs), e.g. equations that require the sum of the unknown phases to be zero. According to WNB, supplying a priori information in this way MAY cause a non-optimal path to the solution...

-) Note that all this explains why the rank gradually decreases as we get closer to the solution: As the magnitude of the LM parameter decreases, the non-zero colinearity factor will cause SVD to kick in with more and more extra equations. (It is now possible to do tests with different values of the colinearity factor, and with or without SVD.)
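The interplay between the LM parameter and collinearity described above can be sketched in a few lines. This is a hypothetical NumPy illustration, not the MeqTrees/LSQFit implementation: it shows that a non-zero LM parameter keeps the normal matrix positive-definite, so the step can be solved even when the Jacobian columns are perfectly parallel (where the undamped normal matrix is singular).

```python
import numpy as np

def lm_step(jac, resid, lm_param):
    """One damped (Levenberg-Marquardt) update:
    solve (J^T J + lambda I) dx = J^T r.

    A non-zero lm_param keeps the matrix positive-definite, so it
    always inverts; lm_param = 0 would give the plain linear step.
    """
    jtj = jac.T @ jac
    a = jtj + lm_param * np.eye(jtj.shape[0])
    return np.linalg.solve(a, jac.T @ resid)

# Pathological toy problem: two perfectly collinear columns,
# i.e. too little information to separate the two unknowns.
x = np.linspace(0.0, 1.0, 5)
jac = np.column_stack([x, x])   # columns exactly parallel
resid = 2.0 * x                 # "data" for a combined slope of 2
dx = lm_step(jac, resid, lm_param=1e-3)
print(np.allclose(jac @ dx, resid, atol=1e-2))
```

With `lm_param = 0` the solve would fail on the singular matrix; the damping spreads the step symmetrically over the two degenerate unknowns, much as SVD's extra condition equations would.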

The recommended strategy is the following:

1) Set the colinearity factor to zero, and switch off SVD. Since the LM parameter is non-zero, it will converge non-linearly. Do NOT supply any extra condition equations.

2) After convergence, set the LM parameter to zero, and do a last linear solution, with either SVD or explicit condition equations (the latter are by definition linear in the unknowns!). NB: Only at this point is it meaningful to ask the WNB fitter for covariance values...
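The two-phase strategy above can be sketched as follows. This is a toy NumPy illustration under an assumed exponential model; the function names and the problem itself are hypothetical, not the MeqSolver/LSQFit interface. Phase 1 iterates with a non-zero LM parameter that shrinks on success and grows on overshoot; phase 2 does one final undamped solve (here via SVD pseudo-inverse), at which point the covariance becomes meaningful.

```python
import numpy as np

def fit(model, jac_fn, data, x0, lm0=1.0, niter=50):
    x, lam = np.asarray(x0, dtype=float), lm0
    # Phase 1: non-linear iterations with a non-zero LM parameter.
    for _ in range(niter):
        r = data - model(x)
        j = jac_fn(x)
        dx = np.linalg.solve(j.T @ j + lam * np.eye(x.size), j.T @ r)
        if np.sum((data - model(x + dx)) ** 2) < np.sum(r ** 2):
            x, lam = x + dx, lam * 0.5   # closer to minimum: smaller damping
        else:
            lam *= 2.0                   # overshoot: increase damping
    # Phase 2: LM parameter zero; one linear SVD solve (pinv handles
    # any rank deficiency). Only now is the covariance meaningful.
    x = x + np.linalg.pinv(jac_fn(x)) @ (data - model(x))
    j = jac_fn(x)
    cov = np.linalg.pinv(j.T @ j)
    return x, cov

# Toy model y = a * exp(b * t), noiseless data.
t = np.linspace(0.0, 1.0, 20)
x_true = np.array([2.0, -1.0])
data = x_true[0] * np.exp(x_true[1] * t)
model = lambda x: x[0] * np.exp(x[1] * t)
jac_fn = lambda x: np.column_stack([np.exp(x[1] * t),
                                    x[0] * t * np.exp(x[1] * t)])
x_fit, cov = fit(model, jac_fn, data, x0=[1.0, 0.0])
print(np.allclose(x_fit, x_true, atol=1e-6))
```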

Some remarks:

-) I have little feeling for what machine imprecision will do to the solution in the first phase. It seems to me that it is safer (at little cost) to keep SVD switched on, and to have a small colinearity factor (i.e. what we are doing now).

-) In order to include extra condition equations in the last step, the solver needs to know which of its condeq children contain them. This is certainly possible, because the tree builder can set a switch when the condeq is created.

-) If we decide to go this route, the intelligence has to be built into the MeqSolver node. In this stage, I am not yet convinced of the superiority of this strategy over our current practice, but that could change...

-) See also the Bugzilla item on estimating the chisq surface

There are two issues here: The accuracy of the solution (especially in the presence of noise), and efficiency (the number of iterations needed). As to the latter, I am much more concerned about using the last solution as starting position for the next, which is impossible in tiled solutions....

at 2006-02-03 08:33:01 Wim Brouw replied:

Try again:

The description lacks some detail/precision. Most of it is explained in the LSQ paper (in the Newstar and/or aips++ memos), and in fitting.h and LSQFit.h. In addition, a look at NonLinearLSQ{,LM}.h and the implementation in the .cc files will help.

A short re-formatting of the major points:

Background:

Strategy: 1) Nonlinear: set collinearity to zero; leave on SVD if you want (it does not cost anything and could save you in pathological cases); do not supply extraneous constraining information.

2) After convergence, create the normeq a last time, and call the standard 'invert()' for linear solutions. At this stage SVD or constraining equations (sufficient for the rank deficiency only!) should be used. Now the normeq can be inverted as well, and covariance and parameter errors can be obtained and used. Note that calculating the covariance (and errors derived from it) is expensive: the solver normally does not calculate the inverse of the normeq, but solves in a different way.

Remarks:

-) Note that constraining information can be given as condition equations (in which case the error calculations are incorrect) or as constraints. Also note that a constraint equation is identical in format to a condition equation; only its use in the actual solver is different (read about lambda factors). So in practice the trees can still offer condeqs, but they will be fed into LSQFit as constraints (see also NonLinearLSQLM.cc for an example, and the fitting/test programs).
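The distinction between the two uses of the same equation row can be illustrated with a toy rank-deficient phase problem. This is a hypothetical NumPy sketch, not the LSQFit interface: the sum-of-phases row is either stacked in as an ordinary condition equation (it then contributes to chi-squared, which is why error estimates go wrong), or bordered into the normal matrix as a hard constraint via a KKT system (chi-squared untouched).

```python
import numpy as np

# Rank-deficient problem: only phase *differences* are observed,
# so the absolute phase level is unconstrained (rank 2 of 3).
a = np.array([[1.0, -1.0,  0.0],
              [0.0,  1.0, -1.0]])
b = np.array([0.3, 0.2])
c = np.array([[1.0, 1.0, 1.0]])   # row: sum of phases = 0

# As a condition equation: just another row in the design matrix,
# so it is folded into the residuals (and hence into chi^2).
x_cond = np.linalg.lstsq(np.vstack([a, c]),
                         np.append(b, 0.0), rcond=None)[0]

# As a hard constraint: bordered (KKT) system; the constraint is
# enforced exactly and does not contribute to chi^2.
n = a.shape[1]
kkt = np.block([[a.T @ a, c.T],
                [c, np.zeros((1, 1))]])
rhs = np.append(a.T @ b, 0.0)
x_con = np.linalg.solve(kkt, rhs)[:n]

print(np.allclose(x_cond, x_con))
```

For this consistent, noiseless toy case both routes pick the same zero-mean solution; with noisy data the solutions and, especially, the error estimates diverge, which is the point being made above.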

at 2012-02-14 12:25:06 Oleg Smirnov replied:

More comments from meeting of 14/02/12:

at 2012-02-14 12:30:23 Oleg Smirnov replied:

Additional comment from Sarod -- adding a regularization factor to the Jones matrix solution prior to inversion: $(J+\sigma I)^{-1}$, where $\sigma$ is the noise covariance: $\sum n n^H$.

at 2012-02-14 12:36:03 Oleg Smirnov replied:

The above should be $\sigma^2 I$.
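A minimal sketch of the suggested regularization, with the $\sigma^2 I$ correction applied. This is a hypothetical NumPy illustration: the example Jones matrix and the way $\sigma^2$ is estimated from a few noise samples are assumptions, not MeqTrees code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Nearly singular 2x2 Jones matrix: plain inversion is unstable.
jones = np.array([[1.0, 1.0],
                  [1.0, 1.0 + 1e-12]], dtype=complex)

# Assumed noise samples; estimate the noise power sigma^2
# from sum n n^H (averaged per sample).
n = 1e-3 * (rng.standard_normal(2) + 1j * rng.standard_normal(2))
sigma2 = np.real(np.vdot(n, n)) / n.size

# Regularized inversion: (J + sigma^2 I)^{-1}.
reg = jones + sigma2 * np.eye(2)
reg_inv = np.linalg.inv(reg)
print(np.linalg.cond(reg) < np.linalg.cond(jones))
```

The added $\sigma^2 I$ lifts the near-zero eigenvalue to roughly the noise power, so the condition number drops from astronomically large to about signal-to-noise, at the price of a small bias in the solution.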