ratt-ru / meqtrees

A library for implementing radio astronomical Measurement Equations
http://meqtrees.net

Implement improved LM solver strategy suggested by WNB #376

Open gijzelaerr opened 10 years ago

gijzelaerr commented 10 years ago
at 2006-01-31 11:11:09 Jan Noordam reported:

Implement improved LM solver strategy suggested by WNB

gijzelaerr commented 10 years ago

Original comment thread migrated from bugzilla

at 2006-01-31 11:11:09 Jan Noordam replied:

WNB has outlined his ideas about solving strategy in a series of recent emails (which should be copied here, of course). This is just my summary, and a few comments, so that we have at least some record of these things.

A reminder of some background:

-) The solver is linear if the LM parameter is zero, and non-linear otherwise.

-) When non-linear, the solver assumes a parabolic shape for the minimum of the chisq surface.

-) The magnitude of the LM parameter determines the step-size in solution space. Initially, it is set to a 'large' value, and it is decreased (or increased) as the solver feels that it is getting closer to the minimum.

-) A non-zero LM parameter assures that the matrix is positive-definite, so it will always invert. In practice, finite machine precision will cause errors. This is avoided by having a non-zero colinearity factor, which will cause a FAIL when the vectors get more parallel than a certain small angle. This happens if we try to solve with too little information.

-) Using SVD will avoid this condition because, if necessary, it generates some extra condition equations that supply the missing information. This may be detected by looking at the matrix rank, which is then smaller than full. NB: The extra processing cost of using SVD is negligible.

-) The alternative to SVD is to supply the extra condition equations explicitly (using extra condeqs), e.g. equations that require the sum of the unknown phases to be zero. According to WNB, supplying a priori information in this way MAY cause a non-optimal path to the solution...

-) Note that all this explains why the rank gradually decreases as we get closer to the solution: As the magnitude of the LM parameter decreases, the non-zero colinearity factor will cause SVD to kick in with more and more extra equations. (It is now possible to do tests with different values of the colinearity factor, and with or without SVD.)
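The interplay between the LM parameter and collinearity described above can be sketched in a few lines. This is a hypothetical NumPy illustration, not the MeqTrees/LSQFit implementation: it shows that a non-zero LM parameter keeps the normal matrix positive-definite, so the step can be solved even when the Jacobian columns are perfectly parallel (where the undamped normal matrix is singular).

```python
import numpy as np

def lm_step(jac, resid, lm_param):
    """One damped (Levenberg-Marquardt) update:
    solve (J^T J + lambda I) dx = J^T r.

    A non-zero lm_param keeps the matrix positive-definite, so it
    always inverts; lm_param = 0 would give the plain linear step.
    """
    jtj = jac.T @ jac
    a = jtj + lm_param * np.eye(jtj.shape[0])
    return np.linalg.solve(a, jac.T @ resid)

# Pathological toy problem: two perfectly collinear columns,
# i.e. too little information to separate the two unknowns.
x = np.linspace(0.0, 1.0, 5)
jac = np.column_stack([x, x])   # columns exactly parallel
resid = 2.0 * x                 # "data" for a combined slope of 2
dx = lm_step(jac, resid, lm_param=1e-3)
print(np.allclose(jac @ dx, resid, atol=1e-2))
```

With `lm_param = 0` the solve would fail on the singular matrix; the damping spreads the step symmetrically over the two degenerate unknowns, much as SVD's extra condition equations would.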

The recommended strategy is the following:

1) Set the colinearity factor to zero, and switch off SVD. Since the LM parameter is non-zero, it will converge non-linearly. Do NOT supply any extra condition equations.

2) After convergence, set the LM parameter to zero, and do a last linear solution, with either SVD or explicit condition equations (the latter are by definition linear in the unknowns!). NB: Only at this point is it meaningful to ask the WNB fitter for covariance values...
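The two-phase strategy above can be sketched as follows. This is a toy NumPy illustration under an assumed exponential model; the function names and the problem itself are hypothetical, not the MeqSolver/LSQFit interface. Phase 1 iterates with a non-zero LM parameter that shrinks on success and grows on overshoot; phase 2 does one final undamped solve (here via SVD pseudo-inverse), at which point the covariance becomes meaningful.

```python
import numpy as np

def fit(model, jac_fn, data, x0, lm0=1.0, niter=50):
    x, lam = np.asarray(x0, dtype=float), lm0
    # Phase 1: non-linear iterations with a non-zero LM parameter.
    for _ in range(niter):
        r = data - model(x)
        j = jac_fn(x)
        dx = np.linalg.solve(j.T @ j + lam * np.eye(x.size), j.T @ r)
        if np.sum((data - model(x + dx)) ** 2) < np.sum(r ** 2):
            x, lam = x + dx, lam * 0.5   # closer to minimum: smaller damping
        else:
            lam *= 2.0                   # overshoot: increase damping
    # Phase 2: LM parameter zero; one linear SVD solve (pinv handles
    # any rank deficiency). Only now is the covariance meaningful.
    x = x + np.linalg.pinv(jac_fn(x)) @ (data - model(x))
    j = jac_fn(x)
    cov = np.linalg.pinv(j.T @ j)
    return x, cov

# Toy model y = a * exp(b * t), noiseless data.
t = np.linspace(0.0, 1.0, 20)
x_true = np.array([2.0, -1.0])
data = x_true[0] * np.exp(x_true[1] * t)
model = lambda x: x[0] * np.exp(x[1] * t)
jac_fn = lambda x: np.column_stack([np.exp(x[1] * t),
                                    x[0] * t * np.exp(x[1] * t)])
x_fit, cov = fit(model, jac_fn, data, x0=[1.0, 0.0])
print(np.allclose(x_fit, x_true, atol=1e-6))
```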

Some remarks:

-) I have little feeling for what machine imprecision will do to the solution in the first phase. It seems to me that it is safer (at little cost) to keep SVD switched on, and to have a small colinearity factor (i.e. what we are doing now).

-) In order to include extra condition equations in the last step, the solver needs to know which of its condeq children contain them. This is certainly possible, because the tree builder can set a switch when the condeq is created.

-) If we decide to go this route, the intelligence has to be built into the MeqSolver node. In this stage, I am not yet convinced of the superiority of this strategy over our current practice, but that could change...

-) See also the Bugzilla item on estimating the chisq surface

There are two issues here: The accuracy of the solution (especially in the presence of noise), and efficiency (the number of iterations needed). As to the latter, I am much more concerned about using the last solution as starting position for the next, which is impossible in tiled solutions....

at 2006-02-03 08:33:01 Wim Brouw replied:

Try again:

The description lacks some detail/precision. Most of it is explained in the LSQ paper (in the Newstar and/or aips++ memos), and in fitting.h and LSQFit.h. In addition, a look at NonLinearLSQ{,LM}.h and the implementation in the .cc files will help.

A short re-formatting of the major points:

Background:

Strategy: 1) Nonlinear: set collinearity to zero; leave on SVD if you want (it does not cost anything and could save you in pathological cases); do not supply extraneous constraining information.

2) After convergence, create the normeq a last time, and call the standard 'invert()' for linear solutions. At this stage SVD or constraining equations (sufficient for the rank deficiency only!) should be used. Now the normeq can be inverted as well, and covariance and parameter errors can be obtained and used. Note that calculating the covariance (and errors derived from it) is expensive: the solver normally does not calculate the inverse of the normeq, but solves in a different way.

Remarks:

-) Note that constraining information can be given as condition equations (in which case the error calculations are incorrect) or as constraints. Also note that a constraint equation is identical in format to a condition equation; only its use in the actual solver is different (read about lambda factors). So in practice the trees can still offer condeqs, but they will be fed into LSQFit as constraints (see also NonLinearLSQLM.cc for an example, and the fitting/test programs).
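The distinction between the two uses of the same equation row can be illustrated with a toy rank-deficient phase problem. This is a hypothetical NumPy sketch, not the LSQFit interface: the sum-of-phases row is either stacked in as an ordinary condition equation (it then contributes to chi-squared, which is why error estimates go wrong), or bordered into the normal matrix as a hard constraint via a KKT system (chi-squared untouched).

```python
import numpy as np

# Rank-deficient problem: only phase *differences* are observed,
# so the absolute phase level is unconstrained (rank 2 of 3).
a = np.array([[1.0, -1.0,  0.0],
              [0.0,  1.0, -1.0]])
b = np.array([0.3, 0.2])
c = np.array([[1.0, 1.0, 1.0]])   # row: sum of phases = 0

# As a condition equation: just another row in the design matrix,
# so it is folded into the residuals (and hence into chi^2).
x_cond = np.linalg.lstsq(np.vstack([a, c]),
                         np.append(b, 0.0), rcond=None)[0]

# As a hard constraint: bordered (KKT) system; the constraint is
# enforced exactly and does not contribute to chi^2.
n = a.shape[1]
kkt = np.block([[a.T @ a, c.T],
                [c, np.zeros((1, 1))]])
rhs = np.append(a.T @ b, 0.0)
x_con = np.linalg.solve(kkt, rhs)[:n]

print(np.allclose(x_cond, x_con))
```

For this consistent, noiseless toy case both routes pick the same zero-mean solution; with noisy data the solutions and, especially, the error estimates diverge, which is the point being made above.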

at 2012-02-14 12:25:06 Oleg Smirnov replied:

More comments from meeting of 14/02/12:

at 2012-02-14 12:30:23 Oleg Smirnov replied:

Additional comment from Sarod -- adding a regularization factor to the Jones matrix solution prior to inversion: $(J+\sigma I)^{-1}$, where $\sigma$ is the noise covariance: $\sum n n^H$.

at 2012-02-14 12:36:03 Oleg Smirnov replied:

The above should be $\sigma^2 I$.
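A minimal sketch of the suggested regularization, with the $\sigma^2 I$ correction applied. This is a hypothetical NumPy illustration: the example Jones matrix and the way $\sigma^2$ is estimated from a few noise samples are assumptions, not MeqTrees code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Nearly singular 2x2 Jones matrix: plain inversion is unstable.
jones = np.array([[1.0, 1.0],
                  [1.0, 1.0 + 1e-12]], dtype=complex)

# Assumed noise samples; estimate the noise power sigma^2
# from sum n n^H (averaged per sample).
n = 1e-3 * (rng.standard_normal(2) + 1j * rng.standard_normal(2))
sigma2 = np.real(np.vdot(n, n)) / n.size

# Regularized inversion: (J + sigma^2 I)^{-1}.
reg = jones + sigma2 * np.eye(2)
reg_inv = np.linalg.inv(reg)
print(np.linalg.cond(reg) < np.linalg.cond(jones))
```

The added $\sigma^2 I$ lifts the near-zero eigenvalue to roughly the noise power, so the condition number drops from astronomically large to about signal-to-noise, at the price of a small bias in the solution.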