ratt-ru / meqtrees

A library for implementing radio astronomical Measurement Equations
http://meqtrees.net

Constraints and Conditions in the Solver #315

Open gijzelaerr opened 10 years ago

gijzelaerr commented 10 years ago
at 2005-12-21 11:55:37 Jan Noordam reported:

Constraints and Conditions in the Solver

gijzelaerr commented 10 years ago

Original comment thread migrated from bugzilla

at 2005-12-21 11:55:37 Jan Noordam replied:

There has been some confusion in the recent past about the precise meaning of the term 'constraint equation'. In order to allow us to discuss this important matter more fruitfully, I propose the following:

-) A constraint equation forces the solution to an exact value for one or more of the unknowns. This is consistent with official mathematical nomenclature. There is a way in Wim's fitting routines to impose such a constraint properly, but MeqTrees do not use this (yet).

-) A condition equation is an equation supplied by a condeq (as the name already suggests). It influences the solution in a least squares sense. I compare it with an elastic string that holds things together while allowing some deformation.

I will start using these terms consistently. Please do the same.

The usual example is station phase. Since an interferometer measures phase differences, there is not enough information to solve for absolute phase. SVD deals with this by supplying extra (constraint!) information, which in this case is equivalent to setting the sum of the station phases to zero (according to Wim). This is indicated in our MeqSolver result plot by the black line that represents the rank of the solution matrix, which curves to the left from the right edge (full rank) as the solution converges. (NB: This is a powerful little indicator, which shows behaviour that we do not always understand!!)

Although the SVD intervention leads to the correct solution, there is a suspicion (perhaps unfounded) that convergence might be quicker (efficiency!) if the absolute phase information were supplied explicitly. There are various possibilities to do this:

-) One may leave one of the stations out of the phase solution. Thus, it will keep its current value, which may be anything...

-) One may add an extra condeq that equates one of the station phase MeqParms to a MeqConstant. Since the solution is free-floating, this elastic will cause the selected phase to be exactly equal to the MeqConstant value (e.g. 0.0). Actually, it is equal to this value with an accuracy of ~1e-10....?

-) The same as above, but the sum of all station phases is set to zero.

The total number of condition equations accumulated by the solver is the number of condeqs times the number of domain cells. At this moment, all these equations have the same weight. Obviously, equations with a higher weight have a greater influence on the solution, i.e. their elastic is less yielding. This can be used to play all kinds of interesting games.

Therefore, in the near future, we should allow a condeq to have a third child, which represents a weight (which may vary over the domain cells, of course). If omitted, cell weights are assumed to be unity. If a weight is less than or equal to zero, no equation is generated by that cell. This provides a mechanism to implement 'weak' redundancy calibration (by making the weight inversely proportional to the uvw-distance (f,t) between two uv-data samples), or to ignore 'bad' data dynamically (by making the weight smaller than zero).
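
Expressed as a minimal numpy sketch (not the actual MeqSolver code; the arrays and the tiny three-station example are made up for illustration), such per-cell weights would enter the accumulated least-squares problem roughly like this: every condeq/cell pair contributes one equation, each equation enters the normal equations with its weight, and cells with a non-positive weight are skipped entirely.

```python
import numpy as np

# Hypothetical example: rows of A are the condition equations accumulated
# from all condeqs and domain cells, b the corresponding "known" values,
# w the proposed per-cell weights (the third condeq child).
A = np.array([[1., -1.,  0.],   # p1 - p2
              [1.,  0., -1.],   # p1 - p3
              [0.,  1., -1.]])  # p2 - p3
b = np.array([15., 80., 65.])
w = np.array([1., 1., 0.])      # weight <= 0: that cell generates no equation

keep = w > 0                    # drop non-weighted cells entirely
Aw, bw, ww = A[keep], b[keep], w[keep]

# Weighted normal equations: (A^T W A) p = A^T W b
N   = Aw.T @ (ww[:, None] * Aw)
rhs = Aw.T @ (ww * bw)
p, *_ = np.linalg.lstsq(N, rhs, rcond=None)  # handles a rank-deficient N
print(p)
```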

at 2006-01-02 14:45:15 Oleg Smirnov replied:

Posting this on behalf of Wim Brouw:

There has been some confusion in the recent past about the precise meaning of the term 'constraint equation'. In order to allow us to discuss this important matter more fruitfully, I propose the following:

-) A constraint equation forces the solution to an exact value for one or more of the unknowns. This is consistent with official mathematical nomenclature. There is a way in Wim's fitting routines to impose such a constraint properly, but MeqTrees do not use this (yet).

-) A condition equation is an equation supplied by a condeq (as the name already suggests). It influences the solution in a least squares sense. I compare it with an elastic string that holds things together while allowing some deformation.

This statement has a few hidden definitions:

I will start using these terms consistently. Please do the same.

Seems like a good idea

The usual example is station phase. Since an interferometer measures phase differences, there is not enough information to solve for absolute phase. SVD deals with this by supplying extra (constraint!) information, which in this case is equivalent to setting the sum of the station phases to zero (according to Wim).

In this case (where only differences of the unknowns in a linear equation are measured) a simple condeq giving another relation with the unknown(s) will work. The sum of phases equal to zero is the one that assumes the least information (i.e. it is a vector perpendicular to all possible solution vectors); but indeed setting a single phase to zero will work as well (this is also done in Miriad, and I think AIPS), and is called the 'reference telescope'. It is worse than taking the average:

  1. if all observations with that telescope are flagged you break continuity, and have to 'select' another reference telescope
  2. if the reference telescope has a phase jump of, say, 100 deg, this translates to -100 deg in all telescopes (in the sum case, and for LOFAR, only about a 1 deg phase jump in all other telescopes)

Giving a condeq with the sum of all phases constant will work in this case as well, and should be preferred if the tie is made with a condeq. 'Working' means that the 'correct' solution is obtained, but not necessarily the right errors per telescope. (In all of this it is assumed that the condeqs are properly weighted, i.e. with the square of the amplitude of each data point, to indicate the high error for low amplitudes. Other weights could be added if e.g. it is known that the sensitivity on a baseline (i.e. the amplitude error) is different, or maybe even skewed(?).)

Example: a simple 3-telescope system with condeqs:

p1 - p2 = 15
p1 - p3 = 80
p2 - p3 = 65

results in the normal equations (with known vector [95 50 -145]):

 2 -1 -1
-1  2 -1
-1 -1  2]

which cannot be solved. Adding p2=0 gives:

 2 -1 -1
-1  3 -1
-1 -1  2]

which, inverted, gives a covariance matrix:

1.66667 1 1.33333
1       1 1
1.33333 1 1.66667]

with result: [15 0 -65]

Changing the 'extra' to 2p2=0 will give the same result, but a covariance of:

0.916667 0.25 0.583333
0.25     0.25 0.25
0.583333 0.25 0.916667]

The result is hence a change in the formal error for the solved phases (chi2 does not change, of course). If e.g. a constraining condeq of p1+p2+p3=0 is used:

 3 -1 -1
-1  3 -1
-1 -1  3]

-->

0.5  0.25 0.25
0.25 0.5  0.25
0.25 0.25 0.5]

in this case the error is properly distributed (at least in the case where no weighting is applied to any of the 'real' condeqs).
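
For reference, the little example above can be reproduced with a few lines of numpy (just a sketch, independent of Wim's fitting routines):

```python
import numpy as np

# Condition equations p1-p2=15, p1-p3=80, p2-p3=65 plus the extra
# "tying" condeq p2=0, all treated as ordinary least-squares equations.
A = np.array([[1., -1.,  0.],
              [1.,  0., -1.],
              [0.,  1., -1.],
              [0.,  1.,  0.]])   # extra condeq: p2 = 0
b = np.array([15., 80., 65., 0.])

N   = A.T @ A                    # normal matrix [[2,-1,-1],[-1,3,-1],[-1,-1,2]]
cov = np.linalg.inv(N)           # [[1.667,1,1.333],[1,1,1],[1.333,1,1.667]]
p   = cov @ (A.T @ b)            # solution [15, 0, -65]
print(N, cov, p, sep="\n")

# Tying with the sum condeq instead (last row [1, 1, 1]) distributes the
# formal errors evenly over the three phases.
```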

On the other hand, if the 'constraining' equation is added not as a condeq, but in the form of a 'constraint', i.e. as an extra equation in the normal equations with an additional unknown (a Lagrange multiplier), i.e. equations:

2  -1 -1
-1 2  -1
-1 -1 2
0  1  0]

the covariance matrix is:

0.333333     -0.333333   0
-1.11022e-16 2.22045e-16 -2.22045e-16
0            -0.333333   0.333333    ]

which gives the expected formal error of 0 for p2 (and, of course, the same solution [15 0 -65])

Changing the constraint equation to e.g. 2p2=0 does not change covariance:

2  -1 -1
-1 2  -1
-1 -1 2
0  2  0]

->

0.333333    -0.333333    0
5.55112e-17 -1.11022e-16 0
5.55112e-17 -0.333333    0.333333]

while using as constraint p1+p2+p3=0:

2  -1 -1
-1 2  -1
-1 -1 2
1  1  1]

->

 0.222222 -0.111111 -0.111111
-0.111111  0.222222 -0.111111
-0.111111 -0.111111  0.222222]

gives the expected equal formal error for all phases.
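
A numpy sketch of the Lagrange-multiplier form, using the textbook bordered normal-equation system (not necessarily what the LSQ routines do internally); with the constraint p1+p2+p3=0 it reproduces the covariance quoted above:

```python
import numpy as np

# Bordered system: normal equations plus the constraint row/column and a
# Lagrange multiplier as extra unknown.  This is a generic sketch, not the
# actual interface of Wim's fitting routines.
N   = np.array([[ 2., -1., -1.],
                [-1.,  2., -1.],
                [-1., -1.,  2.]])
rhs = np.array([95., 50., -145.])
C   = np.array([[1., 1., 1.]])        # constraint: p1 + p2 + p3 = 0

M = np.block([[N, C.T],
              [C, np.zeros((1, 1))]])
sol = np.linalg.solve(M, np.append(rhs, 0.0))
p, lam = sol[:3], sol[3]              # p is the sum-zero solution, lam ~ 0

cov = np.linalg.inv(M)[:3, :3]        # ~[[0.222,-0.111,-0.111], ...]
print(p, lam, cov, sep="\n")
```

Swapping the constraint row for [0 1 0] (p2=0) returns the same solution [15 0 -65] with zero formal error on p2, although the off-diagonal terms of this textbook form need not match the numbers quoted above exactly.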

The above is all for linear solutions, see below for non-linear

This is indicated in our MeqSolver result plot by the black line that represents the rank of the solution matrix, which curves to the left from the right edge (full rank) as the solution converges. (NB: This is a powerful little indicator, which shows behaviour that we do not always understand!!)

If a non-linear solution is used for the phase correction (which is inherently a linear equation if indeed the log of the complex data is taken), the above can be explained if the 'colinearity factor' in the solution is not made equal to zero, and/or the solution iteration goes on until the 'LM factor' gets to be of the order of (the square root of) the colinearity factor. I.e.:

The LM process produces at each stage an, in principle, positive-definite set of normal equations, due to the fact that at each stage the diagonal terms are enhanced by a positive value. As long as the 'colinearity factor' is equal to zero, that means that the set of equations can always be solved (if the colinearity factor, which is in principle meant to cater for numerical errors, is too large, the system can be quasi-dependent, which causes the SVD constraint to be added). Note that the solution is a 'delta' solution! Forcing a constraining phase condeq will in this case force a non-natural solution.

  1. Let us do it. Take the same example as above, assume initial phases of all 0 and an LM factor of 1. We then solve: [23.75 12.5 -36.25], which changes the known vector to: [-11.25 -60 -48.75] (better chi2); make the LM factor 0.1 and solve; then 0.01, 0.001, and we get to: [31.9 16.8 -48.7], which has sum 0 and a chi2 of 8e-6 (and full rank)
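
A rough numpy sketch of that iteration, assuming the LM factor is simply added to the diagonal of the normal matrix (the real solver's damping convention may differ):

```python
import numpy as np

A = np.array([[1., -1., 0.], [1., 0., -1.], [0., 1., -1.]])
b = np.array([15., 80., 65.])
p = np.zeros(3)                       # initial phases all 0

for lm in [1.0, 0.1, 0.01, 0.001]:
    N = A.T @ A + lm * np.eye(3)      # LM-enhanced normal matrix
    delta = np.linalg.solve(N, A.T @ (b - A @ p))
    p += delta                        # first step gives ~[23.75, 12.5, -36.25]

print(p, p.sum())                     # converges towards a sum-zero solution
```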

Although the SVD intervention leads to the correct solution, there is a suspicion (perhaps unfounded) that convergence might be quicker (efficiency!) if the absolute phase information were supplied explicitly.

The SVD is not necessary (it will only come into play if we are 'almost there' and the co-linearity factor is non-zero). Absolute phase information cannot be given, apart from start values of the phases before starting the non-linear solution.

There are various possibilities to do this:

-) One may leave one of the stations out of the phase solution. Thus, it will keep its current value, which may be anything...

This is equal to the 'reference telescope' solution, and can give rise to unrealistic jumps in all telescopes.

-) One may add an extra condeq that equates one of the station phase MeqParms to a MeqConstant. Since the solution is free-floating, this elastic will cause the selected phase to be exactly equal to the MeqConstant value (e.g. 0.0).

Although this gives the correct solution (in the linear case! in the non-linear case with linear equations it can give slow convergence), it will give an incorrect covariance matrix.

Actually, it is equal to this value with an accuracy of ~1e-10....?

There are always numeric limits, especially in any iterative process.

-) The same as above, but the sum of all station phases is set to zero.

Same as setting one phase to zero. To set this, make the initial phase guesses sum to zero (e.g. by setting them all to zero in the linear case as above, or, if starting from say a previous solution, by forcing the sum to zero by subtracting sum(phase)/n from each phase).
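
For instance, re-referencing an existing phase solution so that its sum is zero is just (trivial sketch, made-up numbers):

```python
import numpy as np

phases = np.array([23.0, -10.0, 47.0])  # e.g. a previous phase solution
phases -= phases.sum() / len(phases)    # force sum(phases) = 0 before solving
```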

The total number of condition equations accumulated by the solver is the number of condeqs times the number of domain cells. At this moment, all these equations have the same weight. Obviously, equations with a higher weight have a greater influence on the solution, i.e. their elastic is less yielding. This can be used to play all kinds of interesting games.

and very dangerous games if no proper account is taken of the actual model and its 'weight'

Therefore, in the near future, we should allow a condeq to have a third child, which represents a weight (which may vary over the domain cells, of course). If omitted, cell weights are assumed to be unity. If a weight is less than or equal to zero, no equation is generated by that cell.

Yes, I suppose that in the 'phase case' the weight will in principle be the square of the amplitude (or, if averaging, a properly weighted sum, depending on whether you are phase averaging or complex averaging).

This provides a mechanism to implement 'weak' redundancy calibration (by making the weight inversely proportional to the uvw-distance (f,t) between two uv-data samples),

I do not really understand how to do this. In the 'general' case, the actual model is calculated, which takes any W or other baseline error into account, and there is no a priori reason to give an observation at some point a different weight than one at another point (unless you want to cater for dependency between data points closer together than a beamwidth, in which case you would like to take the covariance between points into account: not catered for in most solutions as it is much too expensive).

In the redundancy case, I can imagine that your phase equations are changed. If e.g. baselines 12 and 78 are redundant, you will add 1 unknown, and have:

p1 - p2 + R12 = obs12
p7 - p8 + R12 = obs78

(In the case of 'equal' redundancy, this can be contracted to p1-p2-p7+p8 = obs12-obs78, to have one less unknown to solve for.)

In the weak redundancy case, you will have something like:

p1 - p2 + R12 = obs12
p7 - p8 + R78 = obs78

There is no reason to weight any of these equations differently. If you subtract the two equations and generate a new 'unknown' 'R12-R78', which is assumed to be small ('weak redundancy'), again there is no point in giving this observation a different weight than any other equation involving p1, p2, p7 and/or p8: the errors on the p's are not different, nor is the data different.
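
As a sketch of the structure only (made-up observation values and a hypothetical ordering of the unknowns), such equations would look like this as a design matrix with the redundant-baseline offsets as extra columns:

```python
import numpy as np

# Unknowns ordered as [p1, p2, p7, p8, R12, R78] (hypothetical layout).
# Weak-redundancy equations:  p1 - p2 + R12 = obs12
#                             p7 - p8 + R78 = obs78
A = np.array([[1., -1., 0.,  0., 1., 0.],
              [0.,  0., 1., -1., 0., 1.]])
obs = np.array([0.30, 0.35])           # made-up observed phases (rad)

# 'Equal' redundancy (R12 == R78): the two rows contract to a single
# equation in p1 - p2 - p7 + p8, with one fewer unknown to solve for.
A_equal = np.array([[1., -1., -1., 1.]])
b_equal = np.array([obs[0] - obs[1]])  # obs12 - obs78
```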

The only correct way, I think, is to limit the difference between the two redundant baseline phases, and add an 'inequality constraint' like R12-R78 < 0.1 rad, and use say the method we have discussed earlier (if that works for a relatively small number of unknowns) or so.

But maybe I miss something in the way equations could be set up.

or to ignore 'bad' data dynamically (by making the weight smaller than zero)

Obvious, but it is faster to just skip the data point, which will also give a better indication of the formal solution error. (BTW, the higher-level lsq (the [Non]LinearLSQ) does a skip rather than a zero weight.)

I have not put this in bugzilla, since I am not sure how to actually do it.

Wim