Open mkovari opened 4 months ago
Just as an example, in the large tokamak regression test the constraint residuals are five orders of magnitude larger than the convergence parameter:
Square root of the sum of squares of the constraint residuals (sqsumsq) 5.639E-05 OP
VMCON convergence parameter (convergence_parameter) 5.589E-10 OP
In this run the error tolerance for VMCON is `epsvmc = 1e-7`.
The convergence parameter is given in the Crane report as
$$\left|\nabla f(\vec{x}^{j})^{T} \vec{\delta}^{j}\right| + \sum_{i} \left|\lambda_i^{j+1} c_i(\vec{x}^{j})\right| \le \epsilon.$$
The constraint residuals are $c_i$, the Lagrange multipliers are $\lambda_i$ and the superscript $j$ represents the iteration. Somehow several of the Lagrange multipliers have become very small. (This input file uses f-values, so all the constraints are equalities.)
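To make the scale mismatch concrete, here is a small Python sketch (with made-up numbers, not the actual regression-test values) computing both the residual norm and the convergence parameter for the same iterate:

```python
import numpy as np

# Hypothetical iterate: constraint residuals c_i and Lagrange multipliers lambda_i.
# Values chosen to mimic the reported behaviour: residuals ~1e-5 but tiny multipliers.
c = np.array([3.0e-5, -4.0e-5, 1.5e-5])
lam = np.array([1.0e-6, 2.0e-6, 5.0e-7])
grad_dot_delta = 1.0e-10  # the |grad f . delta| term, assumed negligible here

sqsumsq = np.sqrt(np.sum(c**2))  # residual norm reported by PROCESS
convergence_parameter = abs(grad_dot_delta) + np.sum(np.abs(lam * c))

print(f"sqsumsq               = {sqsumsq:.3e}")
print(f"convergence_parameter = {convergence_parameter:.3e}")
# The residual norm is several orders of magnitude larger than the convergence
# parameter, because each residual is weighted by a tiny multiplier.
```

With these illustrative numbers the ratio between the two quantities is roughly $10^5$, the same order-of-magnitude gap as in the large tokamak regression test.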
It would be good to output a table of all the terms in the convergence equation in a separate csv file.
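A minimal sketch of such an output, assuming a hypothetical helper `write_convergence_terms` (in practice the constraint indices and values would come from the PROCESS run):

```python
import csv

def write_convergence_terms(path, c, lam, grad_dot_delta):
    """Write each term of the convergence test to a CSV file.

    Hypothetical helper, not part of PROCESS/PyVMCON: one row per constraint
    with its residual, multiplier, and |lambda_i * c_i| contribution, plus a
    final row for the gradient term.
    """
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["constraint_index", "residual_c_i", "lambda_i", "abs_lambda_i_c_i"])
        for i, (ci, li) in enumerate(zip(c, lam), start=1):
            writer.writerow([i, ci, li, abs(li * ci)])
        writer.writerow(["grad_f_dot_delta", "", "", abs(grad_dot_delta)])

# Example with made-up values:
write_convergence_terms("convergence_terms.csv",
                        c=[3.0e-5, -4.0e-5], lam=[1.0e-6, 2.0e-6],
                        grad_dot_delta=1.0e-10)
```

Such a table would make it immediately visible which constraints contribute least to the convergence parameter despite large residuals.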
@timothy-nunn @jonmaddock
This is related to my work on ensuring that the inequality constraints are completely satisfied (zero tolerance), whilst the equality constraints are allowed some tolerance (as in the current convergence criterion). The PR for this will be created shortly on the PyVMCON repository, as it is required for reliability analysis work. In my view, this should be merged separately.
The optional output of all terms in a .csv file is a good idea, and could also be handled separately.
However, the main point in this issue as I understand it is the possibility that a solution may be "converged" according to the original convergence criterion whilst having significantly large equality constraint residuals (as Michael reports above, "five orders of magnitude difference"). This is due to the Lagrange multipliers $\vec{\lambda}$ being sufficiently small as to reduce the effect of the constraint residuals in the convergence criterion, as Michael states.
At the solution, the Lagrange multipliers $\lambda^{*}$ are defined as: $$\frac{df^{*}}{dc}(c) = \lambda^{*}(c),$$ where $f^{*}$ is the objective function at the solution. This can be interpreted as quantifying the effect on the objective function at the solution of tightening or loosening a constraint $c$ (for a given value of the constraint). $\lambda \ne 0$ only for "active" constraints; constraints that, when varied, change the objective function at the solution. In other words, the solution is "on" the constraint.
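This sensitivity interpretation can be checked on a toy problem (my own example, not from the issue): minimise $f(x) = x^2$ subject to $x - b = 0$. The KKT conditions give $\lambda = 2b$, which should match $df^{*}/db$:

```python
# Minimise f(x) = x**2 subject to the equality constraint c(x) = x - b = 0.
# The solution is x* = b, so f*(b) = b**2.  Stationarity of the Lagrangian
# L = x**2 - lam*(x - b) gives 2*x* - lam = 0, i.e. lam = 2*b.
def f_star(b):
    return b**2  # objective value at the solution, as a function of the constraint level

b = 0.7
lam = 2 * b  # multiplier from the KKT conditions

# Finite-difference check of the sensitivity interpretation lam = df*/db:
h = 1e-6
sensitivity = (f_star(b + h) - f_star(b - h)) / (2 * h)
print(lam, sensitivity)  # both are 1.4 (to floating-point accuracy)
```

A small $\lambda$ in this picture means the objective barely moves when the constraint level is perturbed, which is exactly why such constraints are allowed large residuals by the criterion.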
In the convergence criterion, $\lambda c$ gives the expected change in the objective function as a result of the constraint not being satisfied ($c \ne 0$). For $\lambda c \ne 0$, $\lambda \ne 0$ (constraint is active) and $c \ne 0$ (violated by some amount in the case of an equality constraint, satisfied or violated in the case of inequality). For a satisfied inequality constraint, $\lambda = 0$.
So, a small non-zero $\lambda_i$ implies that its corresponding constraint $c_i$ is active (practically always the case for equality constraints), but that varying it doesn't dramatically affect $f$. At the solution, if the constraint is violated by a small amount (practically always the case for equality constraints), $|\lambda_i c_i| > 0$, but the contribution is small.
Using $\lambda_i c_i$ in the convergence criterion allows constraints that are less influential on the objective function (lower $\lambda_i$) to be violated more (larger $c_i$). This explains Michael's observed difference in orders of magnitude between a large `sqsumsq` and a small `convergence_parameter`; the (large) violated constraints have little effect (small Lagrange multipliers) on the objective function, so the solution is allowed to converge despite the large residuals.
Whilst I think this is clever, if I'm not mistaken there is a drawback to this approach: $\lambda_i c_i$ only considers the effect on the objective function, not on the position of the solution $\vec{x}$. For example, allowing a constraint to be violated in a converged solution because it may not have much effect on the value of $f(\vec{x})$ is fine if you are only concerned about the value of $f$. However, allowing the constraint to be violated might have a large impact on the value of $\vec{x}$, the solution vector. In this case, the Lagrange multipliers might look like $$\frac{d\vec{x}^{*}}{dc}(c) = \lambda^{*}(c)$$ instead, but I fear this is unnecessary.
Michael, forgive me if I've just caught up with you.
Taking on board Michael's original suggestion of a separate `sqsumsq_tol` (or alternative equality constraint tolerance), I wonder if it might be worth dropping the Lagrange multipliers altogether in the convergence criterion, and simply using `sqsumsq < sqsumsq_tol` instead. However, the Lagrange multipliers do help guard against the opposite situation, where a slightly violated constraint $c_i$ has a large effect on $f$ (large $\lambda_i$); this would "get through" the `sqsumsq < sqsumsq_tol` test if it were the only requirement in the convergence criterion.
I therefore agree with Michael that an equality constraint tolerance (perhaps one for each equality constraint individually, as in other solvers) should be added to the existing convergence criterion to ensure a certain level of equality constraint residuals is respected, irrespective of the Lagrange multipliers. I believe this would also address the problem in the "potential drawback" section.
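A sketch of what such an augmented criterion might look like (illustrative names only, not the actual PyVMCON API):

```python
def converged(grad_dot_delta, lam, c, eps, sqsumsq_tol):
    """Augmented convergence test as discussed above (sketch).

    Keeps the original Lagrangian-weighted criterion and additionally
    requires the raw residual norm to be below a separate tolerance,
    so small multipliers can no longer mask large residuals.
    """
    original = abs(grad_dot_delta) + sum(abs(l * ci) for l, ci in zip(lam, c)) <= eps
    residual = sum(ci**2 for ci in c) ** 0.5 <= sqsumsq_tol
    return original and residual
```

With this form, a point whose Lagrangian-weighted terms are tiny but whose residual norm is still large (the case reported above) would be rejected by the second condition.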
Sounds good. I would advise doing the zero-tolerance inequality check and the equality constraint check separately and testing each time since I have no idea what the zero-tolerance inequality check will do.
In the past we found that VMCON was reporting that it had converged but the constraints were not accurately satisfied, although we never understood why. To fix this we added code to ensure that the constraints were satisfied before VMCON exited.
The line that actually checks for convergence (original branch):
Proposed solution
It might help to add this additional convergence test to PyVMCON. One could also add a new line to the input code to allow `sqsumsq_tol` and `tol` to be input separately. @jonmaddock @timothy-nunn