Open mkovari opened 4 months ago
Just as an example, in the large tokamak regression test the constraint residuals are five orders of magnitude larger than the convergence parameter:
Square root of the sum of squares of the constraint residuals (sqsumsq) 5.639E-05 OP
VMCON convergence parameter (convergence_parameter) 5.589E-10 OP
In this run the error tolerance for VMCON is `epsvmc = 1e-7`.
The convergence parameter is given in the Crane report as
$$\left|\nabla f(\vec{x}^{j})^{T} \vec{\delta}^{j}\right| + \sum_{i} \left|\lambda_i^{j+1} c_i(\vec{x}^{j})\right| \le \epsilon.$$
The constraint residuals are $c_i$, the Lagrange multipliers are $\lambda_i$ and the superscript $j$ represents the iteration. Somehow several of the Lagrange multipliers have become very small. (This input file uses f-values, so all the constraints are equalities.)
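To make the scale mismatch concrete, here is a small Python sketch (with made-up numbers, not the actual regression-test values) computing both the residual norm and the convergence parameter for the same iterate:

```python
import numpy as np

# Hypothetical iterate: constraint residuals c_i and Lagrange multipliers lambda_i.
# Values chosen to mimic the reported behaviour: residuals ~1e-5 but tiny multipliers.
c = np.array([3.0e-5, -4.0e-5, 1.5e-5])
lam = np.array([1.0e-6, 2.0e-6, 5.0e-7])
grad_dot_delta = 1.0e-10  # the |grad f . delta| term, assumed negligible here

sqsumsq = np.sqrt(np.sum(c**2))  # residual norm reported by PROCESS
convergence_parameter = abs(grad_dot_delta) + np.sum(np.abs(lam * c))

print(f"sqsumsq               = {sqsumsq:.3e}")
print(f"convergence_parameter = {convergence_parameter:.3e}")
# The residual norm is several orders of magnitude larger than the convergence
# parameter, because each residual is weighted by a tiny multiplier.
```

With these illustrative numbers the ratio between the two quantities is roughly $10^5$, the same order-of-magnitude gap as in the large tokamak regression test.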
It would be good to output a table of all the terms in the convergence equation in a separate csv file.
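A minimal sketch of such an output, assuming a hypothetical helper `write_convergence_terms` (in practice the constraint indices and values would come from the PROCESS run):

```python
import csv

def write_convergence_terms(path, c, lam, grad_dot_delta):
    """Write each term of the convergence test to a CSV file.

    Hypothetical helper, not part of PROCESS/PyVMCON: one row per constraint
    with its residual, multiplier, and |lambda_i * c_i| contribution, plus a
    final row for the gradient term.
    """
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["constraint_index", "residual_c_i", "lambda_i", "abs_lambda_i_c_i"])
        for i, (ci, li) in enumerate(zip(c, lam), start=1):
            writer.writerow([i, ci, li, abs(li * ci)])
        writer.writerow(["grad_f_dot_delta", "", "", abs(grad_dot_delta)])

# Example with made-up values:
write_convergence_terms("convergence_terms.csv",
                        c=[3.0e-5, -4.0e-5], lam=[1.0e-6, 2.0e-6],
                        grad_dot_delta=1.0e-10)
```

Such a table would make it immediately visible which constraints contribute least to the convergence parameter despite large residuals.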
@timothy-nunn @jonmaddock
This is related to my work on ensuring that the inequality constraints are completely satisfied (zero tolerance), whilst the equality constraints are allowed some tolerance (as in the current convergence criterion). The PR for this will be created shortly on the PyVMCON repository, as it is required for reliability analysis work. In my view, this should be merged separately.
The optional output of all terms in a .csv file is a good idea, and could also be handled separately.
However, the main point in this issue as I understand it is the possibility that a solution may be "converged" according to the original convergence criterion whilst having significantly large equality constraint residuals (as Michael reports above, "five orders of magnitude difference"). This is due to the Lagrange multipliers $\vec{\lambda}$ being sufficiently small as to reduce the effect of the constraint residuals in the convergence criterion, as Michael states.
At the solution, the Lagrange multipliers $\lambda^{*}$ are defined as: $$\frac{df^{*}}{dc}(c) = \lambda^{*}(c),$$ where $f^{*}$ is the objective function at the solution. This can be interpreted as quantifying the effect on the objective function at the solution of tightening or loosening a constraint $c$ (for a given value of the constraint). $\lambda \ne 0$ only for "active" constraints; constraints that, when varied, change the objective function at the solution. In other words, the solution is "on" the constraint.
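This sensitivity interpretation can be checked on a toy problem (my own example, not from the issue): minimise $f(x) = x^2$ subject to $x - b = 0$. The KKT conditions give $\lambda = 2b$, which should match $df^{*}/db$:

```python
# Minimise f(x) = x**2 subject to the equality constraint c(x) = x - b = 0.
# The solution is x* = b, so f*(b) = b**2.  Stationarity of the Lagrangian
# L = x**2 - lam*(x - b) gives 2*x* - lam = 0, i.e. lam = 2*b.
def f_star(b):
    return b**2  # objective value at the solution, as a function of the constraint level

b = 0.7
lam = 2 * b  # multiplier from the KKT conditions

# Finite-difference check of the sensitivity interpretation lam = df*/db:
h = 1e-6
sensitivity = (f_star(b + h) - f_star(b - h)) / (2 * h)
print(lam, sensitivity)  # both are 1.4 (to floating-point accuracy)
```

A small $\lambda$ in this picture means the objective barely moves when the constraint level is perturbed, which is exactly why such constraints are allowed large residuals by the criterion.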
In the convergence criterion, $\lambda c$ gives the expected change in the objective function as a result of the constraint not being satisfied ($c \ne 0$). For $\lambda c \ne 0$, $\lambda \ne 0$ (constraint is active) and $c \ne 0$ (violated by some amount in the case of an equality constraint, satisfied or violated in the case of inequality). For a satisfied inequality constraint, $\lambda = 0$.
So, a small non-zero $\lambda_i$ implies that its corresponding constraint $c_i$ is active (practically always the case for equality constraints), but that varying it doesn't dramatically affect $f$. At the solution, if the constraint is violated by a small amount (practically always the case for equality constraints), $|\lambda_i c_i| > 0$, but the contribution is small.
Using $\lambda_i c_i$ in the convergence criterion allows constraints that are less influential on the objective function (lower $\lambda_i$) to be violated more (larger $c_i$). This explains Michael's observed difference in orders of magnitude between a large `sqsumsq` and a small `convergence_parameter`; the (large) violated constraints have little effect (small Lagrange multipliers) on the objective function, so the solution is allowed to converge despite the large residuals.
Whilst I think this is clever, if I'm not mistaken there is a drawback to this approach: $\lambda_i c_i$ only considers the effect on the objective function, not on the position of the solution $\vec{x}$. For example, allowing a constraint to be violated in a converged solution because it may not have much effect on the value of $f(\vec{x})$ is fine if you are only concerned about the value of $f$. However, allowing the constraint to be violated might have a large impact on the value of $\vec{x}$, the solution vector. In this case, the Lagrange multipliers might look like $$\frac{d\vec{x}^{*}}{dc}(c) = \lambda^{*}(c)$$ instead, but I fear this is unnecessary.
Michael, forgive me if I've just caught up with you.
Taking on board Michael's original suggestion of a separate `sqsumsq_tol` (or alternative equality constraint tolerance), I wonder if it might be worth dropping the Lagrange multipliers altogether in the convergence criterion, and simply using `sqsumsq < sqsumsq_tol` instead. However, the Lagrange multipliers do help guard against the opposite situation, where a slightly violated constraint $c_i$ has a large effect on $f$ (large $\lambda_i$); this would "get through" the `sqsumsq < sqsumsq_tol` test if it were the only requirement in the convergence criterion.
I therefore agree with Michael that an equality constraint tolerance (perhaps one for each equality constraint individually, as in other solvers) should be added to the existing convergence criterion to ensure a certain level of equality constraint residuals is respected, irrespective of the Lagrange multipliers. I believe this would also address the problem in the "potential drawback" section.
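A sketch of what such an augmented criterion might look like (illustrative names only, not the actual PyVMCON API):

```python
def converged(grad_dot_delta, lam, c, eps, sqsumsq_tol):
    """Augmented convergence test as discussed above (sketch).

    Keeps the original Lagrangian-weighted criterion and additionally
    requires the raw residual norm to be below a separate tolerance,
    so small multipliers can no longer mask large residuals.
    """
    original = abs(grad_dot_delta) + sum(abs(l * ci) for l, ci in zip(lam, c)) <= eps
    residual = sum(ci**2 for ci in c) ** 0.5 <= sqsumsq_tol
    return original and residual
```

With this form, a point whose Lagrangian-weighted terms are tiny but whose residual norm is still large (the case reported above) would be rejected by the second condition.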
Sounds good. I would advise doing the zero-tolerance inequality check and the equality constraint check separately and testing each time since I have no idea what the zero-tolerance inequality check will do.
In the past we found that VMCON was reporting that it had converged but the constraints were not accurately satisfied, although we never understood why. To fix this we added code to ensure that the constraints were satisfied before VMCON exited.
The line that actually checks for convergence (original branch):
Proposed solution
It might help to add this additional convergence test to PyVMCON. One could also add a new line to the input code to allow `sqsumsq_tol` and `tol` to be input separately. @jonmaddock @timothy-nunn