Add non-linear equality constraints in COBYLA

libprima / PRIMA.jl

a Julia interface to PRIMA, a Reference Implementation for Powell's methods with Modernization and Amelioration

MIT License

21 stars 5 forks source link

Add non-linear equality constraints in COBYLA #14

Closed emmt closed 10 months ago

emmt commented 10 months ago

We need to add non-linear equality constraints in COBYLA. The Fortran code only supports non-linear inequality constraints. Assuming c_ineq(x) and c_eq(x) are two Julia functions implementing the non-linear constraints:

c_ineq(x) ≤ 0    and    c_eq(x) = 0

a possibility is to turn them in non-linear inequality constraints:

c(x) ≤ 0

with something like:

c(x) = [c_ineq(x)..., c_eq(x)..., -c_eq(x)...]

which amounts to impose that:

c_ineq(x) ≤ 0    and    c_eq(x) ≤ 0    and    -c_eq(x) ≤ 0

zaikunzhang commented 10 months ago

Yes, this should be done.

A quick comment is that I suggest ranging the constraints as

c(x) = [-c_eq(x) , c_eq(x), c_ineq(x)]

(BTW, what the ellipsis mean?)

This will be consistent with what has been done in the MATLAB interface: https://github.com/libprima/prima/blob/a044b2cc7fe5af48ffacf1ad5c5d6c481d72f482/matlab/interfaces/cobyla.m#L513-L539

It does not make any difference in theory, but will affect the numerical behavior of the solver.

Thanks.

emmt commented 10 months ago

Ok for re-ordering the non-linear constraints as you suggested.

Implementing non-linear equality constraints as suggested, implies to re-think about the API for non-linear inequality constraints in the Julia interface because it is not very practical to provide a single function that computes the objective function, the non-linear inequality constraints, and the non-linear equality constraints. For the user, it is certainly more practical to provide one function for each of these. For non-linear constraints, the number of constraints must be provided as well. What I have in mind is something like:

cobyla(args...; kwds..., nonlinear_eq=(n_eq, c_eq), nonlinear_ineq=(n_ineq, c_ineq))

where n_eq (resp. n_ineq) and c_eq (resp. c_ineq) are the number of equality (resp. inequality) constraints and the function to compute them while args... and kwds... are other arguments and keywords. If the caller is interested in the values of the constraints, she/he may provide an array of suitable size to store them instead of n_eq and/or n_ineq. Of course this only makes sense if the Fortran code is such that the array of non-linear constraints does contain the values of these constraints for the returned solution (@zaikunzhang can you confirm that?).

(BTW, what the ellipsis mean?)

In Julia, the ellipsis expands the expression on the left. For example, [a..., b...] is a shortcut for something like:

[a[1], a[2], ..., a[length(a)], b[1], b[2], ..., b[length(b)]]

zaikunzhang commented 10 months ago

For the user, it is certainly more practical to provide one function for each of these

Yes, they should be separated. I thought the objective function and the nonlinear (inequality) constraints were already separated, no?

For non-linear constraints, the number of constraints must be provided as well.

Sorry, we have to be more considerate here (see again "What is an interface / wrapper? "). Ideally, we should not ask the user to provide the number of constraints. It is the wrapper's job to figure this number out. We should not ask the user for additional information that is already implied in the user's input. Otherwise, our wrapper is not user-friendly and only half-baked, I am afraid.

In the MATLAB interface, the number of constraints is obtained by evaluating the nonlinear constraints at x0 and then checking the length. Of course, the evaluated constraint value should not be thrown away, because the evaluation is expensive in practice. Instead, it is sent to the Fortran code, which (optionally) accepts the constraint value at x0 and uses it during the initialization. See

https://github.com/libprima/prima/blob/8f7e80cd6fbe6e9d9a547c1f680dca07fd5823f8/matlab/interfaces/cobyla.m#L417-L421 https://github.com/libprima/prima/blob/a044b2cc7fe5af48ffacf1ad5c5d6c481d72f482/fortran/cobyla/cobyla.f90#L61

However, the C interface currently does not accept the constraint value at x0. This has to be changed. See my proposal at https://github.com/libprima/prima/issues/99. I am not sure whether @jschueller currently has time for it or not.

For PRIMA.jl, there are three possibilities.

Directly interface with the Fortran code, not through the C interface. (Is this possible at all?)
Update the C interface so that it can accept the constraint value at x0.
Wait for someone to update the C interface (I do not have time until November, and my knowledge on C is outdated). Before this is done, the constraint value at x0 can only be thrown away and wasted.

Of course this only makes sense if the Fortran code is such that the array of non-linear constraints does contain the values of these constraints for the returned solution (@zaikunzhang can you confirm that?).

The Fortran code returns the value of the nonlinear constraint at the solution in nlconstr. It is also the case for the C binding. Indeed, for any interface / wrapper / implementation of COBYLA, it is a must to return the value of the nonlinear constraints at the solution. See my elaborations at

https://github.com/libprima/prima/pull/28#issuecomment-1666127400

Thanks.

emmt commented 10 months ago

I mostly agree. I am not sure that calling Fortran 2008 is easy or even doable in Julia. I am pretty sure it should be, but I am not familiar with that (especially Fortan 2008 may have a quite different ABI than old FORTRAN which is close to C except that all arguments are passed by address). In the meantime, the most important is to converge on the API from the end-user point of view. For now, we can code something intermediate and recognize the following keyword syntaxes (in the case of non-linear inequalities, the same rules apply for equalities):

nonlinear_ineq = nothing
nonlinear_ineq = c_ineq
nonlinear_ineq = (n_ineq, c_ineq)
nonlinear_ineq = (c_ineq, n_ineq)

if the value is nothing (the default) there are no such constraints, otherwise there are n_ineq non-linear inequalities implemeted by the callable object c_ineq such that c_ineq(x) yields a vector of n_ineq given the variables x of the problem. n_ineq and c_ineq are easily disentangled by their types. If n_ineq is not specified, it is guessed by the high-level interface by calling c_ineq. Until Julia API directly calls Fortran code, the result of this later call is lost. The n_ineq argument can also be set with an array (of the correct size) if the user is interested in getting the constraints on return.

I am about to solve a PR along these lines...

In the long term, we can keep the n_ineq argument as a number as a mean to ensure that the number of constraints is correct.

BTW Directly calling Fortran would solve the issue that the C wrapper transpose the matrices of linear constraints, so with Julia calling the C wrapper these matrices are actually transposed twice, which is unecessary as Julia and Fortran have the same column-major storage order.

zaikunzhang commented 10 months ago

My suggestion is to have only

nonlinear_ineq = nothing
nonlinear_ineq = c_ineq

and tolerate the waste of c_ineq(x0) for the moment. (Otherwise, there will be too many possibilities to handle, making the code unnecessarily complicated. For example, should we check the consistency between c_ineq and n_ineq? What to do if they are inconsistent?)

After doing this, let us wait for the update of the C interface (which will be done) or switch to the Fortran code.

I guess the second would be ideal, especially after knowing that Julia also stores arrays in columns. In addition, the C interface indeed skips many optional inputs and outputs for simplicity, which is understandable (e.g., eta1, eta2, gamma1, and gamma2, which control the update of the trust-region radius, and fhist, chist, and xhist, which record the history of the iterations). Any wrapper depending on the C interface will inherit these limitations (every approach has its limitation, of course).

However, I am not sure how easy it is to interface Julia directly with the Fortran code.

emmt commented 10 months ago

I would also keep:

nonlinear_ineq = (v_ineq, c_ineq)
nonlinear_ineq = (c_ineq, v_ineq)

with v_ineq a vector to store the constraints on return of the algorithm. Another possibility would be to add the non-linear constraints to the tuple returned by cobyla. For example:

x, fx, nf, rc, cstrv, v_eq, v_ineq = cobyla(f, x0; kwds...)

with x the approximate solution found by the algorithm, fx = f(x), nf the number of calls to f, rc a status code, cstrv is the amount of constraint violation, v_eq the vector of non-linear equality constraints, and v_ineq the vector of non-linear inequality constraints. The 2 latter are new.

emmt commented 10 months ago

This is solved by commit b99ce9ca880b859e2fc3a60cc740cdab3ecedceb when PR https://github.com/libprima/PRIMA.jl/pull/17/commits is merged.

zaikunzhang commented 10 months ago

nonlinear_ineq = (v_ineq, c_ineq)
nonlinear_ineq = (c_ineq, v_ineq)

Does this mean to use an input v_ineq to receive the outputted value? I do not know what is the common practice in Julia, but this sounds a bit C/Fortran-style ... In general, I prefer the functional style "output = fun(input)" as long as it is possible. It is more expressive, easier to understand, and less error-prone, IMHO.

x, fx, nf, rc, cstrv, v_eq, v_ineq = cobyla(f, x0; kwds...)

rc and v_eq/ineq sound too obscure to me. I suggest using more expressive names. For example, what about exitflag (this is the standard name in MATLAB) or info or return_code for rc? What is the standard name in the Julia community? For v_eq, I would use nlconstr_eq (consistent with the Fortran backend), but it has to be consistent with other parts of the code. Recall https://github.com/libprima/PRIMA.jl/issues/3. I suggest sticking to the same name for the same object.

Thanks.

zaikunzhang commented 10 months ago

I guess the names in the high-level interfaces had better conform to the conventions in the respective language. Take nf as an example. It is the number of function evaluations in the Fortran backend. However,

in the MATLAB interface, I choose to use funcCount, which is the standard name used in MATLAB Optimization Toolbox;
in the Python interface, it should be nfev, which is the name defined by https://docs.scipy.org/doc/scipy-1.11.1/reference/generated/scipy.optimize.OptimizeResult.html.

Another example is info:

MATLAB: exitflag
Python: status (https://docs.scipy.org/doc/scipy-1.11.1/reference/generated/scipy.optimize.OptimizeResult.html)

Let us follow the conventions in Julia when choosing the names.

Is there any "standard" optimization package in Julia to take as a reference? What about Optim.jl? If it is considered standard, then I would suggest designing the API of PRIMA.jl to be compatible with Optim.jl.

For example, the MATLAB interface of PRIMA is compatible with fmincon from MathWorks. This means that a line like

[x, f, exitflag, output] = fmincon( ..., some inputs, ... )

will still work if you replace fmincon with prima without any other change:

[x, f, exitflag, output] = prima( ..., some inputs, ... )

In addition, the Python interface of PDFO (predecessor of PRIMA) is compatible with scipy.optimize.minimize. This means that the code like

from scipy.optimize import minimize
res = minimize(..., some inputs, ...)

will still work if you replace scipy.optimize.minimize with python.pdfo.pdfo without any other change:

from python.pdf import pdfo
res = pdfo(..., some inputs, ...)

In this way, we do not need to design a new API, and users do not need to learn how to use PRIMA as long as they already know how to use the standard libraries.

If there is no good convention to follow, then being self-consistent and consistent with the Fortran backend may be a good idea.

Thanks.

emmt commented 10 months ago

A few comments:

In Julia names of outputs and positional inputs are only distinguished by their positions, their names are irrelevant (except for clarity and documentation). Only the names of keywords matter.
I admit that the proposed syntax for the keywords related to the non-linear constraints is more complex than necessary.

To address this latter point, I have updated PR https://github.com/libprima/PRIMA.jl/pull/17 as follows:

Simplify specifying non-linear constraints. The related keywords are simply set with a user-defined function implementing the constraints.
All algorithms now take the same arguments (only available keywords may be different) and return the same result: (x,info) with x the variables and info a structured object collecting all other information provided by the algorithm. For now:
- info.fx is the objective function value f(x);
- info.nf is the number of evaluations of the objective function;
- info.status is the termination status of the algorithm;
- info.cstrv is the amount of constraints violation;
- info.nl_eq gives the non-linear equality constraints, it is an empty vector if there are no such constraints;
- info.nl_ineq gives the non-linear inequality constraints, it is an empty vector if there are no such constraints.
General method issuccess(info) yields whether algorithm has converged.

These changes constitute a major step in the direction of a unified driver (as you have done for other interface), say:

x, info = prima(f, x0; kwds...)

which, depending on the constraints (specified by the keywords kwds...), solves the problem via a suitable algorithm among cobyla, newuoa, bobyqa, uobyqa, or lincoa.

zaikunzhang commented 10 months ago

In Julia names of outputs and positional inputs are only distinguished by their positions, their names are irrelevant (except for clarity and documentation). Only the names of keywords matter.

Understood. However, good variable names also increase the readability of the code and ease the understanding of the users. So they have to be well thought out, especially if they are visible to end users.

The names (and API) adopted by standard/popular libraries (Optim.jl, NLopt.jl, Optimization.jl, etc) are good references. It would be ideal if the names and API could be compatible & consistent with one of them (the most standard / popular one). Otherwise, we might at least take them as examples to get some idea.

I like the idea of packing "secondary" outputs (and inputs/options, see https://github.com/libprima/prima/pull/102) in a structure (or something equivalent). However, I would like to note that the Fortran backend uses info as the name of the exit code, which follows the convention of LAPACK, so I was afraid that it might create some confusion if Julia uses the same name for this structure. How do Optim.jl or other popular call it? For your reference, this structure is called OptimizeResult in scipy.optimize and output in the Optimization Toolbox of MATLAB.

Thanks.

zaikunzhang commented 10 months ago

BTW, the Fortran implementation of PRIMA did not use any "structure" (the name is "derived type" in Fortran) for the input, output, or anywhere. This is because I wanted to provide a reference implementation that exposes the algorithms without using any language-specific features. In addition, I was afraid that using derived types as input or output may introduce problems when interfacing with other languages.

In contrast, the wrappers/interfaces/implementations of PRIMA in other languages should use structures (or something equivalent in the respective language) whenever this leads to better code than otherwise.

zaikunzhang commented 10 months ago

These changes constitute a major step in the direction of a unified driver (as you have done for other interface), say:

x, info = prima(f, x0; kwds...) which, depending on the constraints (specified by the keywords kwds...), solves the problem via a suitable algorithm among cobyla, newuoa, bobyqa, uobyqa, or lincoa.

Yes, something like this would be wonderful! Thank you.

emmt commented 10 months ago

Done in commit 476aad1610c3078e76cedc65142cbd00071009a4

emmt commented 10 months ago

The names (and API) adopted by standard/popular libraries (Optim.jl, NLopt.jl, Optimization.jl, etc) are good references. It would be ideal if the names and API could be compatible & consistent with one of them (the most standard / popular one). Otherwise, we might at least take them as examples to get some idea.

There is no general consensus:

Optim.optimize returns a single structured object res whose contents is extracted by methods like minimiser(res) for the solution x, minimum(res) for f(x), etc. which requires exporting several getters methods only for that purpose. I find the res.key syntax more appropriate, especially considering that, in Julia, the res.key syntax can be extended for the specific type of res and for different symbolic properties key (i.e. they do not need to be the fields of the res structure).
NLopt.optimize returns the 3-tuple (fx,x,ret) with ret a symbolic status to check for convergence. The other informations are stored into the solver object passed as argument.
The umbrella Optimize.jl package offers a unified interface to different optimizers and converts their outputs as needed to follow this API.
etc.

So, for me, we should keep the same API for all the solvers in PRIMA (including the driver one) and simply take care of wrapping this API in other more general packages such as Optimize.jl. This is possible without imposing the users to install all packages thanks to package extensions introduced in Julia 1.9.

I like the idea of packing "secondary" outputs (and inputs/options, see libprima/prima#102) in a structure (or something equivalent). However, I would like to note that the Fortran backend uses info as the name of the exit code, which follows the convention of LAPACK, so I was afraid that it might create some confusion if Julia uses the same name for this structure. How do Optim.jl or other popular call it? For your reference, this structure is called OptimizeResult in scipy.optimize and output in the Optimization Toolbox of MATLAB.

Ok so let us keep that for now. For the name, again (i) there is no consensus among different softwares (I have seen res, stats, etc.) and (ii) the name info is used in the documentation and examples as a remainder that Info is the name of the corresponding Julia structure (something usual and easily understandable for a Julia user), the caller may use any other name more suitable to her/him. BTW, I named Status the enumeration corresponding to the different possible values returned by the C code where the Fortran info variable was returned as rc for return code (I guess).

From the point of view of a Julia user, an important point is to extend standard methods such as issuccess(info) to check whether algorithm has converged.

emmt commented 10 months ago

This issue was solved by b99ce9ca880b859e2fc3a60cc740cdab3ecedceb.