Scaling of variables - Githubissues

emmt commented 1 year ago

Context

All Powell's algorithms were orinally written to solve of problem of the form:

min f(x)    s.t.   x ∈ Ω ⊆ ℝⁿ

where f: Ω → ℝ is the function to minimize, Ω ⊆ ℝⁿ is the set of feasible variables, and n ≥ 1 is the number of variables. The most general feasible set is:

Ω = { x ∈ ℝⁿ | xl ≤ x ≤ xu, Aₑ⋅x = bₑ, Aᵢ⋅x ≤ bᵢ, and c(x) ≤ 0 }

where xl ∈ ℝⁿ and xu ∈ ℝⁿ are lower and upper bounds, Aₑ and bₑ implement linear equality constraints, Aᵢ and bᵢ implement linear inequality constraints, and c: ℝⁿ → ℝᵐ implements m non-linear constraints.

Proposition

We consider here means to scale the variables x of the problem so that (i) the numerical optimization algorithms are more robust in that respect and (ii) the radius of the trust region accounts for the scaling of the variables. Scaling of the variables must be as transparent as possible for the end-users (apart for specifying the scaling). Hence, we want that the end-users deal with the variables x, while the algorithms deal with scaled variables u such that x = S⋅u where S is a diagonal invertible scaling matrix. Typically, each diagonal entry of S is set with the magnitude of the corresponding variable x. For simplicity to implement bound constraints (see below), we assume that all diagonal entries of S are positive. Then:

The initial solution writes x0 = S⋅u0. Thus the algorithms are initialized with u0 = S\x0 (the elementwise division of the parameters by the scaling factors).
For the objective function and the non-linear constraints f(x) = f(S⋅u) and c(x) = c(S⋅u). Hence f and c must be respectively replaced by the compositions f∘s and c∘s where s(u) = S⋅u.
For the linear constraints Aₑ⋅x = bₑ and Aᵢ⋅x ≤ bᵢ which are directly handled by the algorithms, they must be replaced by Aₑ⋅S⋅u = bₑ and Aᵢ⋅S⋅u ≤ bᵢ. Hence `Aₑ and Aᵢ set by the end-user must be replaced by Aₑ⋅S and Aᵢ⋅S in the algorithm.
Since S is diagonal with positive diagonal entries, the bound constraints xl ≤ x = S⋅u ≤ xu become S\xl ≤ u ≤ S\xu. Hence the lower and upper bounds must be replaced by S\xl and S\xu respectively.

The only remaining issue not handled by this proposition is the printing of the variables when iprint says to do that.

jschueller commented 1 year ago

or the scaling could be done in the C layer to benefit python also

emmt commented 1 year ago

If done in the C layer, you'll have to wrap the original functions, the objective f(x) function and the constraints c(x), to transparently handle that. I've seen that there is a data argument that may be used for that. There will remain the iprint issue unless scaling is done in the FORTRAN code and thus benefit to all interfaces. In the mean time, the scaling of variables in any interface can serve as a proof of concept.

zaikunzhang commented 11 months ago

Hi @emmt ,

Thank you very much for proposing this. Apologies for my slow response. Works have been complicated on my side and the situation will continue for a while.

First of all, it is for sure that the Fortran code will never implement the scaling. This is due to the limitation of the language. It is doable but too complicated. Implementing it will complicate the code significantly, contradicting the idea of providing a reference implementation that is readable, understandable, and extendable.

Scaling is important, but it should be done in the high-level interfaces, taking advantage of the capability of the languages. The most important ability would be lambda function/closure/anonymous function; otherwise, the code would have to carry the scaling factors explicitly (e.g., using data in the c case as discussed above), which is a bit redundant.

Another important fact is that deciding good scaling is a nontrivial (but important) problem. The best scaling is only known to ~experienced users~ users that are familiar with the problem being solved. Ideally, the user should scale the problem before passing it to the solver, though I know very well that we are not living in an ideal world. If the user does not do so, we can only guess the scaling (see my comments at https://github.com/libprima/prima/issues/95).

MATLAB guesses the scaling according to an optional input called TypicalX --- we may regard x0 as a typical x unless x0 is a very bad initial guess and quite far away from the solution. NLopt also has some scaling strategies, which are worth learning.

The problem of iprint is, unfortunately, unsolvable without significantly complicating the Fortran code.

BTW, the language is Fortran rather than FORTRAN since the 90s. PRIMA does not contain any FORTRAN code; in contrast, Powell's code is FORTRAN. Nobody should code FORTRAN in any new project.

Thanks.

emmt commented 11 months ago

Hi @zaikunzhang,

BTW, the language is Fortran rather than FORTRAN since the 90s. PRIMA does not contain any FORTRAN code; in contrast, Powell's code is FORTRAN. Nobody should code FORTRAN in any new project.

Thanks for pointing this, I'll try to be careful about the spelling ;-)

Scaling is important, but it should be done in the high-level interfaces, taking advantage of the capability of the languages. The most important ability would be lambda function/closure/anonymous function; otherwise, the code would have to carry the scaling factors explicitly (e.g., using data in the c case as discussed above), which is a bit redundant.

This issue was to figure out all consequences of the scaling of the variables. In particular, scaling cannot be simply done by using closures (or equivalent) for the objective function and non-linear constraints, it must be handled for the linear constraints as well. I agree that all this management should be done by the high level interface not by the user who may, at most, specify the scaling factors (as we discussed elsewhere).

The problem of iprint is, unfortunately, unsolvable without significantly complicating the Fortran code.

I understand that, this is unfortunate but it is a minor issue. The higher level interface could always bypass that and, at least on entry and on return of the algorithm, prints the correct (unscaled) variables.

emmt commented 11 months ago

Scaling of variables has been implemented by commit 07b17f698751d60e8eef392c656ae0d3e4e9cbcb

libprima / PRIMA.jl

Scaling of variables #4

Context

Proposition