Closed emmt closed 11 months ago
or the scaling could be done in the C layer to benefit python also
If done in the C layer, you'll have to wrap the original functions, the objective f(x)
function and the constraints c(x)
, to transparently handle that. I've seen that there is a data
argument that may be used for that. There will remain the iprint
issue unless scaling is done in the FORTRAN code and thus benefit to all interfaces. In the mean time, the scaling of variables in any interface can serve as a proof of concept.
Hi @emmt ,
Thank you very much for proposing this. Apologies for my slow response. Works have been complicated on my side and the situation will continue for a while.
First of all, it is for sure that the Fortran code will never implement the scaling. This is due to the limitation of the language. It is doable but too complicated. Implementing it will complicate the code significantly, contradicting the idea of providing a reference implementation that is readable, understandable, and extendable.
Scaling is important, but it should be done in the high-level interfaces, taking advantage of the capability of the languages. The most important ability would be lambda function/closure/anonymous function; otherwise, the code would have to carry the scaling factors explicitly (e.g., using data
in the c case as discussed above), which is a bit redundant.
Another important fact is that deciding good scaling is a nontrivial (but important) problem. The best scaling is only known to ~experienced users~ users that are familiar with the problem being solved. Ideally, the user should scale the problem before passing it to the solver, though I know very well that we are not living in an ideal world. If the user does not do so, we can only guess the scaling (see my comments at https://github.com/libprima/prima/issues/95).
MATLAB guesses the scaling according to an optional input called TypicalX
--- we may regard x0
as a typical x unless x0
is a very bad initial guess and quite far away from the solution. NLopt also has some scaling strategies, which are worth learning.
The problem of iprint
is, unfortunately, unsolvable without significantly complicating the Fortran code.
BTW, the language is Fortran rather than FORTRAN since the 90s. PRIMA does not contain any FORTRAN code; in contrast, Powell's code is FORTRAN. Nobody should code FORTRAN in any new project.
Thanks.
Hi @zaikunzhang,
BTW, the language is Fortran rather than FORTRAN since the 90s. PRIMA does not contain any FORTRAN code; in contrast, Powell's code is FORTRAN. Nobody should code FORTRAN in any new project.
Thanks for pointing this, I'll try to be careful about the spelling ;-)
Scaling is important, but it should be done in the high-level interfaces, taking advantage of the capability of the languages. The most important ability would be lambda function/closure/anonymous function; otherwise, the code would have to carry the scaling factors explicitly (e.g., using
data
in the c case as discussed above), which is a bit redundant.
This issue was to figure out all consequences of the scaling of the variables. In particular, scaling cannot be simply done by using closures (or equivalent) for the objective function and non-linear constraints, it must be handled for the linear constraints as well. I agree that all this management should be done by the high level interface not by the user who may, at most, specify the scaling factors (as we discussed elsewhere).
The problem of
iprint
is, unfortunately, unsolvable without significantly complicating the Fortran code.
I understand that, this is unfortunate but it is a minor issue. The higher level interface could always bypass that and, at least on entry and on return of the algorithm, prints the correct (unscaled) variables.
Scaling of variables has been implemented by commit 07b17f698751d60e8eef392c656ae0d3e4e9cbcb
Context
All Powell's algorithms were orinally written to solve of problem of the form:
where
f: Ω → ℝ
is the function to minimize,Ω ⊆ ℝⁿ
is the set of feasible variables, andn ≥ 1
is the number of variables. The most general feasible set is:where
xl ∈ ℝⁿ
andxu ∈ ℝⁿ
are lower and upper bounds,Aₑ
andbₑ
implement linear equality constraints,Aᵢ
andbᵢ
implement linear inequality constraints, andc: ℝⁿ → ℝᵐ
implementsm
non-linear constraints.Proposition
We consider here means to scale the variables
x
of the problem so that (i) the numerical optimization algorithms are more robust in that respect and (ii) the radius of the trust region accounts for the scaling of the variables. Scaling of the variables must be as transparent as possible for the end-users (apart for specifying the scaling). Hence, we want that the end-users deal with the variablesx
, while the algorithms deal with scaled variablesu
such thatx = S⋅u
whereS
is a diagonal invertible scaling matrix. Typically, each diagonal entry ofS
is set with the magnitude of the corresponding variablex
. For simplicity to implement bound constraints (see below), we assume that all diagonal entries ofS
are positive. Then:The initial solution writes
x0 = S⋅u0
. Thus the algorithms are initialized withu0 = S\x0
(the elementwise division of the parameters by the scaling factors).For the objective function and the non-linear constraints
f(x) = f(S⋅u)
andc(x) = c(S⋅u)
. Hencef
andc
must be respectively replaced by the compositionsf∘s
andc∘s
wheres(u) = S⋅u
.For the linear constraints
Aₑ⋅x = bₑ
andAᵢ⋅x ≤ bᵢ
which are directly handled by the algorithms, they must be replaced byAₑ⋅S⋅u = bₑ
andAᵢ⋅S⋅u ≤ bᵢ
. Hence`Aₑ
andAᵢ
set by the end-user must be replaced byAₑ⋅S
andAᵢ⋅S
in the algorithm.Since
S
is diagonal with positive diagonal entries, the bound constraintsxl ≤ x = S⋅u ≤ xu
becomeS\xl ≤ u ≤ S\xu
. Hence the lower and upper bounds must be replaced byS\xl
andS\xu
respectively.The only remaining issue not handled by this proposition is the printing of the variables when
iprint
says to do that.