libprima / prima

PRIMA is a package for solving general nonlinear optimization problems without using derivatives. It provides the reference implementation for Powell's derivative-free optimization methods, i.e., COBYLA, UOBYQA, NEWUOA, BOBYQA, and LINCOA. PRIMA means Reference Implementation for Powell's methods with Modernization and Amelioration, P for Powell.
http://libprima.net
BSD 3-Clause "New" or "Revised" License

input variables scaling #95

Closed jschueller closed 1 year ago

jschueller commented 1 year ago
emmt commented 1 year ago

Scaling can be done automatically for fully bounded variables. Using a nonlinear transform (like arcsine or tanh) may not be a good idea for the convergence. The main reason to scale variables is to make the changes of all variables of the problem have similar magnitudes and, for example, to avoid the convergence depending on the units of the variables. This is a usual trick when applying a numerical method to a problem with heterogeneous variables.
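For instance (an illustrative case): if $x_1$ is a length of typical magnitude $10^{-3}$ m and $x_2$ a mass of typical magnitude $10^{3}$ kg, dividing each variable by its typical magnitude yields scaled variables of order $1$, so that a step of, say, $0.1$ corresponds to a comparable relative change in both variables, independently of the units chosen.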

zaikunzhang commented 1 year ago

Hi @jschueller and @emmt ,

Apologies for the late reply.

First of all, I totally agree that scaling is important. Thank you for raising this point.

However, I do not think a trivial and good scaling strategy exists.

  1. Similar to what @emmt pointed out, it is generally wrong to scale using a nonlinear transformation. I am not saying that it is always wrong --- a certain transformation may work extremely well on a particular class of problems (see the example at the end of this comment), but a nonlinear transformation that does not fit the problem will likely make the problem more difficult.

  2. It is hard to find a scaling strategy that fits all problems. For example, consider the following problems.

    1.) No constraints. The starting point is $0$.
    2.) No constraints. The starting point is $10^{-3}$.
    3.) No constraints. The starting point is $10^{-16}$.
    4.) No constraints. The starting point is $10^{10}$.
    5.) The bound constraint is $-10^{10} \le x \le 10^{10}$. The starting point is $10^{10}$.
    6.) The bound constraint is $10^{10}-1 \le x \le 10^{10}+1$. The starting point is $10^{10}$.
    7.) The bound constraint is $-\infty < x \le 10^{10}+1$. The starting point is $10^{10}$.

  3. The best scaling strategy is only known to the user, who should scale the problem before passing it to the solver. If he/she does not do so, I suggest defining the scaling factor with the following strategy or something similar (a code sketch follows this list). The factor is defined entrywise. Here, $t$ is a small value, e.g., $10^{-3}$.

    1.) The bound constraint is $l \le x \le u$: use $(u-l)/2$, provided that this value is at least eps (otherwise, we consider that $x$ should be fixed at $(l+u)/2$ and this variable removed from the problem).
    2.) The bound constraint is $-\infty < x \le u$: use $\max(|x_0|, |u|)/2$, provided that this value is at least $t$.
    3.) The bound constraint is $l \le x < \infty$: use $\max(|x_0|, |l|)/2$, provided that this value is at least $t$.
    4.) No bound constraint: use $|x_0|/2$, provided that this value is at least $t$.

  4. There should be a flag to switch the scaling on or off. I suggest switching it off by default.
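As a concrete reading of point 3, here is a minimal Julia sketch; the name default_scaling is hypothetical and not part of any PRIMA interface.

    # Hypothetical sketch of the entrywise default scaling factor; `t` is the
    # small threshold, and the branches mirror cases 1.)-4.) above.
    function default_scaling(x0, l, u; t = 1e-3)
        s = similar(x0, Float64)
        for i in eachindex(x0)
            if isfinite(l[i]) && isfinite(u[i])
                # Case 1.): below eps, x[i] would rather be fixed at
                # (l[i] + u[i]) / 2 and eliminated; we merely floor it here.
                s[i] = max((u[i] - l[i]) / 2, eps())
            elseif isfinite(u[i])
                s[i] = max(max(abs(x0[i]), abs(u[i])) / 2, t)   # case 2.)
            elseif isfinite(l[i])
                s[i] = max(max(abs(x0[i]), abs(l[i])) / 2, t)   # case 3.)
            else
                s[i] = max(abs(x0[i]) / 2, t)                   # case 4.)
            end
        end
        return s
    end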

Note that the above strategy does not work well in cases like the following, where the problem probably needs a nonlinear transformation (taking the logarithm) before any scaling (recall point 1 above).
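For instance (an illustrative case): with the bound constraint $10^{-10} \le x \le 10^{10}$ and an objective that varies on a multiplicative scale, the linear factor $(u-l)/2 = 5 \times 10^{9}$ makes every value of $x$ below $1$ numerically indistinguishable from the lower bound, whereas working with $y = \log_{10} x$ turns the constraint into the well-scaled box $-10 \le y \le 10$.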

Thanks.

emmt commented 1 year ago

3. The best scaling strategy is only known to the user, who should scale the problem before passing it to the solver.

4. There should be a flag to switch the scaling on or off. I suggest switching it off by default.

I totally agree with these 2 points; this is why I would add the scaling as an optional argument (e.g., a keyword in the Julia interface) whose values (the scaling factors for each variable of the problem) are chosen by the caller.

If the optional scaling is specified as a single value, we could assume that it is the parameter t in the above discussion and compute the scaling factors following what you propose and according to the values of the initial variables and the bounds (x0, l, and u).

zaikunzhang commented 1 year ago

If the optional scaling is specified as a single value, we could assume that it is the parameter t in the above discussion and compute the scaling factors following what you propose and according to the values of the initial variables and the bounds (x0, l, and u).

The scaling scheme involving t is only a very rough idea. I am not sure whether it is the best scaling we can have without information from the user. However, I do not think it is a good idea to let users input t in any way. It is a parameter that is too technical and should not be exposed to the user.

The MATLAB interface does the following:

  1. Should the problem be scaled? This is indicated by an option called scale, which defaults to false.
  2. If scale is true, then scale the problem in the best way that we can figure out according to the data (e.g., using x0, lb, ub as elaborated above).

We do not accept a scaling that is input by the user --- if the user does know the scaling, then he/she should have scaled the problem before sending it to the solvers. It is not something complicated or error-prone to do, and it is unnecessary for the solvers to make things complicated (e.g., iprint, as you noted) by handling such things.

In other words, we only handle two cases:

  1. the user does not want to scale the problem;
  2. the user asks us to scale the problem, letting us decide how to scale it ourselves.

Thanks.

emmt commented 1 year ago

We do not accept a scaling that is input by the user --- if the user does know the scaling, then he/she should have scaled the problem before sending it to the solvers. It is not something complicated or error-prone to do, and it is unnecessary for the solvers to make things complicated (e.g., iprint, as you noted) by handling such things.

OK, but then this means that, internally, Powell's algorithms are capable of dealing with a non-unit scaling of the variables (in particular for the iprint stuff). So why not let the user specify scaling factors if she/he has good reasons to do so and does not want to cope with the complicated task of correctly dealing with these factors in the objective function? An example that comes to my mind is model fitting with heterogeneous parameters: most users would like to express their objective function as simply as possible, considering their units of choice for the variables, but they also have an idea of the typical magnitude of the variables at the solution. This information can easily be turned into scaling factors. This was my proposal.

zaikunzhang commented 1 year ago

OK. I agree.

What about the following?

Define two options, scale and scaling_factor, the first one being logical/boolean, and the second being a positive vector of length n (dimension of the problem).

  1. If the user inputs scale = true or does not input it, and specifies scaling_factor, then scale the problem as

    x_before_scaling = scaling_factor * x_after_scaling + x0

    where * is entry-wise.

  2. If the user inputs scale = true but does not specify scaling_factor, then we define the scaling factor in a certain way (e.g., the way outlined above) and do the scaling as in 1.

  3. If the user inputs scale = false but also inputs scaling_factor, then raise a warning and do not do the scaling.

If the user inputs scaling_factor, we need to verify its validity unless scale is set to false. A scaling factor is valid if each component is positive and finite. If any component is invalid, then we raise a warning and define the scaling factor for this component in our own way. All the valid components of the user-defined scaling factor will be used as is.
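To make the branching concrete, here is a minimal Julia-flavored sketch; the option names follow the proposal above, while resolve_scaling is a hypothetical helper and default_scaling stands for the automatic strategy sketched earlier.

    # Illustrative sketch of the proposed option handling, not an actual API.
    function resolve_scaling(scale, scaling_factor, x0, l, u)
        if scale === false
            scaling_factor === nothing ||
                @warn "scaling_factor is ignored since scale == false"  # point 3
            return nothing                                # no scaling at all
        end
        scaling_factor === nothing && return default_scaling(x0, l, u)  # point 2
        s = float.(scaling_factor)                                      # point 1
        d = default_scaling(x0, l, u)
        for i in eachindex(s)
            if !(isfinite(s[i]) && s[i] > 0)              # invalid component
                @warn "scaling_factor[$i] is invalid; using the default instead"
                s[i] = d[i]
            end
        end
        return s    # then x_before_scaling = s .* x_after_scaling .+ x0
    end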

The problem with iprint is not solvable. The best we can do is to mention in the documentation that the information printed by the Fortran backend corresponds to the scaled problem if the scaling does happen.

emmt commented 1 year ago

Since, in Julia, we have access to the type of the arguments/keywords, we can combine your suggested logic in a single keyword, say scale, so that when scale = true or, better, scale = :auto (where :auto is the symbolic name auto in Julia), some automatic scaling strategy is applied (along your point 2); otherwise, scale may be set to a vector of scaling factors, or left at its default value false or :none (again a Julia symbolic name). Using symbolic names rather than true or false is more informative about what will be done regarding the scaling of the variables and leaves open the possibility of implementing different strategies identified by their names. Besides, having the scaling of the variables controlled by a single keyword is easier for users to understand.
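A minimal Julia sketch of this single-keyword design (hypothetical names, not an actual PRIMA.jl interface; default_scaling as before):

    # Hypothetical: resolve a single `scale` keyword into scaling factors.
    function resolve_scale(scale, x0, l, u)
        if scale === false || scale === :none         # default: no scaling
            return nothing
        elseif scale === true || scale === :auto      # automatic strategy
            return default_scaling(x0, l, u)
        elseif scale isa AbstractVector{<:Real}       # user-supplied factors
            all(s -> isfinite(s) && s > 0, scale) ||
                throw(ArgumentError("scaling factors must be finite and positive"))
            return float.(scale)
        else
            throw(ArgumentError("unsupported value for the `scale` keyword"))
        end
    end

This sketch raises an error on invalid user-supplied factors; a more tolerant variant would warn and substitute defaults, which is exactly the trade-off discussed below.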

If the user inputs scaling_factor, we need to verify its validity unless scale is set to false. A scaling factor is valid if each component is positive and finite. If any component is invalid, then we raise a warning and define the scaling factor for this component in our own way. All the valid components of the user-defined scaling factor will be used as is.

I agree that scaling factors must be checked for validity but I would rather throw an exception (i.e. raise an error rather than a warning) if they are invalid because it means that the caller has made a wrong choice and must fix that.

zaikunzhang commented 1 year ago

Since, in Julia, we have access to the type of the arguments/keywords, we can combine your suggested logic in a single keyword, say scale, so that when scale = true or, better, scale = :auto (where :auto is the symbolic name auto in Julia), some automatic scaling strategy is applied (along your point 2); otherwise, scale may be set to a vector of scaling factors, or left at its default value false or :none (again a Julia symbolic name). Using symbolic names rather than true or false is more informative about what will be done regarding the scaling of the variables and leaves open the possibility of implementing different strategies identified by their names. Besides, having the scaling of the variables controlled by a single keyword is easier for users to understand.

This is an excellent idea!

I agree that scale = :auto conveys the idea better than scale = true.

If the user inputs scaling_factor, we need to verify its validity unless scale is set to false. A scaling factor is valid if each component is positive and finite. If any component is invalid, then we raise a warning and define the scaling factor for this component in our own way. All the valid components of the user-defined scaling factor will be used as is.

I agree that scaling factors must be checked for validity but I would rather throw an exception (i.e. raise an error rather than a warning) if they are invalid because it means that the caller has made a wrong choice and must fix that.

I tend to disagree with this point. The reason is as follows.

When developing a solver (be it optimization, linear algebra, PDE, ...), we should keep an important fact in mind: the "caller" is not necessarily a human. It is very likely another piece of code that calls the solver. The invocation of the solver is probably a quite small part of a bigger project. Moreover, the invocation of the solver may be done repeatedly in a loop.

For example, consider the following pseudo-code.

    do k = 1, 2, ..., until x_k achieves some kind of convergence
        x_k = X(x_1, ..., x_{k-1}, other data)
        obj_k = O(x_1, ..., x_k, other data)
        scale_k = S(x_1, ..., x_k, other data)
        solver(obj_k, x_k, scale_k)
    end do

Suppose that, in theory, it is guaranteed that scale_k is always valid. In computation, scale_k may become invalid (some components become zero or small negative values) due to ill-conditioning and rounding errors, even though the "some kind of convergence" is still far from being achieved. In this situation, should we raise an error and terminate the computation, or should we raise a warning, correct the scale_k, and continue with the loop?

In my opinion, a user-friendly package should tolerate any erroneous options as long as the problem being solved is still well-defined. If an option is invalid, then raise a warning and reset it to a correct one. (Of course, exceptions exist.)

Indeed, this happens quite often for linear solvers. Consider a problem where the coefficient matrix is theoretically nonsingular. In practice, the matrix may become numerically singular due to ill-conditioning. If this happens, should we abort the computation and raise an error, or should we raise a warning and solve the problem in the best way we can? I would prefer the latter. (In this case, the problem being solved is not even well-defined in the classical sense.)
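For illustration (a Julia sketch, not PRIMA code), such a tolerant linear solve could warn and fall back to a minimum-norm least-squares solution instead of aborting:

    using LinearAlgebra
    A = [1.0 1.0; 1.0 1.0 + 1e-17]   # numerically singular in Float64
    b = [2.0, 2.0]
    x = try
        A \ b                        # the LU-based solve throws SingularException
    catch err
        @warn "matrix is numerically singular; falling back to least squares" err
        pinv(A) * b                  # best effort: minimum-norm solution
    end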

What do you think? Thanks.

emmt commented 1 year ago

Well, for me, such an error indicates an erroneous use of the program, which I would not let pass through. My experience is that it's usually a bad idea to sweep this kind of mistake under the carpet, because it delays the moment when one realizes the error and correctly fixes it.

In the case of your example, a possible explicit fix of scale_k is to replace the call to solver by:

    solver(obj_k, x_k, max(scale_k, eps))

assuming max operates elementwise and where eps is any suitable positive threshold. Many other strategies are possible, for instance fixing the problem only if the issue is raised (you can generally catch errors in high-level languages).
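In Julia, for instance, the fix-on-failure variant could look like this (a sketch; solver, obj_k, x_k, and scale_k are the hypothetical objects from the pseudo-code above, and the solver is assumed to throw an ArgumentError on invalid factors):

    result = try
        solver(obj_k, x_k, scale_k)
    catch err
        err isa ArgumentError || rethrow()          # only handle invalid scaling
        solver(obj_k, x_k, max.(scale_k, eps()))    # elementwise floor, then retry
    end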

We don't have to make a decision right now, especially not until we've defined a rescue strategy. I would propose to cope with this in several steps:

  1. Implement the option of manually scaling the variables. It is the caller's responsibility to provide good scaling factors; an error is raised if these factors are not finite or not strictly positive.
  2. Implement and test good automatic scaling strategies. As you pointed out, this may be a long process.
  3. Decide on more elaborate ways to deal with erroneous scaling factors.

At least implementing step 1 is a prerequisite.

zaikunzhang commented 1 year ago

such an error indicates an erroneous use of the program which I would not let pass through.

I agree that some errors likely indicate an erroneous use of the program, which should not be let pass through. This is what I meant by the exceptions when saying the following.

In my opinion, a user-friendly package should tolerate any erroneous options as long as the problem being solved is still well-defined. If an option is invalid, then raise a warning and reset it to a correct one. (Of course, exceptions exist.)

However, I am not sure whether an invalid scale falls into this category or not. To me, if scale is defined by some code using some data, an invalid scale sounds similar to a numerically singular matrix in the case of linear systems.

My experience is that it's usually a bad idea to sweep this kind of mistake under the carpet, because it delays the moment when one realizes the error and correctly fixes it.

I agree with this idea in general. That is why users should always be informed when some inputs are erroneous, either by a warning (in most cases) or by an error (in exceptional cases).

We don't have to make a decision right now, especially not until we've defined a rescue strategy. I would propose to cope with this in several steps:

I agree. Anyway, the particular question of "how to deal with an invalid scale" is not a very big issue (but the question of "how to deal with invalid inputs in general" is important).

Thanks.

jschueller commented 1 year ago

it turns out I'm not so much interested in that anymore