scipy / scipy

SciPy library main repository
https://scipy.org
BSD 3-Clause "New" or "Revised" License

optimize.differential_evolution: possibility to pass parameters to the polish function #13561

Open labay11 opened 3 years ago

labay11 commented 3 years ago

Other global minimization functions already support this option, for instance shgo with minimizer_kwargs or dual_annealing with local_search_options. However, differential_evolution only supports toggling this final minimization on and off with the polish parameter. It would be great to be able to choose the method as well as any other parameter that the method supports.

Moreover, I propose that the same name is used for this parameter across all algorithms of this type; minimizer_kwargs is, in my opinion, the clearest choice.
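To make the discrepancy concrete, here is a minimal sketch (not from the issue itself; rosen and the bounds are just stand-ins) contrasting the two interfaces as they exist today:

```python
from scipy.optimize import differential_evolution, rosen, shgo

bounds = [(-2, 2), (-2, 2)]

# shgo: the local search is fully configurable through minimizer_kwargs.
res_shgo = shgo(rosen, bounds, minimizer_kwargs={"method": "SLSQP"})

# differential_evolution: polishing can only be switched on or off; when on,
# it uses L-BFGS-B and none of its options can be adjusted.
res_de = differential_evolution(rosen, bounds, polish=True, seed=1)
```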

andyfaff commented 3 years ago

@labay11 , thank you for contributing this issue and the relevant PR. However, I think we should discuss this in a bit more detail before we go any further with the PR. With shgo and dual_annealing the local search minimizer (specified by minimizer_kwargs) is called multiple times during the iteration of, and is integral to, the parent minimisation. With differential_evolution the polishing minimisation is only done once, after all iterations of the main solver have finished (to possibly get a slightly better answer). Therefore, it's much less important to specify a particular method for the polishing minimiser with differential_evolution.

I'd be interested to hear about specific use cases where a particular method for the polishing is super useful, and where it makes sense to do it as part of differential_evolution rather than as an additional one-liner (as above).
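The one-liner referred to above is not reproduced in this excerpt; a sketch of what such a manual polish can look like today, using rosen and rosen_der purely as placeholders:

```python
from scipy.optimize import differential_evolution, minimize, rosen, rosen_der

bounds = [(-2, 2), (-2, 2)]

# Run the global search without the built-in polish...
res = differential_evolution(rosen, bounds, polish=False, seed=1)

# ...then polish the best solution yourself, with any method, gradient and
# options you like.
polished = minimize(rosen, res.x, jac=rosen_der, method="L-BFGS-B",
                    bounds=bounds, options={"maxiter": 50})
```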

labay11 commented 3 years ago

@andyfaff hello and thanks for the comment. Just to clarify, my main intention behind this issue is to be able to pass other parameters to the minimizer, beyond the method itself. As you say, the number of methods that support bounded and constrained problems is not that large.

The reason behind this issue is two problems I am working on right now, with completely different fitness functions. In the first, the computational time of the fitness function is huge (worst case can be minutes) but the search space is small; in the second, the computational time is negligible but a small change in a single parameter has a huge effect on the final energy. In the former I have to reduce the number of calls to the function as much as possible, and it would be great to be able to specify the derivative function or the number of iterations. The latter problem is more subtle: ideally I would run the deterministic method after each generation, but I already saw there is an issue that could cope with this (#6923), together with your one-liner. That would require an extra parameter like minimize_every_iter in shgo and a bigger change in the algorithm (I will try to update this with some references on the use of evolution combined with deterministic methods).

In short, your suggestion works for the first case without problems. Still, I feel it is a bit strange to offer the possibility of polishing without being able to adjust its parameters at all, so maybe the real discussion is whether polishing should be included or not. If it is, then allowing full customization always seems good to me. What's more, if we do polish, we could further allow the user to control when it happens, to cope with my second problem. In any case, that is not part of this issue, but it is the direction I am trying to go in.

andyfaff commented 3 years ago

computational time of the fitness function is huge (worst case can be minutes) but the search space is small

differential_evolution uses popsize * len(x) function evaluations per iteration (popsize defaulting to 15). The total number of function evaluations over the entire solve process will dwarf the number of function evaluations that a polishing minimiser would use, even without a Jacobian. Thus, the payoff in your first case is minimal.
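As a rough illustration of that budget (not from the comment; exact counts depend on the problem, tolerances and seed, and rosen is just a stand-in objective):

```python
from scipy.optimize import differential_evolution, minimize, rosen

# 5 parameters with the default popsize of 15 -> 75 function evaluations
# per iteration of the main solver.
bounds = [(-2, 2)] * 5
res = differential_evolution(rosen, bounds, polish=False, seed=1)
polish = minimize(rosen, res.x, method="L-BFGS-B", bounds=bounds)

print("main solve:", res.nfev, "evaluations over", res.nit, "iterations")
print("polish:    ", polish.nfev, "evaluations")
```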

The second case is interesting. In a lot of the problems that I use differential_evolution for, I frequently see big drops in cost interspersed with more gradual decreases. This happens as the evolution process suddenly finds areas of the hypercube with much lower cost. One concern I would have with local minimization after every iteration is that it could make it harder for the evolution process to find those areas of significantly lower energy, because the best solution would already have made it to the bottom of a (possibly local) minimum. If there is research in this area I'd be interested to know.

Another possible issue with local minimisation after every iteration is that a minimum may be reached after a reduced number of iterations compared to no local minimisation. Remember that the overall stopping criterion for differential evolution is when the variation in the population members decreases below a specified level, which typically requires a lot more iterations.

Polishing at the end is done to eke out any further improvements, which will usually be very small; you just need to get x a bit closer to 0 (how close one gets with the DE solve may depend on the value of tol). If one is already that close, the choice of minimizer has little effect on the process. I made an example. In the gist you can see that polishing does improve things for that example, but the main solve got most of the way there, and there is not much difference between the polishing functions (in terms of nfev); 'trust-constr' does burn a lot of iterations.
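The gist itself is not included in this excerpt; a sketch in the same spirit, comparing a few bounded minimisers as polishers of the same DE result (rosen stands in for the gist's objective):

```python
from scipy.optimize import differential_evolution, minimize, rosen

bounds = [(-2, 2), (-2, 2)]
res = differential_evolution(rosen, bounds, polish=False, seed=1)

# Try several bounded local minimisers on the same starting point.
for method in ("L-BFGS-B", "Powell", "trust-constr"):
    polished = minimize(rosen, res.x, method=method, bounds=bounds)
    print(f"{method:12s} fun={polished.fun:.3e} nfev={polished.nfev}")
```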

Personally I'm not convinced of the need for full customisation of the polishing function. I am definitely interested to know if more research has been done on local minimisation after every iteration.

andyfaff commented 3 years ago

Also, L-BFGS-B was chosen because it's the default minimisation method for a bounded problem.

labay11 commented 3 years ago

In a lot of the problems that I use differential_evolution for, I frequently see big drops in cost interspersed with more gradual decreases.

I also tend to see this behavior even when a deterministic method is alternated with the evolution and in particular cases it might speed up convergence.

Here is an extract from the book by Sean Luke, 2013, Essentials of Metaheuristics, Lulu (chapter 3.3.4):

... “Lamarckian Algorithms”. Jean-Baptiste Lamarck was a French biologist around the time of the American revolution who proposed an early but mistaken notion of evolution. His idea was that after individuals improved themselves during their lifetimes, they then passed those traits genetically to their offspring. For example, horse-like animals in Africa might strain to reach fruit in trees, stretching their necks. These slightly longer necks were then passed to their offspring. After several generations of stretching, behold the giraffe. Similarly, these kinds of hybrid algorithms often work by individuals improving themselves during fitness assessment and then passing on their improvements to their children.

Algorithms that combine both techniques are also called memetic algorithms. The virtue of improving individuals during the evolution is that particular individuals close to the global minimum might otherwise look less fit than others simply because they have not yet been refined, and consequently might be removed from the population. But this, of course, depends on the particular problem you need to solve and the precision you need for your values. How many people would benefit from the inclusion of this method? That is something I can't answer.
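A rough sketch of the Lamarckian/memetic idea using only the public API (not code from the thread; it assumes a SciPy recent enough to have the x0 keyword, and the population is re-initialised each round, so this only approximates a true memetic scheme in which refined individuals stay in the population):

```python
from scipy.optimize import differential_evolution, minimize, rosen

bounds = [(-2, 2), (-2, 2)]
x_best = None

for round_ in range(5):
    # A short burst of evolution; tol=0 disables the convergence-based exit.
    res = differential_evolution(rosen, bounds, maxiter=20, tol=0,
                                 polish=False, x0=x_best, seed=round_)
    # "Lamarckian" step: locally refine the best individual and hand the
    # refined point to the next round through x0.
    x_best = minimize(rosen, res.x, method="L-BFGS-B", bounds=bounds).x

print(x_best, rosen(x_best))
```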

Another possible issue with local minimisation after every iteration is that a minimum may be reached after a reduced number of iterations compared to no local minimisation. Remember that the overall stopping criterion for differential evolution is when the variation in the population members decreases below a specified level, which typically requires a lot more iterations.

That is certainly true, and in both of my problems I am skipping this (tol=0) and just letting the solver run for a fixed number of iterations. I totally understand its usefulness, but in the two problems mentioned above it makes the algorithm exit too readily.


In summary, my points in favor of this PR are:

tupui commented 3 years ago

After reading both sides, I have three remarks:

  1. There is indeed a discrepancy in how much we can customize the polishing/local minimizer across these functions, and this should be fixed. IMO, it is up to the user to say whether an unbounded optimizer would fit their case better. We can still provide a default that we think works well.
  2. If polishing with L-BFGS-B adds only a small amount of value, is a somewhat arbitrary choice, and we don't want to add options, then why not remove it completely? This would be clearer, and we could highlight in the docs that you can run another optimization step afterwards if wanted. Because in the end, we don't really take advantage of being inside the function and having the function evaluations at hand to help the final minimization (no precomputed gradient, etc.).
  3. Adding an intermediate local minimization is another story and IMO should be considered as a separate thing.

So I would either go for completely removing the polishing, or having a coherent interface with the other functions as suggested by @labay11.

tupui commented 3 years ago

@Stefan-Endres this is also something to consider for your improvement I suppose.

mdhaber commented 1 year ago

There are a lot of good points here, but since this has been inactive, I thought I'd suggest something that may be a reasonable compromise and doesn't require much decision-making: instead of deprecating polish or adding a new minimizer_kwargs parameter, why don't we allow the user to pass an optimizer callable (one that satisfies the minimize interface) into the existing polish keyword?

Of course, this is not much more convenient for the user than simply passing the output of differential_evolution into a local optimizer themselves. But then again, neither is the minimizer_kwargs of the existing PR, and hopefully this suggestion is less contentious: we can reuse the existing polish keyword, it's 100% customizable, and it's backwards compatible, so we don't have to do the work of a deprecation.
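Not an existing SciPy feature; a sketch of what the proposal could look like from the user's side, with my_polish as a hypothetical callable following the minimize calling convention:

```python
from scipy.optimize import minimize

def my_polish(func, x0, **kwargs):
    # Any fully customised local search: different method, jac, options, ...
    return minimize(func, x0, method="trust-constr", **kwargs)

# Under the proposal, the existing keyword would accept this callable,
#     differential_evolution(func, bounds, polish=my_polish)
# while polish=True/False keep their current meaning, so no deprecation is needed.
```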