ljvmiranda921 / pyswarms

A research toolkit for particle swarm optimization in Python
https://pyswarms.readthedocs.io/en/latest/
MIT License
1.28k stars 332 forks

Early stopping (ftol) if cost isn't reduced #397

Closed msat59 closed 4 years ago

msat59 commented 4 years ago

Describe the bug When ftol is used and the best cost is equal in two consecutive iterations, the search is stopped even if the cost is still high.

To Reproduce

    options = {'c1': 0.5, 'c2': 0.3, 'w': 0.9}
    optimizer = ps.single.GlobalBestPSO(n_particles=100, dimensions=2, options=options, ftol=1e-10)
    cost, pos = optimizer.optimize(fx.sphere, iters=1000, n_processes=None, verbose=True)

The output is something like this:

    Position: [0.06410654 0.02934344], Cost: 0.0049706854704889645

As can be seen, x1 and x2 are far from zero, the cost is relatively high, and it's much higher than 1e-10. The reason is that the algorithm couldn't find a lower cost, so swarm.best_cost is equal to best_cost_yet_found and the search is stopped even though the cost remains high.


How to fix In the optimizer code, an additional check should be performed to skip the ftol test when swarm.best_cost == best_cost_yet_found. After applying the fix, the output will be:

Position: [-8.05105641e-06 -6.63608458e-07], Cost: 6.525988549305629e-11
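The proposed guard can be sketched as follows (a simplified stand-in for the optimizer loop, not the actual pyswarms source; the relative-measure formula follows the discussion in this thread):

```python
import numpy as np

def should_stop(best_cost, best_cost_yet_found, ftol):
    # Proposed extra guard: an unchanged best cost is a stall, not convergence
    if best_cost == best_cost_yet_found:
        return False
    # The relative-tolerance check as described in this thread
    relative_measure = ftol * (1 + np.abs(best_cost_yet_found))
    return np.abs(best_cost - best_cost_yet_found) < relative_measure

# A stalled swarm no longer triggers early stopping:
print(should_stop(0.005, 0.005, ftol=1e-10))   # False
# A genuine sub-tolerance improvement still stops the search:
print(should_stop(0.0, 1e-14, ftol=1e-10))     # True
```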

msat59 commented 4 years ago

It seems that a new argument, ftol_iter, has been added to GlobalBestPSO.

It also suffers from the same swarm.best_cost == best_cost_yet_found problem.

ljvmiranda921 commented 4 years ago

Hi @msat59 thanks for reporting this. The addition of ftol_iter came from the last PR I merged: https://github.com/ljvmiranda921/pyswarms/pull/391 The idea is that we stop the optimization if the objective function value does not improve over ftol_iter iterations.

Perhaps the default value must be updated? What do you think @msat59 ? Any thoughts @nishnash54 ? :)

msat59 commented 4 years ago

ftol_iter does not fix the issue, and I believe we don't need it: with the fix I mentioned, there is no need for ftol_iter.

For example, if I set ftol=1e-10 (with my fix) and also use ftol_iter=5, the cost may go somewhere around 1e-15. In addition, with my fix and ftol_iter=1 (the default value), the search stops before the cost goes below 1e-10.

This doesn't make sense to me. To fix it, we may need to add additional if statements, which hurts performance.

On the other hand, without my fix, the search may stop when the cost is 1e-3 because five consecutive costs were equal (no improvement at all).

To summarize, once the best cost is below ftol, I don't think we need ftol_iter consecutive iterations with best_cost < ftol. For expensive computations, that is wasted time and processing power. As a result, ftol_iter should always be 1, which makes its implementation pointless.

msat59 commented 4 years ago

I should also mention that ftol_iter increases the chance of stopping at a local minimum and missing the global one.

ljvmiranda921 commented 4 years ago

Ok this makes sense, would you like to submit a PR so that we can check?

msat59 commented 4 years ago

Ok this makes sense, would you like to submit a PR so that we can check?

Using the version with or without ftol_iter?

An option should be added to disable ftol_iter if we are going to keep it. I can do it, BTW.

ljvmiranda921 commented 4 years ago

Good idea on adding a trigger to disable ftol_iter. How about we default that value to None (disabled)? Then if ftol_iter has some value, we use it.

nishnash54 commented 4 years ago

ftol and ftol_iter work as expected (mentioned in use case #373 and implemented in #391 ). Many larger optimization problems use a tolerance-based approach to decide when to stop the algorithm (early stopping when there is no improvement). You can consider this the case where the algorithm does not show any improvement (ftol) over a set number of iterations (ftol_iter). The behavior of ftol in v1.1.0 is kept as the default behavior in the current implementation.

@msat59 The use case you have mentioned is not what this is meant to accomplish. My understanding is that you want to stop the algorithm when you achieve a certain threshold value. I do not think that is implemented in the library (@ljvmiranda921 correct me if I'm wrong). What you are looking for is a target error or threshold metric; once achieved, the algorithm must stop (early stopping when an acceptable value is achieved).

If ftol is set to the default value (-inf), it does not play any part in the algorithm. This new case mentioned by @msat59 is really useful and should be implemented. I can work on it if required.
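To illustrate why the -inf default disables the check entirely, here is a minimal sketch (the relative-measure formula is assumed from this thread, not copied from the source):

```python
import numpy as np

ftol = -np.inf                          # the default value
best_cost_yet_found = 0.005
relative_measure = ftol * (1 + np.abs(best_cost_yet_found))   # still -inf
delta = 0.0                             # even a completely stalled best cost
print(np.abs(delta) < relative_measure)   # False: the ftol stop never triggers
```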

msat59 commented 4 years ago

@nishnash54, I added a switch named fmin in my local version. It worked great, but the user needs to know what the minimum cost is. That's possible if one is solving an equation or finding roots, but in many cases the minimum cost is unknown.

According to the code, ftol is a tolerance on the difference between best_cost in two consecutive iterations (swarm.best_cost - best_cost_yet_found); MATLAB calls it TolFun. My previous explanations might not have been clear enough (or accurate), but I meant that early stopping with ftol_iter=1 makes everything worse.

MATLAB does not use evolutionary or metaheuristic methods to solve these problems, so if I am correct, the minimum cost never stays unchanged over two steps unless a local/global minimum has been found. In addition, if the current position passes the minimum, the algorithm brings it back close to where it was.

In metaheuristic methods, several candidates try to find the minimum cost. Since we take the minimum cost over all candidates, it is possible that one candidate is stuck at a position while the others cannot find a better solution, so early stopping leads to a premature solution. As a result, I believe we should add the comparison self.swarm.best_cost - best_cost_yet_found != 0 to the code.

Let me test it more, then I'll implement the fix. I need more tests to see what the best combination of ftol and ftol_iter is.

@ljvmiranda921 @nishnash54 If we agree to implement fmin, it's very easy, but we need to enrich the documentation and examples; otherwise we make the methods too complicated.

msat59 commented 4 years ago

The MATLAB documentation explains everything quite well. Everything is based on a tolerance, not on the iteration count.

Tolerances and Stopping Criteria

msat59 commented 4 years ago

ftol and ftol_iter work as expected (mentioned in use case #373 and implemented in #391 ). Many larger optimization problems use a tolerance-based approach to decide when to stop the algorithm (early stopping when there is no improvement). You can consider this the case where the algorithm does not show any improvement (ftol) over a set number of iterations (ftol_iter).

I agree with you; but it doesn't work for metaheuristic algorithms, I think.

Just test sphere() with 5 particles and ftol_iter = 5.

optimizer = ps.single.GlobalBestPSO(n_particles=5, dimensions=2, options=options,
                        ftol=1e-5,
                        ftol_iter=10
                        )

It doesn't converge. So what's the point of using early stopping?

To make the algorithm work, best_cost must not be equal to best_cost_yet_found; only when their difference is non-zero and less than ftol should the algorithm be stopped, so ftol_iter is not needed.

I think it's because of the nature of PSO. We don't deal with the current cost of a particle; we deal with the minimum found so far, so it's possible that a particle moves around a previously found point but doesn't stay near it for a certain number of iterations. X is the position of a particle, not a value found by Newton's method or the like. It passes the minimum, comes back, passes again, and this repeats.

For instance, for one particle:


Cost    :   100     70     80     85    83    82    74   71    50   30    20   10   2     
Min Cost:   100     70     70     70    70    70    70   70  ....

As you can see, the particle approaches the minimum, but early stopping halts it at a very high cost.
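The trace above can be replayed with a toy script (the ftol_iter-style counter here is a simplified stand-in, not the library's implementation):

```python
# Hypothetical cost trace for one particle, from the example above
costs = [100, 70, 80, 85, 83, 82, 74, 71, 50, 30, 20, 10, 2]

best = float("inf")
no_improvement = 0
stopped_at = None
for i, cost in enumerate(costs):
    if cost < best:
        best = cost
        no_improvement = 0
    else:
        no_improvement += 1
    # A simplified ftol_iter=5 style counter would stop here
    if stopped_at is None and no_improvement >= 5:
        stopped_at = (i, best)

print(stopped_at)   # (6, 70): stopped while the best cost is still 70
print(min(costs))   # 2: the cost the particle eventually reaches
```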

ljvmiranda921 commented 4 years ago

Hi @nishnash54 , thanks for clarifying and the prompt reply. I apologize for getting confused as well; it seems we're talking about different use-cases, which is why we're thinking of different tolerance strategies. My bad.

I added a switch named fmin in my local version. It worked great but the user needed to know what the minimum cost is.

I'm a bit hesitant to add fmin for this reason. Let's park this idea for now.

As for the fix and API usage

Activating both ftol and ftol_iter at the same time will cause early stopping. I don't want to remove ftol_iter, because I maintain that it's an important feature. One resolution is to default this value to None so that ftol acts consistently with metaheuristic algorithms. Then, let's document the behaviour of ftol and ftol_iter so that users won't get confused.

Let me know what you think @msat59 and @nishnash54 ?

hakan-gecili commented 4 years ago

I will comment only on the ftol_iter part. ftol_iter causes early stopping and can get stuck at a local optimum, and that is exactly what we intended by using ftol_iter. This option is for users who are only interested in finding a good feasible solution, not an exact global solution, because they have a big problem. In my case, each iteration takes 3 minutes, and I usually run more than 500 iterations. For me, a fast, good feasible solution is a lot better than a late exact solution.

msat59 commented 4 years ago

I will comment only on the ftol_iter part. ftol_iter causes early stopping and can get stuck at a local optimum, and that is exactly what we intended by using ftol_iter. This option is for users who are only interested in finding a good feasible solution, not an exact global solution, because they have a big problem. In my case, each iteration takes 3 minutes, and I usually run more than 500 iterations. For me, a fast, good feasible solution is a lot better than a late exact solution.

I agree with all of you, but early stopping is not working for particle swarm optimization. You may stop at the second iteration. That's not a local minimum; it's somewhere unknown. You will have nothing resembling a feasible solution. :)

Anyway, I don't want to force anyone to accept it. The main problem that I wanted to address is in the optimizer code, with and without ftol_iter.

When best_cost stays constant over two consecutive iterations, regardless of the value of ftol_iter, self.swarm.best_cost - best_cost_yet_found becomes zero, so the ftol threshold, no matter how small (even 1e-20), becomes useless, unless it is -np.inf (the default value).

### This is completely incorrect behavior.

msat59 commented 4 years ago

I will submit the solution; it works with the current form of ftol_iter.

@nishnash54, if you don't mind, I am going to rename delta to a more meaningful name. Actually, it's not a delta; it's "is delta less than the relative measure". I am looking for a name, something like isBelow_ftol.

msat59 commented 4 years ago

Hi @nishnash54 , thanks for clarifying and the prompt reply. I apologize that I got confused as well, it seems that we're talking about different use-cases that's why we're thinking of different tolerance strategies. My bad.

I added a switch named fmin in my local version. It worked great but the user needed to know what the minimum cost is.

I'm a bit hesitant to add fmin for this reason. Let's park this idea for now.

As for the fix and API usage

Activating both ftol and ftol_iter at the same time will cause early stopping. I don't want to remove ftol_iter, because I maintain that it's an important feature. One resolution is to default this value to None so that ftol acts consistently with metaheuristic algorithms. Then, let's document the behaviour of ftol and ftol_iter so that users won't get confused.

Let me know what you think @msat59 and @nishnash54 ?

The default value of ftol_iter had better be 0; otherwise we need an if statement to convert it to zero. I will fix it today after more tests.

It seems that the current implementation of ftol_iter works better if n_particles and ftol_iter are high enough.

Test case: Sphere(), n_particles=100, ftol=1e-3, and ftol_iter=3.

With ftol_iter: Position: [ 0.070823 -0.06813466], Cost: 0.009658228663868153. The search stopped after the 7th iteration, so the objective function was evaluated 700 times.

Without ftol_iter, with n_particles=10: Position: [-0.03483422 0.03243626], Cost: 0.0022655339311109055. The search stopped after the 22nd iteration, so the objective function was evaluated 220 times => about 3 times faster, with a lower cost.

Now imagine the objective is an SVM calculation.

hakan-gecili commented 4 years ago

I agree that there must be a threshold parameter so that the optimizer can start counting non-improving global best values. If the user knows his problem and has a good estimate of what the objective function value should be, or the least acceptable value, then he/she should be able to put this knowledge into the optimizer. And the cost of implementation and computation is extremely low.

msat59 commented 4 years ago

To fix the constant best_cost issue, I tested an idea that gives relatively better results.

The approach is to use a smaller weight when best_cost didn't change. If the difference between self.swarm.best_cost and best_cost_yet_found is less than relative_measure, add 1 to the no-change counter; if the difference is zero (no improvement in best_cost at all), add 0.5 or less. In this way, we handle self.swarm.best_cost == best_cost_yet_found better.
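A minimal sketch of this weighted counter (update_nochange and stall_weight are illustrative names, not pyswarms API):

```python
import numpy as np

def update_nochange(nochange, best_cost, best_cost_yet_found, ftol,
                    stall_weight=0.5):
    """Weighted no-change counter: a stalled best cost counts less toward
    the ftol_iter limit than a genuine sub-tolerance improvement."""
    relative_measure = ftol * (1 + np.abs(best_cost_yet_found))
    delta = np.abs(best_cost - best_cost_yet_found)
    if delta == 0:
        return nochange + stall_weight   # stalled: weight it less
    if delta < relative_measure:
        return nochange + 1              # real but tiny improvement
    return 0                             # clear improvement: reset

print(update_nochange(0, 0.5, 0.5, 1e-5))          # 0.5 (stalled)
print(update_nochange(0.5, 0.4999999, 0.5, 1e-5))  # 1.5 (tiny improvement)
```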

As @hakan said and I mentioned earlier, the best practice is to have fmin with ftol_iter; but it makes it more complicated for a user. However, an advanced user may need such a feature.

The only library I have seen so far with early stopping is Keras (whose performance I didn't like), but it has a baseline option to continue the search as long as the cost is above it, the same as the fmin I mentioned.

nishnash54 commented 4 years ago

I understand the use case you are referring to @msat59

According to my understanding of the current implementation of ftol and ftol_iter, the main aim is to stop running if we do not see acceptable improvements over a number of iterations. The use case you have mentioned is also important and should be implemented. Before you make changes to the current case, I'm unclear about a few things :open_mouth:

  1. If ftol is set to the default value (-inf), ftol_iter must not affect how the algorithm runs. The results must be as if ftol and ftol_iter do not exist. Is there a problem in how the current implementation works? (does not satisfy #373 ) Maybe we should check if this part works as intended.
  2. @hakan-gecili has given a great example of how early stopping (non improving) is used. I think it's important use case to keep in mind when making changes to the ftol_iter
  3. Why not implement a target_tolerance where the user must know the minima? Most of the general mathematical functions used for bench-marking PSO algorithms make use of functions that have a minima at the origin. @ljvmiranda921 We will have to update the documentation and examples accordingly.
  4. @msat59 In your test, 7 iterations correspond to 700 evaluations; if everything is kept the same, I think 22 iterations must correspond to 2200 evaluations. Another thing to keep in mind is to maintain a particle swarm and use the same swarm to run the tests; I think that's currently the only way to get comparable results, as there is no option for an internal random state (#235). The initial position of the particles will play a big role when running tests on smaller objective functions.

Let me know your thoughts! I would be happy to help with the testing and development of the new implementation. :smiley:

msat59 commented 4 years ago

I understand the use case you are referring to @msat59

According to my understanding of the current implementation of ftol and ftol_iter, the main aim is to stop running if we do not see acceptable improvements over a number of iterations. The use case you have mentioned is also important and should be implemented. Before you make changes to the current case, I'm unclear about a few things

  1. If ftol is set to the default value (-inf), ftol_iter must not affect how the algorithm runs. The results must be as if ftol and ftol_iter do not exist. Is there a problem in how the current implementation works? (does not satisfy #373 ) Maybe we should check if this part works as intended.
  2. @hakan-gecili has given a great example of how early stopping (non improving) is used. I think it's important use case to keep in mind when making changes to the ftol_iter
  3. Why not implement a target_tolerance where the user must know the minima? Most of the general mathematical functions used for bench-marking PSO algorithms make use of functions that have a minima at the origin. @ljvmiranda921 We will have to update the documentation and examples accordingly.
  4. @msat59 In your test, 7 iterations correspond to 700 evaluations; if everything is kept the same, I think 22 iterations must correspond to 2200 evaluations. Another thing to keep in mind is to maintain a particle swarm and use the same swarm to run the tests; I think that's currently the only way to get comparable results, as there is no option for an internal random state (#235). The initial position of the particles will play a big role when running tests on smaller objective functions.

Let me know your thoughts! I would be happy to help with the testing and development of the new implementation.

  1. Without using ftol, everything is fine, and the implementation of ftol_iter is fine too. The only problem is the previous implementation of ftol, even when ftol_iter is disabled. It's because of the randomness in PSO: several parallel candidate solutions (particles) go back and forth, and when they pass the minima, they need time to come back. The only way I found is to exclude the case self.swarm.best_cost == best_cost_yet_found. Otherwise we get a premature solution and the algorithm never converges.
  2. It's what I called fmin. For example, if fmin < 0.01 and ftol_iter > 3, stop the search. fmin solves the previous issue but makes everything a bit more complicated, and @ljvmiranda921 doesn't like it!
  3. It's the same fmin again. In the real world, we don't deal with benchmarks, but the user may know the minima. For instance, when I use PSO to tune model hyperparameters, I know the minimum cost for regression using MSE is zero, so I can set the tolerance to 0.01.
  4. I forgot to mention that it was 22 iterations with 10 particles. I agree with you about the random_state. However, we can use the average performance of 10 models, for instance.
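The fmin idea in points 2 and 3 can be sketched like this (fmin and should_stop_fmin are hypothetical names, not part of the pyswarms API): early stopping is only allowed once the best cost has reached the user's target.

```python
def should_stop_fmin(best_cost, fmin, no_improvement, ftol_iter):
    # Only stop early once the cost is acceptable AND the swarm has stalled
    return best_cost <= fmin and no_improvement >= ftol_iter

# Stalled at a high cost: keep searching
print(should_stop_fmin(best_cost=0.5, fmin=0.01, no_improvement=10, ftol_iter=3))    # False
# Stalled below the target: safe to stop
print(should_stop_fmin(best_cost=0.005, fmin=0.01, no_improvement=10, ftol_iter=3))  # True
```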

I can also share my global_best.py before opening a PR, if you want to test.

msat59 commented 4 years ago

A note: ftol_iter works very well with fmin, base_line, f_limit, or whatever we name it. :smiley:

nishnash54 commented 4 years ago

Okay. So I guess the fmin (or target_error) parameter is the best possible option. Anyway, I think we have basically discussed most of the pros and cons; @ljvmiranda921 should make the call.

msat59 commented 4 years ago

I uploaded a test case for various combinations of ftol (modified), ftol_iter, and fmin.

You need to copy the base and single folders into the pyswarms package folder (pythonfolder\Lib\site-packages\pyswarms).

Ftol Test case.zip

ljvmiranda921 commented 4 years ago

Let's park fmin for now and fix the current problem first. I'd suggest that we tackle this in the following manner:

  1. @msat59 sends a PR to update ftol and ftol_iter, we review, iterate if possible then merge.
  2. I think after the conversation above I am a bit convinced that fmin is viable, but we need to make an effort to update the documentation and give examples on how to use it properly. I scope this one out as a pretty big task, and if we're going to do this, it must be in a separate PR.

In short, let's do 1 first, then go with 2 next. Thanks @msat59 for checking this one out, and @nishnash54 + @hakan-gecili for sharing your thoughts! I believe we have all the information we need to proceed with 1. I appreciate this discussion!

cc: @whzup for context

msat59 commented 4 years ago

The progress bar implementation had a small bug when we broke out of the loop, but I fixed it.

I want to add the stopping reason to the logger. Please correct the reasons if you think they're not clear:

for max iteration: Maximum iterations reached
for ftol: ftol reached / cost gradient below ftol
for ftol_iter: ftol_iter reached

The logger output will be: Optimization finished | Reason: «ftol_iter reached» | Last iteration: 8

The Reason can be changed to Termination condition too.

Let me know your suggestions @ljvmiranda921 , @nishnash54 . Maybe the "reached" at the end of the reason is redundant.

ljvmiranda921 commented 4 years ago

^Good idea and I think that's clear! I suggest that you show what the value was as well:

ftol_iter=4 reached

I know it's small and a bit redundant, but it can help the user too! @msat59 Nah, I think the "reached" at the end of the reason is fine. :+1:
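Putting the two suggestions together, the message could be formatted as follows (a sketch of the proposal in this thread, not the merged implementation):

```python
def termination_message(reason, last_iter):
    # Reason string includes the parameter value, e.g. "ftol_iter=4 reached"
    return f"Optimization finished | Reason: «{reason}» | Last iteration: {last_iter}"

print(termination_message("ftol_iter=4 reached", 8))
# Optimization finished | Reason: «ftol_iter=4 reached» | Last iteration: 8
```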

msat59 commented 4 years ago

@msat59 , I tested the latest implementation of ftol_iter and now I understand better what you mean :) even if I set the ftol_iter=60 where the total iterations set to 100, the algorithm can stop at the first iteration. This is not how it should be.

In other words, if the obj.val. difference in two consecutive iterations is less than ftol the algorithm stops without counting for ftol_iter.

There is also a problem with the progress bar; I am not sure whether an issue has been created for that.

I've fixed both the early stop and the progress bar bro

nishnash54 commented 4 years ago

@hakan-gecili Did you run your tests on this PR? or the latest changes on master? or on the stable PyPi pyswarms v1.1.0

This behavior is weird; there is a test that makes sure the required number of iterations (> ftol_iter) is completed before the optimization stops. Maybe we missed an edge case; could you please share your code and results so we can replicate the issue?

hakan-gecili commented 4 years ago

Sorry @nishnash54, it is my bad. I think I was using an earlier commit. Anyway, I spent some time creating a test problem. The details might be useful for somebody, so I am posting them:

First I created a 0-1 knapsack problem to test pyswarms PSO (code is given below)

Then I cloned the master on a docker image: git clone https://github.com/ljvmiranda921/pyswarms.git

and also I installed stable pyswams v1.1.0 on my local machine

Next I tested the knapsack problem on both the docker image (with ftol and ftol_iter) and the local machine (without ftol and ftol_iter).

Summary: Early stopping does work as it should :)

The Knapsack test code:

"""The following 0-1 Knapsack Problem (KP) is created by
Hakan Gecili for testing purposes. For KP problems please refer to:
Kellerer H, Pferschy U, Pisinger D. Knapsack problems. Berlin: Springer; 2004.
Short definition of the problem: From a set of items, select a subset of items
without violating the knapsack capacity so that the total revenue is maximized. 
"""
from numpy import array, round, argwhere
from random import randint, seed
import pyswarms as ps

global number_of_items
number_of_items = 10
global item_range
item_range = range(number_of_items)
# The weight capacity of the knapsack 
global capacity
capacity = 50

def get_particle_obj(X, **kwargs):
    """Calculates the objective function value, which is the
    total revenue minus a penalty for capacity violations"""
    # X is the decision variable: a vector with length equal to the number of items
    # $ value of items
    value = kwargs['value']
    # weight of items
    weight = kwargs['weight']
    # Total revenue
    revenue = sum([value[i]*round(X[i]) for i in item_range])
    # Total weight of selected items
    used_capacity = sum([weight[i]*round(X[i]) for i in item_range])
    # Capacity violation penalty with 100 as the penalty coefficient
    # (min(0, ...) is negative when the capacity is exceeded)
    capacity_violation = 100 * min(0, capacity - used_capacity)
    # The objective function minimizes the negative revenue, which is the same
    # as maximizing the positive revenue
    return -1*(revenue + capacity_violation)

def objective_function(X, **kwargs):
    n_particles_ = X.shape[0]
    dist = [get_particle_obj(X[i], **kwargs) for i in range(n_particles_)]
    return array(dist)

if __name__ == '__main__':
    seed(0)
    # PARAMETERS
    value = [randint(1,number_of_items) for i in item_range]
    weight = [randint(1,number_of_items) for i in item_range]

    # PSO PARAMETERS
    n_particles = 2
    n_processes = 2
    iterations = 1000
    options = {'c1': 2, 'c2': 2, 'w': 0.7}
    dim = number_of_items
    LB = [0] * dim
    UB = [1] * dim
    constraints = (array(LB), array(UB))
    kwargs  = {'value':value,
               'weight': weight,
               'capacity': capacity
                }

    KP_optimizer = ps.single.GlobalBestPSO(n_particles=n_particles,
                                        dimensions=dim,
                                        options=options,
                                        bounds=constraints,
                                        bh_strategy='periodic',
                                        ftol = 0.01,
                                        ftol_iter = 50,
                                        velocity_clamp = (-0.5,0.5),
                                        vh_strategy = 'invert')
    best_cost, best_pos = KP_optimizer.optimize(objective_function,
                                            iters=iterations,
                                            n_processes= n_processes,
                                            **kwargs)
    print("\nThe total knapsack revenue is:\t "+str(-best_cost))
    print("\nIndices of selected items:\t " + str(argwhere(round(best_pos)).flatten()))

# The best answer I have found so far is 58$ and selected items are [0 1 2 3 4 5 6 8 9]

I set iters=1000 and ftol_iter=50. Here is my output (see the PSO_test screenshot):

PSO did not stop before the 50th iteration, which is what we want to see 👍

nishnash54 commented 4 years ago

@hakan-gecili Thanks for taking out your time to test this and also for sharing your code and results.

I was wondering if I could use this code as a better, improved test case. I could simply copy the code, or you could just submit it as a PR to my fork.

hakan-gecili commented 4 years ago

Hi @nishnash54, I was busy with my thesis; I just finished reading PR #401. You guys were having so much fun :)

Yes, you can use my code. Actually, I should have posted my previous comment in that PR.

hakan-gecili commented 4 years ago

@ljvmiranda921 , I have more ideas to improve this library. Let me know if I can be a contributor at this level of support. Thanks

ljvmiranda921 commented 4 years ago

Hi @hakan-gecili , thanks for volunteering! If this is related to this Issue, I think giving your thoughts on #401 will be greatly appreciated. If this is on a different matter, then I recommend opening up an Issue and we can put this in our feature backlog! :bowing_man:

stale[bot] commented 4 years ago

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

ljvmiranda921 commented 4 years ago

Still relevant, stalebot!

stale[bot] commented 4 years ago

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.