kymckay / f21bc-coursework

Coursework for biologically inspired computation

PSO stuck in a local optimum #16

Open linarietuma opened 2 years ago

linarietuma commented 2 years ago

Figured it might be a good idea to keep track of the different hyperparameter combinations/strategies we've tried while troubleshooting the issue of the PSO getting stuck in a local optimum.

Baseline parameters I'm using:

What I've tried so far:

1. Increasing the number of iterations (iterations = 5000): no change to accuracy, still 0.55556.

2. Increasing the number of iterations and swarm size (iterations = 1000, swarm = 70): no change to accuracy, still 0.55556.

3. Initialising coordinates as Gaussian random variables: no change to accuracy, still 0.55556.

4. Using the full dataset: accuracy drops to 0.488775.

5. Full dataset + increased number of iterations (iterations = 2000): slight increase in accuracy, 0.4936.

6. Full dataset + each particle has all other particles as informants: accuracy drops to 0.48465.

7. Full dataset + increased search space (min_value = -10, max_value = 10): accuracy at 0.555539, but encountered overflow issues. I've checked the weight values for a converged network trained with backprop and they all fall within the -1 to 1 range, so search space size shouldn't be the issue here.

8. Using pyswarms to check if our PSO code is the issue: pyswarms is a library implementing a PSO optimiser, which I applied to the network (using the full dataset) with the same baseline parameters, and the best accuracy is... 0.555539 😭 I ran the optimiser over 7000 iterations and messed around with other params with no luck.

From the results I got, I think it's pretty safe to conclude that the PSO code is running as it should. Possible areas to focus on moving forward:

  1. PSO hyperparameters
  2. Network code (unlikely to be the issue since the old backprop experiments run just fine, but it could potentially be the from_list method?)
  3. Trying a different ANN architecture
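For reference while tuning, this is the canonical PSO velocity/position update we're effectively debugging. This is an illustrative sketch (names and default coefficients are mine, not our actual code or baseline values):

```python
import random

def pso_step(pos, vel, personal_best, informant_best,
             inertia=0.7, c_personal=1.5, c_social=1.5):
    """One canonical PSO update for a single particle.

    All arguments are equal-length lists of floats. The coefficient
    defaults here are illustrative placeholders, not our baselines.
    """
    new_pos, new_vel = [], []
    for x, v, pb, ib in zip(pos, vel, personal_best, informant_best):
        # velocity: inertia term + cognitive pull + social pull
        v = (inertia * v
             + c_personal * random.random() * (pb - x)
             + c_social * random.random() * (ib - x))
        new_vel.append(v)
        new_pos.append(x + v)  # position moves by the new velocity
    return new_pos, new_vel
```

Any of the hyperparameters in points 1 above (inertia, cognitive/social weights) map directly onto the three terms in the velocity line, which is why tweaking them is the first suspect.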
kymckay commented 2 years ago

Good idea.

Could possibly be limited by the particle initialisation somehow (like backprop was by weight initialisation). I haven't delved into it, but this is of interest: https://doi.org/10.1109/3ICT.2018.8855743

I've currently edited the code to run until a higher accuracy is reached (testing with 0.7); will run it for a while and see if it ever breaks past the 0.55 threshold we're hitting.

kymckay commented 2 years ago

Yeah seems to be perpetually stuck there.

I've just tried plotting the accuracy at each step and it's literally a flat line, even from the very first iteration. So it seems the search is not actually improving anything (which is interesting, since the best known values do change).

kymckay commented 2 years ago

Woke up with a theory: because particles are currently just clamped to the boundaries if they ever hit one (even in only one dimension), they may get stuck there because their inertia is in that direction.

Going to fix that behaviour today and see if it changes this at all.

Edit: This seems to suggest my theory could be correct: https://escholarship.org/uc/item/14r1b39q (bottom of page 4574)

Edit 2: Also interesting: https://www.aifb.kit.edu/images/f/f7/FL.pdf (I was wondering if we should modify the velocity rather than the position so that the particle can never actually reach a boundary; this seems to suggest both approaches can be used)
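The two boundary strategies discussed above can be sketched side by side. These are generic illustrations of "absorbing" vs "reflecting" walls (function names are mine, not our code); the key difference from plain clamping is that neither leaves the old velocity pointing into the wall:

```python
def clamp_absorb(pos, vel, lo, hi):
    """Absorbing walls: clamp the position to the boundary and zero the
    velocity component, so inertia can't keep pinning the particle there."""
    for i, x in enumerate(pos):
        if x < lo:
            pos[i], vel[i] = lo, 0.0
        elif x > hi:
            pos[i], vel[i] = hi, 0.0
    return pos, vel

def clamp_reflect(pos, vel, lo, hi):
    """Reflecting walls: bounce the overshoot back inside the boundary
    and reverse that velocity component."""
    for i, x in enumerate(pos):
        if x < lo:
            pos[i], vel[i] = 2 * lo - x, -vel[i]
        elif x > hi:
            pos[i], vel[i] = 2 * hi - x, -vel[i]
    return pos, vel
```

With plain clamping the particle keeps the inward-pointing inertia every step, which is exactly the stuck-at-the-wall behaviour theorised above.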

kymckay commented 2 years ago

Okay I have tried:

Both still exhibited this 0.555 accuracy. I'm starting to think there is a logic error in the code again, but I still can't see anything wrong yet.

kymckay commented 2 years ago

Okay I've noticed an interesting behaviour:

All of the prediction values from the resulting ANN are 1, which means the accuracy we are seeing is just the accuracy when all predictions are 1, and the PSO doesn't seem to be diverging from that behaviour.
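A quick sanity check for this: if every prediction is 1, accuracy should equal the fraction of the dominant class in the labels. A throwaway helper (the label list below is illustrative, not our actual data):

```python
def majority_baseline(labels):
    """Accuracy achieved by always predicting the most common label."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return max(counts.values()) / len(labels)

# For example, if 5 of every 9 labels are 1, predicting all-ones gives
# 5/9 ≈ 0.55556 — suspiciously close to the accuracy we keep hitting.
print(majority_baseline([1, 1, 1, 1, 1, 0, 0, 0, 0]))
```

Worth running this on our actual train/test labels to confirm the 0.55556 is literally the majority-class baseline.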

kymckay commented 2 years ago

More interesting observation:

I've added a method of tracking the particle positions throughout the search and found two unexpected behaviours:

kymckay commented 2 years ago

My last comment was incorrect: it was due to a bug in the way I was tracking positions (because of reference-type values, the same position was being appended to every index for every particle). Position tracking now works; here's what the first weight in the list does for each particle:

Screenshot from 2021-11-07 15-52-56

It looks reasonable.
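For posterity, the aliasing pitfall behind the tracking bug in miniature: appending a mutable position list without copying means every history entry points at the same object, so the whole history silently becomes the final position (a generic Python illustration, not our exact code):

```python
# Buggy version: every appended entry aliases the same list object.
history = []
pos = [0.0, 0.0]
for step in range(3):
    pos[0] += 1.0
    history.append(pos)        # bug: appends a reference, not a snapshot
assert history == [[3.0, 0.0], [3.0, 0.0], [3.0, 0.0]]

# Fixed version: append a copy so each step is snapshotted.
fixed = []
pos = [0.0, 0.0]
for step in range(3):
    pos[0] += 1.0
    fixed.append(list(pos))    # fix: shallow copy at append time
assert fixed == [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
```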

kymckay commented 2 years ago

I was curious to see that the values don't seem to converge towards the end; I increased the iterations to 500 and they still don't. Started playing around with the hyperparameters and saw a few different behaviours:

linarietuma commented 2 years ago

Tested the hypothesis that the search space is potentially too complex, so tried simpler ANN architectures:

Squashed the range of the activation functions down to match that of the weights; observed no change in accuracy or in the particles' behaviour, and after 200 iterations there are no signs of convergence.

linarietuma commented 2 years ago

Tested the PSO on a simpler dataset: the first two classes of the Iris dataset, singling out two features. A key observation: all instances are classified as the dominant class in the dataset (i.e. the class with more instances), same as with the original banknote dataset. Not quite sure how to use this information just yet, but I think it's essential to figuring out the weird behaviour of the PSO.

Another observation: the final personal fitness values are essentially identical (aside from some differences after 5 decimal places for a couple of values), so I'm starting to wonder if thinking about convergence in terms of the weight values might be faulty logic and it's the loss value we should be focusing on. Looking at the informant fitness values, they appear to converge fairly quickly; I'm assuming different weight combinations could theoretically achieve the same average loss.
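One cheap way to quantify that intuition: compare the spread of fitness values across the swarm with the spread of a weight coordinate. Converged fitness alongside diverse weights would support the many-weights-same-loss idea. A sketch with hypothetical snapshot values (not our real numbers):

```python
def spread(values):
    """Population standard deviation, used as a crude convergence measure."""
    mean = sum(values) / len(values)
    return (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5

# Hypothetical swarm snapshot: near-identical losses, very different weights.
losses = [0.4847, 0.4846, 0.4847, 0.4846]
first_weights = [-0.9, 0.3, 0.8, -0.2]

# spread(losses) is tiny while spread(first_weights) is large, i.e. the
# swarm has converged in loss space but not in weight space.
```

If our logged data shows this pattern, "no convergence of weights" isn't itself a bug — it's just the wrong convergence metric.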

kymckay commented 2 years ago

> A key observation: all instances are classified as the dominant class in the dataset (i.e. the class with more instances), same as with the original banknote dataset.

This is a good point and I think it highlights what is going on here: it definitely sounds like a local optimum. Consider that if all immediately surrounding points in the search space classify one fewer instance as the dominant class, and get that instance wrong, then the particle is encouraged to return to the local optimum.

> I'm starting to wonder if thinking about convergence in terms of the weight values might be faulty logic and it's the loss value we should be focusing on. Looking at the informant fitness values, they appear to converge fairly quickly; I'm assuming different weight combinations could theoretically achieve the same average loss.

I believe you're correct, which would suggest that there are multiple local optima in the search space that the particles can be drawn into. So perhaps we need to adjust the hyperparameters to encourage more exploration (higher inertia weighting?).
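A common compromise between exploration and eventual convergence is a linearly decreasing inertia weight: start high to explore, finish low to exploit. A minimal sketch (the 0.9 → 0.4 range is the oft-cited default from the PSO literature, taken as an assumption here, not a recommendation from our results):

```python
def inertia_at(step, total_steps, w_start=0.9, w_end=0.4):
    """Linearly decay inertia from w_start (exploration) to w_end
    (exploitation) over the course of the search."""
    frac = step / max(total_steps - 1, 1)
    return w_start + (w_end - w_start) * frac
```

The velocity update would then use `inertia_at(step, total_steps)` in place of the fixed inertia constant each iteration.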

If the search space has multiple local optima and only one global optima then this is a fairly complex problem for PSO (though it could be the case that there are multiple global optima following the same logic about different weight combinations producing the same output).

kymckay commented 2 years ago

May be unrelated, but if we want to expand the search space and avoid overflow errors there are a few options:
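If the overflows come from `exp()` blowing up on large weighted sums in a sigmoid activation, one option is a numerically stable formulation that never exponentiates a large positive argument (this assumes we're using sigmoid activations; adjust if not):

```python
import math

def stable_sigmoid(x):
    """Sigmoid that only ever calls exp() on a non-positive argument,
    so large pre-activations can't cause an OverflowError."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))  # exp(-x) <= 1, safe
    z = math.exp(x)                         # exp(x) < 1, safe
    return z / (1.0 + z)
```

The naive `1 / (1 + math.exp(-x))` raises `OverflowError` once `-x` exceeds roughly 709, which a ±10 search space over many summed weights can easily produce.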

linarietuma commented 2 years ago

Seriously, there was such a simple solution all along 😑 Sadly, it doesn't seem to fix the issue of the PSO getting stuck in a local optimum; it seems a larger search space just delays the inevitable, and eventually the swarm gets stuck in that same optimum.