linarietuma opened this issue 2 years ago
Good idea.
Could possibly be limited by the particle initialisation somehow (like backprop is by weight initialisation); I haven't delved into it, but of interest: https://doi.org/10.1109/3ICT.2018.8855743
I've currently edited the code to run until a higher accuracy is reached (testing with 0.7), will run for a while and see if perhaps it ever breaks past the 0.55 threshold we're hitting.
Yeah seems to be perpetually stuck there.
I've just tried plotting the accuracy at each step and it's literally a flat line, even from the very first iteration. So it seems the search is not actually improving anything (which is interesting, since the best known values do change).
Woke up with a theory: because particles are currently just clamped to the boundaries whenever they hit one (even in only one dimension), they may get stuck there, since their inertia keeps pointing in that direction.
Going to fix that behaviour today and see if it changes this at all.
Edit: This seems to suggest my theory could be correct: https://escholarship.org/uc/item/14r1b39q (bottom of page 4574)
Edit 2: Also interesting: https://www.aifb.kit.edu/images/f/f7/FL.pdf (I was wondering if we should modify the velocity rather than the position so that the particle can never actually reach a boundary, this seems to suggest both approaches can be used)
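For reference, the two boundary-handling strategies mentioned above (handling the position vs. handling the velocity) could look something like this. This is a minimal numpy sketch under assumed array shapes and names, not the repo's actual code; the 0.5 damping factor is an arbitrary illustrative choice:

```python
import numpy as np

def clamp_and_reflect(position, velocity, lo, hi):
    """Position-based handling: clamp to the boundary, then reverse
    (with damping) the velocity components that pushed the particle
    out, so inertia no longer pins it against the wall."""
    out_of_bounds = (position < lo) | (position > hi)
    position = np.clip(position, lo, hi)
    # Reverse and damp velocity only in the offending dimensions.
    velocity = np.where(out_of_bounds, -0.5 * velocity, velocity)
    return position, velocity

def limit_velocity(velocity, v_max):
    """Velocity-based handling: cap the step size so a particle
    can never fly far past a boundary in a single iteration."""
    return np.clip(velocity, -v_max, v_max)

# First dimension escaped past hi=1.0: it gets clamped and its
# velocity is reversed and damped; the second is left untouched.
p, v = clamp_and_reflect(np.array([1.5, 0.0]), np.array([0.4, 0.1]),
                         -1.0, 1.0)
print(p, v)  # [1. 0.] [-0.2  0.1]
```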
Okay, I have tried both boundary-handling approaches (repositioning the particle and modifying its velocity):
Both still exhibited this 0.555 accuracy. I'm starting to think there's a logic error in the code again, but I still can't see anything wrong yet.
Okay I've noticed an interesting behaviour:
All of the prediction values from the resulting ANN are 1. Which means the accuracy we are seeing is just the accuracy when all predictions are 1, and the PSO doesn't seem to be deviating from that behaviour.
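A quick sanity check for this: if the network outputs 1 for everything, accuracy collapses to the proportion of the dominant class, so the stuck 0.555 figure should exactly match that proportion on the split being used. The labels below are made up to illustrate the arithmetic:

```python
import numpy as np

def majority_baseline(y):
    """Accuracy achieved by always predicting the most frequent class."""
    _, counts = np.unique(y, return_counts=True)
    return counts.max() / len(y)

# Toy labels with ~55.6% ones, mirroring the suspicious accuracy:
y = np.array([1] * 5 + [0] * 4)
print(majority_baseline(y))  # 5/9 ≈ 0.5556
```

If `majority_baseline` on the actual training split comes out to 0.55556, that confirms the swarm has settled on the constant-output predictor.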
More interesting observation:
I've added a method of tracking the particle positions throughout the search and found two unexpected behaviours:
My last comment was incorrect; it was down to a bug in the way I was tracking positions (because of reference-type semantics, values were being appended to every index for every particle). Position tracking now works, here's what the first weight in the list does for each particle:
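For anyone who hits the same thing, this is the classic Python trap that produces exactly that symptom: building the per-particle history from a repeated reference, so every "row" is the same list object. The variable names here are illustrative, not the repo's:

```python
# Buggy: [[]] * n creates n references to ONE shared list,
# so an append for one particle shows up under all of them.
n_particles = 3
history_bad = [[]] * n_particles
history_bad[0].append(0.1)
print(history_bad)   # [[0.1], [0.1], [0.1]]

# Fixed: a comprehension creates n independent lists.
history_ok = [[] for _ in range(n_particles)]
history_ok[0].append(0.1)
print(history_ok)    # [[0.1], [], []]
```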
It looks reasonable.
I was curious to see that the values don't seem to converge towards the end; I increased the iterations to 500 and they still don't. Started playing around with the hyperparameters and saw a few different behaviours:
Tested the hypothesis of search space potentially being too complex so tried simpler ANN architectures:
Squashed the range of the activation functions down to match that of the weights; observed no change in accuracy or the particles' behaviour, and after 200 iterations there are no signs of convergence.
Tested the PSO on a simpler dataset. I used the first two classes of Python's Iris dataset, singling out two features. A key observation: all instances are classified as the dominant class in the dataset (i.e. the class with more instances), same as with the original banknote dataset. Not quite sure how to use this information just yet, but I think it's essential in figuring out the weird behaviour of the PSO.
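For anyone reproducing this reduced setup: the subset can be built as below, assuming scikit-learn's bundled copy of Iris. Which two features were actually used isn't stated above, so the first two columns are taken here purely as a placeholder:

```python
import numpy as np
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
mask = y < 2                 # keep only the first two classes
X2 = X[mask][:, :2]          # first two features (placeholder choice)
y2 = y[mask]
print(X2.shape, np.bincount(y2))  # (100, 2) [50 50]
```

Note the full two-class Iris subset is perfectly balanced (50/50), so any "dominant class" imbalance would only come from the train/test split.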
Another observation: the final personal-best fitness values are essentially identical (aside from some differences after 5 decimal places for a couple of values), so I'm starting to wonder whether thinking about convergence in terms of the weight values might be faulty logic, and it's the loss value we should be focusing on. Looking at the informant fitness values, they appear to converge fairly quickly; I'm assuming different weight combinations could theoretically achieve the same average loss.
> A key observation: all instances are classified as the dominant class in the dataset (i.e. the class with more instances), same as with the original banknote dataset.
This is a good point and I think highlights what is going on here - it definitely sounds like a local optima. Consider that all immediately surrounding points in the search space classify 1 less instance as the dominant class and it is wrong, then the particle is encouraged to return to the local optima.
> I'm starting to wonder whether thinking about convergence in terms of the weight values might be faulty logic, and it's the loss value we should be focusing on. Looking at the informant fitness values, they appear to converge fairly quickly; I'm assuming different weight combinations could theoretically achieve the same average loss.
I believe you're correct, which would suggest there are multiple local optima in the search space that the particles can be drawn into. So perhaps we need to adjust the hyperparameters to encourage more exploration (a higher inertia weight?).
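To play with the inertia idea in isolation, a bare-bones global-best PSO (just the standard update equations on a toy sphere function, nothing from the repo) makes the exploration/exploitation trade-off easy to probe; all parameter values below are conventional defaults, not ours:

```python
import numpy as np

def pso(f, dim, n=30, iters=200, w=0.9, c1=1.5, c2=1.5,
        lo=-5.0, hi=5.0, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(lo, hi, (n, dim))      # particle positions
    v = np.zeros((n, dim))                 # particle velocities
    pbest = x.copy()                       # personal bests
    pbest_f = np.apply_along_axis(f, 1, x)
    g = pbest[pbest_f.argmin()].copy()     # global best
    for _ in range(iters):
        r1, r2 = rng.random((n, dim)), rng.random((n, dim))
        # Higher w -> more momentum/exploration; lower w -> quicker,
        # greedier convergence (and more risk near local optima).
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)
        fx = np.apply_along_axis(f, 1, x)
        improved = fx < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], fx[improved]
        g = pbest[pbest_f.argmin()].copy()
    return g, pbest_f.min()

best, best_f = pso(lambda p: np.sum(p * p), dim=3)
```

Sweeping `w` here and watching how quickly `pbest_f` collapses to identical values would mimic the informant-fitness convergence observed above.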
If the search space has multiple local optima and only one global optimum, then this is a fairly complex problem for PSO (though it could be that there are multiple global optima, following the same logic about different weight combinations producing the same output).
May be unrelated, but if we want to expand the search space and avoid overflow errors there are a few options:
Seriously, there was such a simple solution all along 😑 Sadly, it doesn't seem to fix the issue of the PSO getting stuck in a local optimum; a larger search space just delays the inevitable, and eventually the swarm gets stuck in that same optimum.
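Whatever the simple fix applied here actually was, the usual culprit for overflow when the search space (and hence the weights) grows is `np.exp` blowing up inside the sigmoid. A numerically safe variant, as an illustration rather than a claim about the change made in this repo:

```python
import numpy as np

def sigmoid_stable(z):
    """Overflow-safe sigmoid: never exponentiates a positive argument."""
    out = np.empty_like(z, dtype=float)
    pos = z >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    ez = np.exp(z[~pos])          # here z < 0, so exp() cannot overflow
    out[~pos] = ez / (1.0 + ez)
    return out

print(sigmoid_stable(np.array([-1000.0, 0.0, 1000.0])))  # [0.  0.5 1.]
```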
Figured it might be a good idea to keep track of the different hyperparameter combinations/strategies we've tried while troubleshooting the issue of the PSO getting stuck in a local optimum.
Baseline parameters I'm using:
What I've tried so far:
1. Increasing the number of iterations: iterations = 5000. No change to accuracy, still 0.55556.
2. Increasing the number of iterations and swarm size: iterations = 1000, swarm = 70. No change to accuracy, still 0.55556.
3. Initialising `coords` as Gaussian random variables. No change to accuracy, still 0.55556.
4. Using the full dataset. Accuracy drops to 0.488775.
5. Full dataset + increased number of iterations: iterations = 2000. Slight increase in accuracy, 0.4936.
6. Full dataset + each particle has all other particles as informants. Accuracy drops to 0.48465.
7. Full dataset + increased search space: `min_value = -10`, `max_value = 10`. Accuracy at 0.555539, and we encounter overflow issues. I've checked the weight values for a converged network trained with backprop; all appear within the -1 to 1 range, so the search space shouldn't be the issue here.
8. Using `pyswarms` to check if the PSO code is the issue. `pyswarms` is a library implementing a PSO optimiser, which I applied to the network (using the full dataset) with the same baseline parameters, and the best accuracy is... 0.555539 😭 I ran the optimiser over 7000 iterations and messed around with other params with no luck.

From the results I got, I think it's pretty safe to conclude that the PSO code is running as it should. Possible areas to focus on moving forward:

- (`from_list` method?)