Closed Simply-Adi closed 2 years ago
Not calling the backward method is correct. Most Particle Swarm Optimization algorithms are "gradient-free," meaning they avoid computing the gradient (the backward method) entirely. The trade-off is that each optimization step requires additional function evaluations, hence the need for a closure function.
In the current implementation, performing backpropagation to calculate the gradient should have no effect on the behavior of any of the currently available Particle Swarm Optimizers. I will open another issue to add tests that ensure this behavior.
Your code sample looks like it should work, but without testing it myself, I'm not sure. Generally, I put the `optimizer.zero_grad()` call inside the `closure` function, but that's not strictly necessary unless some other part of the closure relies on the gradient (which doesn't appear to be the case here). Again, running the `backward` and `zero_grad` methods should have no effect on the Particle Swarm algorithms; they only matter for avoiding side effects.
Does your code, as written, function as expected?
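To make the "gradient-free" point above concrete, here is a minimal pure-Python sketch of a global-best PSO loop (an illustration only, not torch-pso's actual implementation). Note that the objective, which plays the role of the closure, is only ever *evaluated*, never differentiated, and is called once more per particle per step:

```python
import random

def minimal_pso(objective, dim=2, num_particles=8, steps=100,
                w=0.724, c_cog=2.1, c_soc=2.1, seed=0):
    """Toy global-best PSO. The objective (the "closure") is only
    evaluated, never differentiated -- no backward() anywhere."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(num_particles)]
    vel = [[0.0] * dim for _ in range(num_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(num_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(steps):
        for i in range(num_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c_cog * r1 * (pbest[i][d] - pos[i][d])
                             + c_soc * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])  # the extra function evaluation per step
            if val < pbest_val[i]:
                pbest_val[i], pbest[i] = val, pos[i][:]
                if val < gbest_val:
                    gbest_val, gbest = val, pos[i][:]
    return gbest, gbest_val

# Sphere function: minimum of 0 at the origin.
best, best_val = minimal_pso(lambda p: sum(x * x for x in p))
```

The swarm should drive `best_val` close to zero on the sphere function without ever touching a gradient, which is why omitting `loss.backward()` is fine.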
Hi, thank you for the prompt reply. The code runs without errors. Just to give context, my loss function is pinball or quantile loss. The loss curve is sometimes concave. I am still trying to wrap my head around the PSO parameters.
Interesting. I've never used quantile loss, but I have used this library on several concave functions in my own research (which actually led me to start writing this library--gradient descent was not cutting it for me).
I agree that the PSO hyper-parameters can be a bit... opaque at times. I've spent some time studying the literature, and there's only a handful of useful guidelines that I have found. One of the more important ones is that convergence is only guaranteed for inertial weights less than 1. If the inertial weight is too small, however, the particle will move too freely across the input space, and may converge on a point that is sub-optimal (and not necessarily even a critical point!).
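The effect of the inertial weight can be seen in a toy deterministic model (an illustration, not the library's code): one particle in 1-D with a fixed attractor at the origin, with the random coefficients replaced by their expected value of 0.5, so the combined attraction strength is `0.5 * (2.1 + 2.1) = 2.1`:

```python
def pso_state_growth(w, phi=2.1, steps=200, x=3.0, v=0.0):
    """One particle in 1-D with a fixed attractor at the origin.
    phi is the combined (cognitive + social) attraction strength,
    using the expected value 0.5 of the random coefficients."""
    for _ in range(steps):
        v = w * v + phi * (0.0 - x)  # inertia term + pull toward the best
        x = x + v
    return max(abs(x), abs(v))

shrinks = pso_state_growth(0.724)  # inertial weight below 1: contracts
blows_up = pso_state_growth(1.05)  # inertial weight above 1: diverges
```

With `w = 0.724` the particle spirals into the attractor, while with `w = 1.05` its oscillations grow without bound, matching the "inertial weight less than 1" rule of thumb.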
The social and the cognitive coefficients are a tad trickier to pin down, as their effects are tightly coupled. I've seen varying (and sometimes contradictory) suggestions in the literature on these. One source claims that `social + cognitive >= 4` will converge faster than a sum less than 4. In the paper *The particle swarm optimization algorithm: convergence analysis and parameter selection*, the parameter `b` represents the average of the social and cognitive coefficients and `a` is the inertial weight. The author provides the following condition with a guarantee of convergence: `2a - b + 2 > 0`.
In my experience, I usually have to do a hyper-parameter grid search for each problem. That being said, this is my "go-to" setup, inspired by a paper which I have unfortunately lost a reference to:
```python
{
    'inertial_weight': 0.724,
    'cognitive_coefficient': 2.1,
    'social_coefficient': 2.1,
    'num_particles': 32
}
```
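As a quick sanity check, this setup satisfies the convergence condition from the paper mentioned above (using `a` for the inertial weight and `b` for the average of the social and cognitive coefficients):

```python
# a = inertial weight, b = average of the social and cognitive
# coefficients, following the notation in the cited paper.
a = 0.724
b = (2.1 + 2.1) / 2

margin = 2 * a - b + 2  # 2(0.724) - 2.1 + 2 ~= 1.348
assert a < 1, "convergence requires an inertial weight below 1"
assert margin > 0, "convergence condition 2a - b + 2 > 0 violated"
```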
I've also had good experiences with the `AutotuningPSO`, which is implemented in this library. It schedules changes in the parameters according to the current "step number" of the optimization.
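To illustrate what step-based scheduling means (this is a hypothetical helper, not the `AutotuningPSO` API), a common example from the PSO literature is linearly decaying the inertial weight over the run:

```python
def inertia_schedule(step, total_steps, w_start=0.9, w_end=0.4):
    """Linearly decay the inertial weight from w_start to w_end over
    the run -- a common PSO schedule (illustrative helper only)."""
    frac = min(step / total_steps, 1.0)
    return w_start + (w_end - w_start) * frac

early = inertia_schedule(0, 100)    # 0.9: favors exploration
late = inertia_schedule(100, 100)   # 0.4: favors exploitation
```

Early steps keep a high inertia for exploration; later steps shrink it so the swarm settles onto the best region found.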
Thank you for sharing your valuable experience with PSO. I will further explore the capabilities of this library in the realm of training other AI models. Hoping that others find this discussion useful!
Happy to help. If you have any other questions or find any bugs, please let me know! This repository is very young, and I'm still actively developing it. So far, it's been heavily focused on developing tools that I've needed for my personal research, but this has been helpful to know how to improve the library.
On Aug 29, 2022, at 07:56, Thangjam Aditya @.***> wrote:
Hello, I tried interfacing torch-pso with pytorch training algorithm as follows:
```python
for epoch in tqdm_notebook(range(nepochs)):  # For each epoch
    total_loss = 0
    for X_ts, y_ts in train_loader:
        optimizer.zero_grad()

        def closure():
            output = net.forward(X_ts)
            loss = criterion(output.squeeze(), y_ts)
            # loss.backward()
            return loss

        loss = closure()
        loss.backward()
        optimizer.step(closure)
        total_loss += loss.item()
    avg_loss = total_loss / num_batches
    epoch_losses.append(avg_loss)
ax.plot(epoch_losses)
plt.show()
```
Is the above interfacing correct? I did not see any loss.backward() line in your example code. Is it not required?