Closed Simply-Adi closed 2 years ago
Not calling the backward method is correct. Most Particle Swarm Optimization algorithms are "gradient-free," meaning they avoid computing the gradient (the backward method) entirely. The trade-off is that each optimization step requires additional function evaluations, hence the need for a closure function.
In the current implementation, performing backpropagation to calculate the gradient should have no effect on the behavior of any of the currently available Particle Swarm Optimizers. I will open another issue to add tests that ensure this behavior.
Your code sample looks like it should work, but without testing it myself, I'm not sure. Generally, I put the `optimizer.zero_grad()` call inside the `closure` function, but that's not strictly necessary unless some other part of the closure relies on the gradient (which doesn't appear to be the case here). Again, running the `backward` and `zero_grad` methods should have no effect on the Particle Swarm algorithms; they only matter for avoiding side effects.
Does your code, as written, function as expected?
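To make the "gradient-free" point above concrete, here is a minimal pure-Python sketch of a global-best PSO loop (an illustration only, not torch-pso's actual implementation). Note that the objective, which plays the role of the closure, is only ever *evaluated*, never differentiated, and is called once more per particle per step:

```python
import random

def minimal_pso(objective, dim=2, num_particles=8, steps=100,
                w=0.724, c_cog=2.1, c_soc=2.1, seed=0):
    """Toy global-best PSO. The objective (the "closure") is only
    evaluated, never differentiated -- no backward() anywhere."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(num_particles)]
    vel = [[0.0] * dim for _ in range(num_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(num_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(steps):
        for i in range(num_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c_cog * r1 * (pbest[i][d] - pos[i][d])
                             + c_soc * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])  # the extra function evaluation per step
            if val < pbest_val[i]:
                pbest_val[i], pbest[i] = val, pos[i][:]
                if val < gbest_val:
                    gbest_val, gbest = val, pos[i][:]
    return gbest, gbest_val

# Sphere function: minimum of 0 at the origin.
best, best_val = minimal_pso(lambda p: sum(x * x for x in p))
```

The swarm should drive `best_val` close to zero on the sphere function without ever touching a gradient, which is why omitting `loss.backward()` is fine.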
Hi, thank you for the prompt reply. The code runs without errors. Just to give context, my loss function is pinball or quantile loss. The loss curve is sometimes concave. I am still trying to wrap my head around the PSO parameters.
Interesting. I've never used quantile loss, but I have used this library on several concave functions in my own research (which actually led me to start writing this library--gradient descent was not cutting it for me).
I agree that the PSO hyper-parameters can be a bit... opaque at times. I've spent some time studying the literature, and there's only a handful of useful guidelines that I have found. One of the more important ones is that convergence is only guaranteed for inertial weights less than 1. If the inertial weight is too small, however, the particle will move too freely across the input space, and may converge on a point that is sub-optimal (and not necessarily even a critical point!).
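The effect of the inertial weight can be seen in a toy deterministic model (an illustration, not the library's code): one particle in 1-D with a fixed attractor at the origin, with the random coefficients replaced by their expected value of 0.5, so the combined attraction strength is `0.5 * (2.1 + 2.1) = 2.1`:

```python
def pso_state_growth(w, phi=2.1, steps=200, x=3.0, v=0.0):
    """One particle in 1-D with a fixed attractor at the origin.
    phi is the combined (cognitive + social) attraction strength,
    using the expected value 0.5 of the random coefficients."""
    for _ in range(steps):
        v = w * v + phi * (0.0 - x)  # inertia term + pull toward the best
        x = x + v
    return max(abs(x), abs(v))

shrinks = pso_state_growth(0.724)  # inertial weight below 1: contracts
blows_up = pso_state_growth(1.05)  # inertial weight above 1: diverges
```

With `w = 0.724` the particle spirals into the attractor, while with `w = 1.05` its oscillations grow without bound, matching the "inertial weight less than 1" rule of thumb.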
The social and the cognitive coefficients are a tad trickier to pin down, as their effects are tightly coupled. I've seen varying (and sometimes contradictory) suggestions in the literature on these. One source claims that `social + cognitive >= 4` will converge faster than a sum less than 4. In the paper *The particle swarm optimization algorithm: convergence analysis and parameter selection*, the parameter `b` represents the average of the social and cognitive coefficients and `a` is the inertial weight. The author provides the following condition with a guarantee of convergence: `2a - b + 2 > 0`.
In my experience, I usually have to do a hyper-parameter grid search for each problem. That being said, this is my "go-to" setup, inspired by a paper which I have unfortunately lost a reference to:
```python
{
    'inertial_weight': 0.724,
    'cognitive_coefficient': 2.1,
    'social_coefficient': 2.1,
    'num_particles': 32
}
```
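As a quick sanity check, this setup satisfies the convergence condition from the paper mentioned above (using `a` for the inertial weight and `b` for the average of the social and cognitive coefficients):

```python
# a = inertial weight, b = average of the social and cognitive
# coefficients, following the notation in the cited paper.
a = 0.724
b = (2.1 + 2.1) / 2

margin = 2 * a - b + 2  # 2(0.724) - 2.1 + 2 ~= 1.348
assert a < 1, "convergence requires an inertial weight below 1"
assert margin > 0, "convergence condition 2a - b + 2 > 0 violated"
```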
I've also had good experiences with the `AutotuningPSO`, which is implemented in this library. It schedules changes in the parameters according to the current "step number" of the optimization.
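To illustrate what step-based scheduling means (this is a hypothetical helper, not the `AutotuningPSO` API), a common example from the PSO literature is linearly decaying the inertial weight over the run:

```python
def inertia_schedule(step, total_steps, w_start=0.9, w_end=0.4):
    """Linearly decay the inertial weight from w_start to w_end over
    the run -- a common PSO schedule (illustrative helper only)."""
    frac = min(step / total_steps, 1.0)
    return w_start + (w_end - w_start) * frac

early = inertia_schedule(0, 100)    # 0.9: favors exploration
late = inertia_schedule(100, 100)   # 0.4: favors exploitation
```

Early steps keep a high inertia for exploration; later steps shrink it so the swarm settles onto the best region found.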
Thank you for sharing your valuable experience with PSO. I will further explore the capabilities of this library in the realm of training other AI models. Hoping that others find this discussion useful!
Happy to help. If you have any other questions or find any bugs, please let me know! This repository is very young, and I'm still actively developing it. So far, it's been heavily focused on developing tools that I've needed for my personal research, but this has been helpful to know how to improve the library.
On Aug 29, 2022, at 07:56, Thangjam Aditya @.***> wrote:
Hello, I tried interfacing torch-pso with pytorch training algorithm as follows:
```python
for epoch in tqdm_notebook(range(nepochs)):  # For each epoch
    total_loss = 0
    for X_ts, y_ts in train_loader:
        optimizer.zero_grad()

        def closure():
            output = net.forward(X_ts)
            loss = criterion(output.squeeze(), y_ts)
            # loss.backward()
            return loss

        loss = closure()
        loss.backward()
        optimizer.step(closure)
        total_loss += loss.item()
    avg_loss = total_loss / num_batches
    epoch_losses.append(avg_loss)
ax.plot(epoch_losses)
plt.show()
```
Is the above interfacing correct? I did not see any loss.backward() line in your example code. Is it not required?